dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

403 error for 18.foxaholic.com and foxaholic.com #1319

Closed MarkoPabst closed 4 weeks ago

MarkoPabst commented 2 months ago

Describe the bug Hi! I've been using WebToEpub on Chrome to download the chapters of https://18.foxaholic.com/novel/the-sugar-has-escalated-into-nocturne-in-trinia/ as an EPUB. So far I haven't had trouble, but today I keep getting an error popup during the download of the first chapter, no matter which chapter I try: "WARNING: Site '18.foxaholic.com' has sent an Access Denied (403) error. You may need to logon or satisy a CAPTCHA before WebToEpub can continue."

I've logged out of and back in to https://18.foxaholic.com/, cleared my Chrome cache, tried with another fic, ... There were no images in this chapter and no Patreon restrictions either, just free content.

To Reproduce Steps to reproduce the behavior:

  1. Go to https://18.foxaholic.com/novel/the-sugar-has-escalated-into-nocturne-in-trinia/
  2. Click on the WebToEpub add-on
  3. Select any chapter, or all of them
  4. Launch "Pack EPUB"
  5. See the error mentioned above

Expected behavior Should proceed to slowly download all chapters. Instead, it gets stuck at about 10% and then shows this error.

Screenshots Screenshot (12)

I'm using Windows 10 Chrome Version 124.0.6367.208 (Official Build) (64-bit)

dteviot commented 1 month ago

@MarkoPabst

I'm seeing more and more of this. Site is using Cloudflare for anti-scraping. Which also blocks WebToEpub.
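
For readers unfamiliar with how a blocked request surfaces, here is a minimal sketch of classifying a response as "access denied". It is not WebToEpub's actual code: `FetchOutcome`, `ResponseSummary`, `classifyResponse` and `describeOutcome` are hypothetical names, and the check on the `cf-mitigated` header assumes Cloudflare sets it to `challenge` on challenge pages.

```typescript
// Illustrative only: all names here are hypothetical, not WebToEpub's code.
type FetchOutcome = "ok" | "access-denied" | "other-error";

interface ResponseSummary {
  status: number;
  // Header names lower-cased. Assumption: Cloudflare marks challenge
  // pages with "cf-mitigated: challenge".
  headers: Record<string, string>;
}

// Decide whether a response looks like an anti-scraping block.
function classifyResponse(res: ResponseSummary): FetchOutcome {
  if (res.status >= 200 && res.status < 300) {
    return "ok";
  }
  const challenged = res.headers["cf-mitigated"] === "challenge";
  if (res.status === 403 || challenged) {
    return "access-denied";
  }
  return "other-error";
}

// Build a user-facing message, or null when nothing went wrong.
function describeOutcome(hostname: string, res: ResponseSummary): string | null {
  switch (classifyResponse(res)) {
    case "ok":
      return null;
    case "access-denied":
      return `Site '${hostname}' refused the request (HTTP ${res.status}); a login or CAPTCHA may be required.`;
    default:
      return `Site '${hostname}' returned HTTP ${res.status}.`;
  }
}
```

A plain 403 and a challenge response are treated the same way here because, from the user's side, the remedy is the same: visit the site in a normal tab and pass whatever check it presents.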

norabelle101 commented 1 month ago

@dteviot

Hello! I've had the same issue with both foxaholic sources recently; it worked up until a couple of weeks ago. The site's security seems unchanged, since they've always had Cloudflare security and the "human verification" check (or maybe something in the backend has changed?). But this issue doesn't seem to implicate the WebToEpub extension itself. It might be more of a problem with the Chrome browser, since the store version on Mozilla Firefox can easily bypass the Cloudflare protection for this site and download any novel, just like Chrome was able to before. So I'm rather confused about what exactly is causing the "403 CAPTCHA error" only in Chrome...

Also, in one of the comments for issue #1304, a user reported using a "noscript" extension from the Chrome Web Store to bypass this error, but it didn't quite work for me, despite using the same configuration... Is there really no way to find a workaround to this issue to allow it to rework again in Chrome since this browser is preferable?

And after using the extension for the first time in Mozilla, the interface and overall look (style + fonts) was vastly different from the clean design of Chrome, so it was a bit harder to see where everything was :'(

EDIT: I tested the Chrome version of the extension on other browsers as well (Brave and Microsoft Edge), and received the same 403 error. It seems the extension does not work at all on either foxaholic site if using the Chrome version of the extension, but the Mozilla version is fine?

MarkoPabst commented 1 month ago

It didn't work for me either.

Dongboy69 commented 1 month ago

Can confirm: Firefox doesn't get this error, only Chromium browsers do.

dteviot commented 1 month ago

@norabelle101

Chrome version of the extension on other browsers as well (Brave and Microsoft Edge)

FYI. Chrome, Brave and Edge (and most other) all use the same base (chromium) engine. As far as I know, the only two major browsers that don't use Chromium are Firefox and Safari.

I would suspect that whatever Cloudflare has done, it ignores Firefox. Note, I SUSPECT the problem is Cloudflare, as the Firefox and Chrome versions of the extension are (I'd guesstimate) around 98% identical code. I don't remember any differences in the Fetch Chapter logic.

Is there really no way to find a workaround to this issue to allow it to rework again in Chrome since this browser is preferable?

I have a thought how this could be done. Basically, instead of trying to fetch just the wanted content, open each chapter in a tab so whatever Cloudflare is looking for WILL be there. Then fetch the content from the tab. The problems with this plan are:

  1. It's likely to be a lot of work. I'm thinking tens of hours of work, and I just don't have the motivation (or time).
  2. Doing this is complicated, and I'm not currently sure how to do at least one part of it.
  3. I don't think it can get images.
  4. WebToEpub will be kind of annoying to use. Will be opening and closing tabs. I'm not sure it will even work on Android.
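
The tab-based idea above can be sketched as follows. This is a rough illustration under stated assumptions, not the approach WebToEpub actually shipped: `TabHost`, `fetchChapterViaTab` and `fetchChaptersViaTabs` are hypothetical names, and the browser is hidden behind an interface so the control flow can be read and tested on its own. In a Manifest V3 extension the four methods could plausibly be backed by `chrome.tabs.create`, `chrome.tabs.onUpdated`, `chrome.scripting.executeScript` and `chrome.tabs.remove`.

```typescript
// Hypothetical abstraction over the browser's tab API (an assumption
// made for this sketch; it is not part of WebToEpub).
interface TabHost {
  openTab(url: string): Promise<number>;         // resolves to the new tab's id
  waitUntilLoaded(tabId: number): Promise<void>; // resolves once the page has loaded
  readHtml(tabId: number): Promise<string>;      // serialized DOM of the loaded page
  closeTab(tabId: number): Promise<void>;
}

interface ChapterPage {
  url: string;
  html: string;
}

// Load one chapter in a real tab, so cookies and scripts are present,
// then read the rendered HTML back out of it.
async function fetchChapterViaTab(host: TabHost, url: string): Promise<ChapterPage> {
  const tabId = await host.openTab(url);
  try {
    await host.waitUntilLoaded(tabId);
    const html = await host.readHtml(tabId);
    return { url, html };
  } finally {
    // Always close the tab, even if loading or reading failed.
    await host.closeTab(tabId);
  }
}

// Process chapters one at a time so only a single extra tab is open.
async function fetchChaptersViaTabs(host: TabHost, urls: string[]): Promise<ChapterPage[]> {
  const pages: ChapterPage[] = [];
  for (const url of urls) {
    pages.push(await fetchChapterViaTab(host, url));
  }
  return pages;
}
```

Working through the list sequentially keeps the tab churn mentioned in point 4 to one tab at a time, at the cost of speed. Images (point 3) are not handled here at all.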

An alternate plan would be to do something similar with an actual browser. Selenium allows another program to "remote control" a browser. The problems with this plan are:

  1. It basically requires writing a whole new program from scratch. So, probably 100s of hours of work.
  2. It's not an extension but a standalone program, so it would only work on Windows.
  3. Getting images is not easy.

Note, I've seen a scraper project that is going down this path https://github.com/martial-god/Benny-Scraper. Giving some consideration to assisting.

norabelle101 commented 1 month ago

@dteviot

FYI. Chrome, Brave and Edge (and most other) all use the same base (chromium) engine. As far as I know, the only two major browsers that don't use Chromium are Firefox and Safari.

I would suspect that whatever Cloudflare has done, it ignores Firefox.

Oh, I see! As long as Firefox remains capable of bypassing this protection, that's good then!

I have a thought how this could be done. Basically, instead of trying to fetch just the wanted content, open each chapter in a tab so whatever Cloudflare is looking for WILL be there. Then fetch the content from the tab.

As you've explained, it's probably preferable not to go further with implementing this, if it's only going to overcomplicate the simple purpose of WebToEpub :)

Note, I've seen a scraper project that is going down this path https://github.com/martial-god/Benny-Scraper. Giving some consideration to assisting.

I've never heard of this program, but it would be really amazing if they could enlist your assistance. That said, this program only supports around five sources, so I'm not sure they'd be willing to add enough to come close to the level of WebToEpub, though it's nice to have a working alternative :)

All in all, thank you very much for considering these methods to get around this issue! For now, it's probably best to stick to using Firefox for any sources which employ this protection :D

dteviot commented 1 month ago

@norabelle101 @MarkoPabst @Dongboy69

Seems to be same problem as https://github.com/dteviot/WebToEpub/issues/1306. Fix provided by gamebeaker.

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: no extra work. (Fixed by #1306)

dteviot commented 4 weeks ago

@norabelle101 @MarkoPabst

Updated version (0.0.0.160) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours to 21 days.