dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

WebToEpub vs FFnet access denied 403 #1306

Closed LennTea closed 3 weeks ago

LennTea commented 2 months ago

Describe the bug

Hi! I've been using WebToEpub Chrome to download/cache Fanfiction.net pages so I could use Fanficfare to bring the story to Calibre. So far I haven't had trouble (I just added a "no save as popup" and "throttle per chapter at 10s/ch", to protect WtE from Cloudflare). But today I just keep getting an error popup during the download of the 1st chapter, no matter which fic I use: "WARNING: site 'www.fanfiction.net' has sent an Access Denied (403) error. You may need to logon or satisfy a CAPTCHA before WebToEpub can continue".

I've logged out of and back in to FFnet, cleared my Chrome cache, tried with another fic, ...

Is it only me/a problem with my computer or IP, or is it widespread?

To Reproduce

Steps to reproduce the behavior:

  1. Go to a fic page on FFNet
  2. Click on the wte addon
  3. launch "pack epub"
  4. See error mentioned above

Expected behavior

Should proceed to slowly download all chapters and eventually pack the epub.

Screenshots

[two screenshots: "2024-05-09 16_07_30-" and "2024-05-09 16_26_53-WebToEpub"]

Desktop (please complete the following information):

Additional context

none

gamebeaker commented 2 months ago

I get the same error in Chrome. In Firefox I can still download from fanfiction.net.

LennTea commented 2 months ago

For some reason, I have trouble using the Firefox WebToEpub & Fanficfare Calibre combo. I always get an error message from FFF when I use WTE to cache from a Firefox browser (even with Developer Firefox and one single addon, which should remove any possible conflict in the profile). That is why I'd been using Chrome & WebToEpub.

EDIT: I mentioned the cache vs. Firefox problem to Jimmy, who made FFF, and apparently it wasn't me: it was either a WTE or a Firefox update conflicting with FFF. He just published an update to FFF on the mobileread forum, so the WTE on FFnet + FFF on Calibre combo works again.

Mavsynchroid commented 2 months ago

I'm also having this problem. Before today, I would sometimes have to manually refresh a tab with the story I wanted from fanfiction so I could click the captcha myself and make the chapter scraping go faster, but since today it simply won't work. It just gives me that access denied captcha error, when it definitely does NOT need a captcha. Please help!

Kiradien commented 2 months ago

Looks like some annoying referrer permission changes have been put through to further block cross-origin calls. Calling a simple "fetch" while on fanfiction.net works fine, but if you try the same call from the webtoepub window it's an immediate 403.

Looking into the requests, the biggest differences I see in the successful request are:

Sec-Ch-Ua-Arch: "x86"
Sec-Ch-Ua-Bitness: "64"
Sec-Ch-Ua-Full-Version: "124.0.6367.119"
Sec-Ch-Ua-Full-Version-List: "Chromium";v="124.0.6367.119", "Google Chrome";v="124.0.6367.119", "Not-A.Brand";v="99.0.0.0"
Sec-Ch-Ua-Model: ""
Sec-Ch-Ua-Platform-Version: "10.0.0"

Regardless of all the extras, I'm pretty sure this is related to FF.net's Cloudflare settings. I am playing around with a few workarounds, but no luck inside the current project so far.

dteviot commented 2 months ago

@Kiradien

Calling a simple "fetch" while on fanfiction.net works fine,

I'm thinking the sledgehammer approach.

  1. Have WebToEpub open the chapter as a new tab in the browser.
  2. Inject a content script into the tab to get the content, and pass back to WebToEpub.

Note that WebToEpub already injects a content script into the table of contents page of a story. And I think there's some code elsewhere that can open a new tab.

Or maybe update the existing content script, to allow it to be called to do the fetch. I've got another project that does 2 way communication between a content script and the main extension. https://github.com/dteviot/SyosetuGoogle
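For illustration, the "open a tab and inject a script" idea above might be sketched as follows. This is not WebToEpub's actual code. It assumes a Manifest V3 extension with the "scripting" permission and host access to the site. The names `fetchHtmlViaTab` and `waitForTabComplete` are hypothetical, and `browserApi` is a hypothetical parameter expected to be the `chrome` object (passed in so the logic can be exercised with a stub).

```javascript
// Hypothetical helper: load `url` in a background tab and return the page HTML.
// `browserApi` is assumed to be the `chrome` object of a Manifest V3 extension.
async function fetchHtmlViaTab(browserApi, url) {
    const tab = await browserApi.tabs.create({url: url, active: false});
    try {
        await waitForTabComplete(browserApi, tab.id);
        // The injected function runs inside the page, so the request that
        // loaded the page was made by the tab itself, not by the extension.
        const [injection] = await browserApi.scripting.executeScript({
            target: {tabId: tab.id},
            func: () => document.documentElement.outerHTML
        });
        return injection.result;
    } finally {
        // Always close the tab, even if injection failed.
        await browserApi.tabs.remove(tab.id);
    }
}

// Hypothetical helper: resolve once the tab reports status "complete".
// Assumption: the listener is registered before the page finishes loading.
function waitForTabComplete(browserApi, tabId) {
    return new Promise(resolve => {
        function listener(updatedTabId, changeInfo) {
            if (updatedTabId === tabId && changeInfo.status === "complete") {
                browserApi.tabs.onUpdated.removeListener(listener);
                resolve();
            }
        }
        browserApi.tabs.onUpdated.addListener(listener);
    });
}
```

In an extension this would be called as `await fetchHtmlViaTab(chrome, chapterUrl)`. A real version would probably also need a timeout and some handling for pages that show a CAPTCHA.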

Dongboy69 commented 1 month ago

https://www.lightnovelpub.com same problem

greenskye commented 1 month ago

Hoping this gets fixed soon as I know of no other method to download stories from fanfiction.net. Fanficfare already stopped supporting them ages ago. The issue appears to impact every other online story to epub converter I've found as well. Firefox works, but I'm assuming it's just a matter of time before it quits too.

gamebeaker commented 1 month ago

I think the problem is that the Cloudflare cookie is not sent with the request. If you open Inspect -> Network, click the failed request, then Cookies, there is this message: [screenshot: 2024-05-21 230645]

Link from the message: https://developers.google.com/privacy-sandbox/3pcd/chips?utm_source=devtools

For fanfiction.net the missing cookie's name is cf_clearance: [screenshot]

WebToEpub in Firefox sends this cookie: [screenshot]

Edit: @Kiradien I missed your comment, sorry.

gamebeaker commented 1 month ago

@dteviot here is some code that works on the second try. Why on the second try? I forgot how callbacks and promises work. Changes needed:

  1. In manifest.json -> permissions, add "cookies".
  2. In HttpClient.js -> wrapFetch() or wrapFetchImpl(), add the code below. (Optimization: this code only has to run once to set the cookies, not on each fetch.)

// Only needed when the browser is not Firefox (i.e. Chrome)
if(!util.isFirefox()){
    // Look up the site's cookies, to build the partitionKey in the form https://<site name>.<tld>
    chrome.cookies.getAll({
        url: url,
    })
    .then(function(cookie) {
        // Get all cookies for the site that use the partitionKey (Cloudflare)
        chrome.cookies.getAll({
            partitionKey: {topLevelSite: "https://"+cookie[0].domain.substring(1)},
        })
        .then(function(cookies) {
            // Create new cookies for the site without the partitionKey;
            // cookies without the partitionKey are sent with fetch
            cookies.forEach(element => {
                chrome.cookies.set({
                    domain: element.domain,
                    url: "https://"+element.domain.substring(1),
                    name: element.name,
                    value: element.value
                });
            });
        });
    });
}

dteviot commented 1 month ago

@gamebeaker

Thank you. That seems to work. Also might explain why Firefox doesn't have the problem, it handles Cookie PartitionKeys differently. FYI. Have re-written your code to use await.
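As a rough illustration of what an await-based version of the snippet above could look like (this is a sketch, not the code that was committed to WebToEpub): `copyPartitionedCookies` is a hypothetical name, and `cookiesApi` is a hypothetical parameter expected to be `chrome.cookies`, passed in so the logic can be tried with a stub. Like the original snippet, it assumes the cookie domain starts with a leading dot.

```javascript
// Hypothetical helper: copy a site's partitioned cookies (e.g. Cloudflare's
// cf_clearance) into unpartitioned cookies, so they are sent with fetch().
// `cookiesApi` is assumed to behave like chrome.cookies with promise support.
async function copyPartitionedCookies(cookiesApi, url) {
    // Use the site's own cookies to find its domain.
    const siteCookies = await cookiesApi.getAll({url: url});
    if (siteCookies.length === 0) {
        // Nothing to derive the partitionKey from; do nothing.
        return 0;
    }
    // Assumption (as in the original snippet): domain looks like ".example.com".
    const topLevelSite = "https://" + siteCookies[0].domain.substring(1);

    // Get all cookies stored under that partitionKey.
    const partitioned = await cookiesApi.getAll({
        partitionKey: {topLevelSite: topLevelSite}
    });

    // Re-create each cookie without a partitionKey, and wait for all
    // of the writes to finish before returning.
    await Promise.all(partitioned.map(c => cookiesApi.set({
        domain: c.domain,
        url: "https://" + c.domain.substring(1),
        name: c.name,
        value: c.value
    })));
    return partitioned.length;
}
```

Awaiting the `set` calls before the first fetch may be what removes the "works on second try" behaviour, since the cookies are then in place before the request is made. That is a guess, not something confirmed in this thread.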

@Dongboy69 @greenskye @LennTea @Mavsynchroid @Kiradien

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 120 minutes work (Although I estimate half of it was because I forgot after changing manifest, you need to re-load the extension.)

greenskye commented 1 month ago

Tested the chrome extension and it seems to work now

dteviot commented 1 month ago

@gamebeaker A new problem with the partitionKeys. The site https://mtlarchive.com/ uses Cloudflare, but does not use a cookie for the site itself. This makes this function https://github.com/dteviot/WebToEpub/blob/01c6f23061f084d1a6f2dbf505647a99ce5869eb/plugin/js/HttpClient.js#L200-L221 exit at line 205, and we never load the Cloudflare cookie into the session. I can work around it with code like this:

            // get partitionKey in the form of https://<site name>.<tld> 
            let cookie = await chrome.cookies.getAll({url: url});
            let parsedUrl = new URL(url);
            let topLevelSite = (cookie.length == 0)
                ? parsedUrl.protocol + "//" + parsedUrl.hostname
                : "https://"+cookie[0].domain.substring(1);

            //  get all cookie from the site which use the partitionKey (e.g. cloudflare)
            let cookies = await chrome.cookies.getAll({partitionKey: {topLevelSite: topLevelSite}});

But then I wonder if the getAll with a url parameter is even needed. Could we just create the topLevelSite directly from the URL? e.g.

            let parsedUrl = new URL(url);
            let topLevelSite = parsedUrl.protocol + "//" + parsedUrl.hostname;

            //  get all cookie from the site which use the partitionKey (e.g. cloudflare)
            let cookies = await chrome.cookies.getAll({partitionKey: {topLevelSite: topLevelSite}});

Or was there some problem you were trying to work around with that call?
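For reference, the URL-only variant can be tried outside the extension as a small pure function. This is only a sketch: `topLevelSiteFromUrl` is a hypothetical name, and whether Chrome accepts a hostname with a subdomain (such as "www.") as a partitionKey's topLevelSite is an open assumption that is not verified here.

```javascript
// Hypothetical helper: build the topLevelSite string for a partitionKey
// directly from a page URL, without calling chrome.cookies.getAll({url}).
function topLevelSiteFromUrl(url) {
    const parsedUrl = new URL(url);
    // parsedUrl.protocol includes the trailing colon, e.g. "https:".
    // parsedUrl.hostname excludes any port, path, or query string.
    return parsedUrl.protocol + "//" + parsedUrl.hostname;
}

console.log(topLevelSiteFromUrl("https://mtlarchive.com/some/novel?page=2"));
```

Unlike the cookie-derived value, this keeps any subdomain in the result, so "https://www.fanfiction.net/s/1/1" gives "https://www.fanfiction.net" rather than "https://fanfiction.net".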

gamebeaker commented 1 month ago

@dteviot

But then I wonder if the getAll with a url parameter is even needed.

No.

Could we just create the topLevelSite directly from the URL? e.g.

Yes, I didn't know that you can parse a URL like that. I only thought of regex, and before trying regex I found getAll simpler.

dteviot commented 3 weeks ago

@gamebeaker @greenskye @Mavsynchroid @Dongboy69 @LennTea

Updated version (0.0.0.160) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours to 21 days.