JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

scribblehub flaresolverr fix #900

Closed mavi0 closed 1 year ago

mavi0 commented 1 year ago

Hi JimmXinu,

This is a 'fix' for the scribblehub adaptor, which currently doesn't work at all. At the moment, by default it returns a 403 for the base URL of a fic, which genuinely confuses me, and I can't find a reason for it. I can't replicate it with curl; the closest I can get is a Cloudflare page if I don't set the user agent.

I figured it might be a cookie thing, as Scribblehub now have a GDPR prompt, so I tried the flaresolverr plugin. This partially worked, but the part of the script which runs an AJAX request for the table of contents gets back the correct content with a 400 status code. It's weird, because I can curl that fine, and in the web browser I can make the request successfully and get a 200 back, even after deleting all the tracking/GDPR cookies.

For example:

curl 'https://www.scribblehub.com/wp-admin/admin-ajax.php' -X POST -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:107.0) Gecko/20100101 Firefox/107.0' -H 'Cookie: toc_show=1; toc_sorder=asc' --data-raw 'action=wi_gettocchp&strSID=421879&strmypostid=0&strFic=yes' -H "Content-Type: application/x-www-form-urlencoded"
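
For anyone who wants to poke at this from Python rather than curl, here is a minimal sketch of the same POST using only the standard library. The URL, cookie and form fields are copied from the curl command above; the helper name `build_toc_request` is made up for illustration and is not part of FanFicFare. It only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

AJAX_URL = "https://www.scribblehub.com/wp-admin/admin-ajax.php"


def build_toc_request(story_id, user_agent):
    """Build (but do not send) the table-of-contents POST from the curl example.

    Hypothetical helper for illustration only, not FanFicFare code.
    """
    payload = {
        "action": "wi_gettocchp",
        "strSID": story_id,
        "strmypostid": 0,
        "strFic": "yes",
    }
    headers = {
        "User-Agent": user_agent,
        "Cookie": "toc_show=1; toc_sorder=asc",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # urlencode() keeps the dict's insertion order, so the body matches
    # the --data-raw string passed to curl.
    body = urlencode(payload).encode("ascii")
    return Request(AJAX_URL, data=body, headers=headers, method="POST")


req = build_toc_request(421879, "Mozilla/5.0")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; whether that comes back as a 200 or a 400 is exactly the open question here.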

As a side note, scribblehub do appear to have changed the format of the query to something like this:

contents_payload = {"action": "wi_getreleases_pagination", "pagenum": 1, "mypostid": 421879}

Instead of:

contents_payload = {"action": "wi_gettocchp", "strSID": self.story.getMetadata('storyId'), "strmypostid": 0, "strFic": "yes"}

However, the previous query payload still works (and works just fine with curl and in the dom explorer in firefox) and the response can be parsed just fine by the rest of the program as is, and both queries return the same 400 code.
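
To make the difference between the two formats concrete, here is a small sketch that builds both payloads and prints the form-encoded body each one produces. The function names are hypothetical, and treating `pagenum` as a simple page counter is an assumption; only page 1 appears in the query above.

```python
from urllib.parse import urlencode


def old_toc_payload(story_id):
    """Payload the adaptor sends today (hypothetical wrapper function)."""
    return {"action": "wi_gettocchp", "strSID": story_id,
            "strmypostid": 0, "strFic": "yes"}


def new_toc_payload(post_id, pagenum=1):
    """Payload in the newer format; pagenum > 1 is an untested assumption."""
    return {"action": "wi_getreleases_pagination",
            "pagenum": pagenum, "mypostid": post_id}


# Print the form-encoded body for each format, using the example story ID.
print(urlencode(old_toc_payload(421879)))
print(urlencode(new_toc_payload(421879)))
```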

It does feel like a bodge. I have gone through requestable.py, fetcher.py and flaresolverr_proxy.py, and as far as I can tell the request is formed correctly. From what I can tell, this "fix" will have no impact on using flaresolverr for fanfiction.net; it just allows a 400 to be accepted the same way a 200 is. It's just super annoying that I can't seem to replicate getting a 400 code with curl or Firefox.

Tested just on the cli on macos.

JimmXinu commented 1 year ago

I'm not seeing any issues downloading from scribblehub using FFF as normal without proxy. That suggests to me that you (or your IP) have been flagged for extra screening.

As for the changed AJAX input, I'm not hugely interested in changing it myself if it still works. But I will review and (probably) accept a PR that changes it, as long as it works.

As for the proxy status code issue, I hesitate very much to accept code 400 as the same as 200 across the board. Have you looked at what adapter_trekfanfictionnet.py does with a similar situation? That's a much more focused solution.
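
As a rough illustration of what "more focused" could mean, here is a minimal, self-contained sketch that tolerates a 400 for the table-of-contents request only, instead of in the shared fetch code. It is not taken from adapter_trekfanfictionnet.py and does not use FanFicFare's real classes: `HTTPStatusError`, `fetch_toc` and the `post` callable are all hypothetical stand-ins, and it assumes the error raised by the fetch layer carries the response body.

```python
class HTTPStatusError(Exception):
    """Hypothetical stand-in for whatever error the fetch layer raises."""

    def __init__(self, url, status_code, body=""):
        super().__init__("HTTP %s for %s" % (status_code, url))
        self.url = url
        self.status_code = status_code
        self.body = body  # assumption: the error keeps the response body


TOC_URL = "https://www.scribblehub.com/wp-admin/admin-ajax.php"


def fetch_toc(post, payload):
    """Call post(url, payload), tolerating a 400 for this one request only."""
    try:
        return post(TOC_URL, payload)
    except HTTPStatusError as err:
        # Use the body only when the status is 400 and there is content;
        # every other error (403, 404, 5xx, ...) still propagates.
        if err.status_code == 400 and err.body:
            return err.body
        raise
```

Because the exception handling lives in the one function that requests the table of contents, every other site and request keeps treating a 400 as an error.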

mavi0 commented 1 year ago

> As for the proxy status code issue, I hesitate very much to accept code 400 as the same as 200 across the board. Have you looked at what adapter_trekfanfictionnet.py does with a similar situation? That's a much more focused solution.

Oh nice, I hadn't. I'll sort something like that and edit this PR when it's done. Thanks for the pointer!

JimmXinu commented 1 year ago

I've posted a CLI test version, since that's what you presumably use.

mavi0 commented 1 year ago

Thanks! I use the calibre plugin normally, just the cli is easier to test with. Took forever, even with gigabit internet, and I haven't updated my library in a month or so, but I tested it in calibre on the hundred or so scribblehub (and other sites) fics in my reading list and it worked fine - just an fyi.

I know this PR is a bit of a niche thing, but you mentioned the adaptor was working for you without flaresolverr - I just wanted to check with you what your user agent is set to, or if you can think of anything else in your personal.ini file which might help? I'm assuming you're in the US, and from my tests, changing my IP to a non-EU place does seem to prevent the GDPR pop-up on scribblehub in the browser, but I still get a 403 - plus I've tried the CLI on a bunch of servers I have access to (although they are all in countries with GDPR) and the same happens on all of them.

calibre fff summary log

```
Update epub completed, added 1 chapters for 21 total.
Update epub completed, added 1 chapters for 27 total.
Update epub completed, added 3 chapters for 12 total.
Update epub completed, added 1 chapters for 14 total.
Update epub completed, added 11 chapters for 77 total.
Update epub completed, added 6 chapters for 20 total.
Update epub completed, added 3 chapters for 15 total.
Update epub completed, added 5 chapters for 36 total.
Update epub completed, added 4 chapters for 49 total.
Update epub completed, added 12 chapters for 154 total.
Update epub completed, added 1 chapters for 111 total.
Update epub completed, added 2 chapters for 9 total.
Update epub completed, added 5 chapters for 20 total.
Update epub completed, added 2 chapters for 13 total.
Update epub completed, added 3 chapters for 12 total.
Update epub completed, added 1 chapters for 26 total.
Update epub completed, added 3 chapters for 41 total.
Update epub completed, added 5 chapters for 46 total.
Update epub completed, added 1 chapters for 48 total.
Update epub completed, added 3 chapters for 17 total.
Update epub completed, added 3 chapters for 29 total.
Update epub completed, added 1 chapters for 6 total.
Update epub completed, added 11 chapters for 101 total.
Update epub completed, added 1 chapters for 7 total.
Update epub completed, added 6 chapters for 120 total.
Update epub completed, added 1 chapters for 27 total.
Already contains 22 chapters.
Already contains 4 chapters.
Already contains 7 chapters.
Already contains 14 chapters.
Already contains 16 chapters.
Already contains 17 chapters.
Already contains 7 chapters.
Already contains 44 chapters.
Already contains 11 chapters.
Already contains 38 chapters.
Already contains 54 chapters.
Already contains 3 chapters.
Already contains 23 chapters.
Already contains 3 chapters.
Already contains 1 chapters.
Already contains 1 chapters.
Already contains 2 chapters.
Already contains 32 chapters.
Already contains 24 chapters.
Already contains 4 chapters.
Already contains 23 chapters.
Already contains 31 chapters.
Already contains 16 chapters.
Already contains 3 chapters.
Already contains 66 chapters.
Already contains 3 chapters.
Already contains 11 chapters.
Already contains 23 chapters.
Already contains 46 chapters.
Already contains 17 chapters.
Already contains 15 chapters.
Already contains 174 chapters.
Already contains 2 chapters.
Already contains 16 chapters.
Already contains 6 chapters.
Already contains 12 chapters.
Already contains 1 chapters.
Already contains 33 chapters.
Already contains 88 chapters.
Already contains 17 chapters.
Already contains 1 chapters.
Already contains 1 chapters.
Already contains 5 chapters.
Already contains 3 chapters.
Already contains 2 chapters.
Already contains 35 chapters.
Already contains 10 chapters.
Already contains 15 chapters.
Already contains 31 chapters.
Already contains 34 chapters.
Already contains 8 chapters.
Already contains 5 chapters.
Already contains 7 chapters.
Already contains 7 chapters.
Already contains 13 chapters.
Already contains 5 chapters.
Already contains 14 chapters.
Already contains 66 chapters.
Already contains 72 chapters.
Already contains 6 chapters.
Already contains 31 chapters.
Already contains 43 chapters.
Already contains 16 chapters.
Already contains 8 chapters.
Already contains 16 chapters.
Already contains 9 chapters.
Already contains 26 chapters.
Already contains 9 chapters.
Already contains 31 chapters.
Already contains 19 chapters.
Already contains 17 chapters.
```

JimmXinu commented 1 year ago

Yes, I'm in the US. If you updated that many stories at once, that's when I would expect you to get blocked by the site...

mavi0 commented 1 year ago

Oh, don't worry, I was doing it one at a time on the CLI, plus I'm still hammering them through the flaresolverr proxy. The only thing which seems to trip it up (with curl anyway) is if the user agent is left as default and Cloudflare steps in - but 'User-Agent: Mozilla/5.0' is enough to get past that.