JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Fanfiction.net stories not downloading #622

Closed: Katylar closed this issue 3 years ago.

Katylar commented 3 years ago

In the logs, I get the error: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Twilight666 commented 3 years ago

I have a similar issue:

AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

cloudscraper.exceptions.CloudflareIUAMError: Cloudflare IUAM possibility malformed, issue extracing delay value.

roon0 commented 3 years ago

Hi, I have the same issue, but the report was closed. I ran Calibre in debug mode, and this is the result.

(large and off topic debug log removed--JimmXinu)

atroly commented 3 years ago

I am also experiencing this problem. A single story updated successfully, but all subsequent attempts are failing with the above error. When I then accessed fanfiction.net via my browser, Cloudflare generated a "checking your browser" message before continuing to the website.

themaster567 commented 3 years ago

I am also getting this error. I'm not sure how to read this as anything other than saying that there is now a paid version of FanFicFare that you must buy in order to properly use FanFiction.net. Please, prove me wrong, because I don't see how else you're supposed to interpret that message.

EDIT: Explained in the comments below, I am indeed wrong, which is a good thing.

Twilight666 commented 3 years ago

You are wrong.

A few weeks ago FFnet stopped working, and the solution found was to add another plugin that bypassed Cloudflare. Since then, FFnet had been working properly with FanFicFare.

What this seems to mean is one of three things: either FFnet upgraded its protection even further, to the point that only the paid version of the Cloudflare plugin works now; or the solution only worked for the 2 weeks since the new plugin was added, and it now needs to be updated to the paid version; or the Cloudflare plugin changed something so that its free version no longer works, to force people to buy the full version.

In any case there is no paid version of FanFicFare!!

Edit: Also I upgraded cloudscraper to the latest version and now the error is:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

der-hoehlenbaer commented 3 years ago

I see the free Cloudscraper module has gotten a minor update today (1.2.52). However, no idea if it is related to the latest Cloudflare shenanigans.

P.S. Getting the same error as mentioned above as well.

Edocsil commented 3 years ago

> I am also getting this error. I'm not sure how to read this as anything other than saying that there is now a paid version of FanFicFare that you must buy in order to properly use FanFiction.net. Please, prove me wrong, because I don't see how else you're supposed to interpret that message.

The problem has nothing to do with FanFicFare; furthermore, it's not new.

As can be seen in #616, fanfiction.net has put security measures in place to prevent third-party apps from accessing the site. While using cloudscraper works most of the time, sometimes Cloudflare requests a captcha, and that can't be solved with the open source version of cloudscraper that FanFicFare is using.

Personally I have never had this issue, and I'm a firm believer that the reason is my high wait time between downloads.

```ini
[overrides]
slow_down_sleep_time:10
```

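For readers unfamiliar with the setting: slow_down_sleep_time makes FFF pause between page fetches. The Python sketch below shows the general idea, assuming a fixed pause before every request after the first. `read_sleep_time` and `fetch_politely` are hypothetical helpers, not FFF's code, and FFF's own handling of the value may differ. `configparser` is used because it accepts the `key:value` syntax shown above.

```python
import configparser
import time
from typing import Callable, Iterable, List

# The override block from the comment above, as it would appear in personal.ini.
INI_TEXT = """\
[overrides]
slow_down_sleep_time:10
"""


def read_sleep_time(ini_text: str, section: str = "overrides") -> float:
    """Return slow_down_sleep_time from `section`, or 0.0 if it is absent."""
    parser = configparser.ConfigParser()  # accepts both "key:value" and "key=value"
    parser.read_string(ini_text)
    return parser.getfloat(section, "slow_down_sleep_time", fallback=0.0)


def fetch_politely(urls: Iterable[str], fetch: Callable[[str], str], delay: float,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, pausing `delay` seconds between requests."""
    pages: List[str] = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # no pause before the very first request
        pages.append(fetch(url))
    return pages
```

The `sleep` parameter is injectable only so the pacing can be checked without actually waiting.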
themaster567 commented 3 years ago

> In any case there is no paid version of FanFicFare!!

Thank you very much! Like I said, I wanted to be wrong.

Twilight666 commented 3 years ago

I tried to add slow_down_sleep_time:10 to my overrides. Nothing changed.

Might even be because it's stuck in captcha code mode...

Twilight666 commented 3 years ago

Found this:

> Cloudflare modifies their anti-bot protection page occasionally, So far it has changed maybe once per year on average. If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.

Here: https://github.com/VeNoMouS/cloudscraper

So it might be that, but I can't seem to open an issue there.

Also, it looks like it was updated 8 hours ago, so...

mcepl commented 3 years ago

> Found this:
>
> If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.

Except that cloudscraper has no paid version, and its issue tracker has been switched off. (The error message is about a paid version of cloudscraper, not FanFicFare.)

JimmXinu commented 3 years ago

I am seeing this issue also. I've tried updating to the latest version of cloudscraper without success.

There are a few avenues of exploration open, which I will be looking into.

JimmXinu commented 3 years ago

For reference: #614 was the last issue about ffnet and cloudflare.

Here are the avenues I have explored; none have worked so far:

The exception suggests the existence of a pay or closed source version of cloudscraper, but I'm not finding it. Nor do I know if we could use it.

It is possible that the restrictions will relax over time. It is also possible that they will not. Other downloaders appear to be having the same problems.

dutchmega commented 3 years ago

According to Cloudflare, an HTTP 429 response includes a Retry-After header, so it would likely be possible to simply wait and retry the request later.
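A minimal sketch of that idea in Python, assuming the caller supplies a `fetch` function returning `(status, headers, body)`; `retry_after_seconds` and `get_with_retry` are hypothetical names. Retry-After may hold either a number of seconds or an HTTP date, so both forms are handled. Whether waiting actually clears the version 2 challenge discussed in this thread is not established here.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After, which is either delay-seconds or an HTTP-date."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable date
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def get_with_retry(fetch: Callable[[], Response], max_attempts: int = 3,
                   default_wait: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch(); on HTTP 429, wait as told by Retry-After and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Exact-case lookup here; clients such as requests use case-insensitive headers.
        wait = retry_after_seconds(headers.get("Retry-After"))
        sleep(default_wait if wait is None else wait)
    raise AssertionError("unreachable")
```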

cloudscraper also seems to raise the error based on a 429 or 503 response: https://github.com/VeNoMouS/cloudscraper/blob/f495f5c3b3f3bf0b0bae5ebd0746704c365adaf0/cloudscraper/__init__.py#L449
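As a rough illustration, a client-side check for "this looks like a Cloudflare challenge" might look like the sketch below. The 429/503 status codes come from the comment above; checking the `Server` header for `cloudflare` is an added assumption, the function names are hypothetical, and this is not cloudscraper's actual detection logic.

```python
from typing import Mapping

# Status codes taken from the discussion; the Server-header check is an assumption.
CHALLENGE_STATUSES = frozenset({429, 503})


def get_header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive header lookup that also works on a plain dict."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: a 429/503 response served by Cloudflare's edge."""
    return (status in CHALLENGE_STATUSES
            and get_header(headers, "Server").lower().startswith("cloudflare"))
```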

Mandabar commented 3 years ago

The [paid] version of cloudscraper may just be the [3rd Party Captcha Solvers] section of the code, where an online service solves the captcha, requiring you to have an account with a balance and to insert your account API key. Ugh.

Found in https://github.com/VeNoMouS/cloudscraper#readme under [3rd Party Captcha Solvers].

Now, I don't know much about how the code in this works, but unlike a previous poster, I don't get the Cloudflare "checking your browser" page while FFF is unable to work. I don't suppose there is a way to force our plugin to use the already-authorized cookies of our working browser? Hmm, then again, even if that were a thing, which browser would it grab them from? ^^ And Jimm already tried this, of course. ^^

Are we going to have to have someone make an OCR captcha reading plugin to simulate one of these paid websites? fudge

atroly commented 3 years ago

I would really like to know why the Fanfiction admins appear to be so bitterly opposed to the concept of reading their content on an e-book rather than exclusively online. With any luck their obnoxious attitude in this and many other areas will drive even more authors to less authoritarian platforms.

chocolatechipcats commented 3 years ago

> The exception suggests the existence of a pay or closed source version of cloudscraper, but I'm not finding it. Nor do I know if we could use it.

I did a bit of research too. I've seen no evidence that such a thing even exists, aside from a vague comment about a Discord support server.

> I would really like to know why the Fanfiction admins appear to be so bitterly opposed to the concept of reading their content on an e-book rather than exclusively online. With any luck their obnoxious attitude in this and many other areas will drive even more authors to less authoritarian platforms.

Ads.

I've been PMing authors whose works I enjoy and encouraging them to move to AO3, if they haven't already. I can't explain it, but I have a weird itch that there might be another ffnet purge soon. I would suggest that anybody here do the same with their favourite authors.

JimmXinu commented 3 years ago
  • Use captcha service in cloudscraper - Doesn't get that far.

While I didn't go so far as to actually set up an account or anything, I did try configuring one with bogus values and looked at the code. From all I can tell, the exception happens in a check before it even tries contacting the captcha service.

The page I get from CF says "This process is automatic. Your browser will redirect to your requested content shortly." It's not any sort of "click here" page, which gives me some hope that cloudscraper may be able to address it, if they choose to do so.

kevinclin commented 3 years ago

> The [paid] version of cloudscraper may just be the [3rd Party Captcha Solvers] section of the code, where an online service solves the captcha. Requiring you to have an account, and a balance and insert your account api key in. Ugh.
>
> Found in https://github.com/VeNoMouS/cloudscraper#readme under [3rd Party Captcha Solvers].
>
> Now I don't know much about how the code in this is working, but unlike a previous poster, I don't get the cloudflare checking browser page while FFF is unable to work. I don't suppose there is a way to force our plugin to use the already authorized cookies of our working browser. Hmm, then again which browser would it try to grab from is the question even if a thing. ^^ And Jimm already tried this of course. ^^
>
> Are we going to have to have someone make an OCR captcha reading plugin to simulate one of these paid websites? fudge

This is what I noticed too from reading the docs here: https://pypi.org/project/cloudscraper/. A couple of captcha solvers, such as 2captcha, are supported.

chocolatechipcats commented 3 years ago

> Now I don't know much about how the code in this is working, but unlike a previous poster, I don't get the cloudflare checking browser page while FFF is unable to work.

I checked ffnet with two separate browsers. No "checking browser..." message, but FFF fails.

Mandabar commented 3 years ago

> This is what I noticed too from reading the docs here: https://pypi.org/project/cloudscraper/. There's a couple of captcha solvers such as 2captcha that are supported.

It seems Jimm already responded that he did try the captcha service. I'm going to give changing my IP address a whirl in a moment, though I doubt it will help.

Update: Confirmed. I changed my router's public/WAN IP, with no change. Still no Cloudflare challenge in the browser, and still a failure with one story in FFF for fanfiction.net.

chocolatechipcats commented 3 years ago

I've found mention here: https://github.com/Anorov/cloudflare-scrape/issues/406

> Some folks moved over to https://github.com/VeNoMouS/cloudscraper/ but I believe that's moved over to a subscription model where you have to contact the owner over discord? and the owner has disabled issues, and shuts down any comments on PRs about functionality.

And then VeNoMouS posts... uhh, you'll see.

chocolatechipcats commented 3 years ago

> This is what I noticed too from reading the docs here: https://pypi.org/project/cloudscraper/. There's a couple of captcha solvers such as 2captcha that are supported.
>
> Seems Jimm responded that he already did try the Captcha Service. I'm going to give a whirl of changing my ip address in a moment, doubt it will help.

It didn't for me. :(

chocolatechipcats commented 3 years ago

> Personally I have never had this issue, and I'm a firm believer that the reason is my high wait time between downloads.
>
> ```ini
> [overrides]
> slow_down_sleep_time:10
> ```

Not necessarily. I have a sleep_time of 8 and I got the error starting last night.

sidney commented 3 years ago

I see that the npm (Node.js) port of cloudscraper at https://www.npmjs.com/package/cloudscraper talks about reCAPTCHA pages, but I think from a quick look at the example code they link to that it just shows how to insert a call to a captcha solver that you have to provide, i.e., the same use of third party services that you already have in the python module. But it might be worth looking at and trying the npm version to see if it does have more success with fanfiction.net, just in case. From what I'm reading that project has diverged since it initially was ported from the python implementation. Even if it isn't practical to use the Node.js version of cloudscraper directly, it would be useful to find out whether or not that code handles the problem.

Edit update - Oh, I just followed chocolatechipcats' link to the comment at cloudflare-scrape, and I see that the npm cloudscraper I mentioned hasn't been updated for a year. In fact, I went to its GitHub repo page, and it says in big letters that the library is no longer supported and is deprecated. Oh, well.

chocolatechipcats commented 3 years ago

The comment I linked to also mentions this, which looks to be in active development: https://github.com/FlareSolverr/FlareSolverr

> FlareSolverr starts a proxy server and it waits for user requests in an idle state using few resources. When some request arrives, it uses puppeteer with the stealth plugin to create a headless browser (Chrome). It opens the URL with user parameters and waits until the Cloudflare challenge is solved (or timeout). The HTML code and the cookies are sent back to the user, and those cookies can be used to bypass Cloudflare using other HTTP clients.
>
> NOTE: Web browsers consume a lot of memory. If you are running FlareSolverr on a machine with few RAM, do not make many requests at once. With each request a new browser is launched.

This might be... problematic for something like FFF, especially when updates are relegated to a background task.
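For reference, a client talks to FlareSolverr by POSTing a JSON command to its HTTP API. The sketch below assumes FlareSolverr is running locally on its default port 8191 and uses the `request.get` command with `url` and `maxTimeout` fields, getting back a `solution` object with `response`, `cookies`, and `userAgent`. Those field names follow my understanding of the FlareSolverr README and should be checked against the version you run; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: FlareSolverr running locally on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_request(url: str, max_timeout_ms: int = 60000) -> Dict[str, Any]:
    """JSON body asking FlareSolverr to load `url` in its headless browser."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def extract_solution(reply: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the HTML, cookies and user agent out of a FlareSolverr reply."""
    if reply.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr failed: {reply.get('message')}")
    solution = reply["solution"]
    cookies = {c["name"]: c["value"] for c in solution.get("cookies", [])}
    return {"html": solution.get("response", ""),
            "cookies": cookies,
            "user_agent": solution.get("userAgent")}


def solve(url: str) -> Dict[str, Any]:
    """Send one request to the local FlareSolverr instance (network call)."""
    data = json.dumps(build_request(url)).encode("utf-8")
    req = urllib.request.Request(FLARESOLVERR_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=90) as resp:
        return extract_solution(json.load(resp))
```

The returned cookies and user agent would typically have to be reused together by the plain HTTP client, since Cloudflare clearance cookies are generally tied to the browser that earned them. This also shows the cost raised above: every `solve` call means a full browser page load on the FlareSolverr side.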

taskvalanche commented 3 years ago

I tried some experiments myself, and I think it may be related to browser local storage. I tried to get CloudFlare to trigger the heavier check screens in my various browsers, with no luck; it worked just fine. Clearing the CloudFlare cookie just reset it, no problem. There's also a session-duration browser local storage item containing keys such as "reputation". When I cleared it and disabled browser local storage, CloudFlare started throwing a fit, with checks on every page. With local storage back on, everything is normal again. It looks like CloudFlare's protection is dialed up to a stupidly extreme level on this site. Not sure how this can be worked around.

chocolatechipcats commented 3 years ago

I mentioned to Jim in a PM on MobileRead that it might be worth seeing if the two Cloudflare cookies (__cf_bm and __cfduid) could be used, but it didn't work.

FictionPress completed another migration yesterday, and it was around that time that people started reporting errors. So I'm guessing they bumped the CloudFlare protections up.

The standalone FanFictionDownloader mentions in this bug that "I think I've resolved the issue." Unfortunately, the thing is closed-source.

sidney commented 3 years ago

> I tried some experiments myself, and I think it may be related to browser local storage

Maybe getting the Node.js to use a mock localstorage would help? See https://medium.com/javascript-in-plain-english/libraries-for-using-localstorage-in-your-node-js-project-3ff5ac1a3512 for three libraries that implement that. If I'm reading it right, it could be as simple as importing the library so that the calls to localstorage functions in the javascript that Cloudflare uses don't fail, maybe not even having to run any different code.

taskvalanche commented 3 years ago

> The standalone FanFictionDownloader mentions in this bug that "I think I've resolved the issue." Unfortunately, the thing is closed-source.

So I grabbed FanFictionDownloader and gave it a few runs to see if I could figure out how they're doing it. It's not completely reliable either, but from poking at it for a bit, I think it's using a headless browser to actually load the site and pull the HTML content out of that. Something like that could probably be implemented, but it'd be a huge mess of dependencies.

chocolatechipcats commented 3 years ago

> Maybe getting the Node.js to use a mock localstorage would help?

Can Calibre even handle Node.js, though?

> So I grabbed FanFictionDownloader and gave it a few runs to see if I can figure out how they're doing it. It's not completely reliable either, but by poking at it for a bit I think it's actually using a headless browser to actually load the site and pull the HTML content out from that. Something like that could probably be implemented but it'd be a huge mess of dependencies.

Interesting. I did mention the headless browser thing above, but aside from the whole "oh god please no more dependencies" thing, apparently it wouldn't be ideal for updating multiple fics, as it opens a new browser instance for every single fic.

mcepl commented 3 years ago

> I would really like to know why the Fanfiction admins appear to be so bitterly opposed to the concept of reading their content on an e-book rather than exclusively online. With any luck their obnoxious attitude in this and many other areas will drive even more authors to less authoritarian platforms.

I think it is completely silly, quite resembling the legendary Leistungsschutzrecht: they want so badly to avoid anybody missing their beautiful advertisements that they shut out even the people who effectively advertise for them. For example, the bots on reddit that collect information about fanfiction stories (if you write e.g. linkffn(11062798)) are shut down, which makes using FFnet so bothersome that I immediately pulled most of my stories from there and finally moved to AO3.

sidney commented 3 years ago

> can Calibre even handle Node.js, though?

Maybe? I misremembered something I read on the cloudscraper page. I thought it said it uses Node.js, but looking again, I see that it says "cloudscraper requires a JavaScript Engine/interpreter" and that Node.js is one of the ones it supports. What that might mean is that it can be configured to use Node.js instead of the default native Python one, and that Node.js could then import one of the drop-in replacement localStorage libraries. I'm not set up to try any of that, but if you can look at how cloudscraper is installed to see whether it is possible, it might be worth a try.

chocolatechipcats commented 3 years ago

> For example, bots on reddit (collecting information about fanfiction stories, if you write e.g., linkffn(11062798)) are shutdown, which makes using FFnet so bothersome, that I have immediately pulled most of my stories from there and finally moved to AO3.

I'm a fanfiction author too, and while I haven't pulled my stories off FFNet, I switched over to AO3 years ago because I prefer their policies. The purges they did a few years ago kind of soured me on them, as did the anti-text-copy measures and specifically blocking the Wayback Machine from archiving them (the IA changed their policies to ignore robots.txt, thankfully).

sidney commented 3 years ago

It doesn't look so good for using Node.js to help. From an error message I found in the cloudscraper code that is run when it is configured to use Node.js as the javascript interpreter:

Missing Node.js runtime. Node is required and must be in the PATH (check with node -v)

It apparently requires the host machine to have the node.js binary installed and in the PATH. Not useful for something distributed as a Calibre plugin.
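The error message suggests the check amounts to "is a `node` binary on PATH?". A plugin that wanted to use Node.js only when it happens to be available could probe for it the same way before enabling that path. `node_version` below is a hypothetical helper using only the Python standard library; on a typical Calibre setup with no Node.js installed it simply returns None.

```python
import shutil
import subprocess
from typing import Optional


def node_version() -> Optional[str]:
    """Return the output of `node -v` if Node.js is on PATH, else None."""
    node = shutil.which("node")  # same lookup the shell would do for "node"
    if node is None:
        return None
    try:
        result = subprocess.run([node, "-v"], capture_output=True,
                                text=True, timeout=10, check=True)
    except (OSError, subprocess.SubprocessError):
        # Covers a broken binary, a non-zero exit code, or a hang.
        return None
    return result.stdout.strip() or None
```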

mcepl commented 3 years ago

> I'm a fanfiction author too, and while I haven't pulled my stories off FFNet, I've switched over to AO3 years ago because I prefer their policies.

I still cannot believe that there are some people who actually write their stories in the comment boxes of their browsers. That idea (if there are really such people) makes me really idea, because FFnet decided that they are owners of those works not only stewards of somebody else’s content.

However, we should really stop this discussion here, it doesn’t help the Python program at all.

sidney commented 3 years ago

I looked at the cloudscraper source code for its default native Python implementation of a "JavaScript interpreter", and it does not bode well for something that has to keep working in the face of whatever Cloudflare changes.

The Cloudflare version 1 challenge seems to be a piece of JavaScript that calculates an arithmetic expression with numbers, +, -, *, /, and parentheses. The default code in cloudscraper does not include a general JavaScript interpreter; it simply expects the specific JavaScript formula format that Cloudflare uses, extracts the arithmetic formula, parses it with regular expressions, and calculates the result. Cloudflare only wants proof that the client can evaluate JavaScript the way a real browser would, and they are free to change the challenge arbitrarily at any time.

The only real solution does seem to be a full JavaScript interpreter, and now apparently one that includes the localStorage API functions.
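To make the version 1 description concrete: once the formula has been pulled out of the page, what remains is evaluating a small arithmetic expression. The sketch below does that with Python's `ast` module and a whitelist of operators, rather than the regular expressions the comment describes cloudscraper using. `eval_arithmetic` is a hypothetical name, and the extraction step (and any obfuscation Cloudflare applies) is out of scope. It also illustrates the fragility point: any construct outside the whitelist is rejected.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Only the operators mentioned above: +, -, *, / (plus unary +/-).
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_arithmetic(expr: str) -> Number:
    """Evaluate a numeric expression; raise ValueError on anything else."""
    tree = ast.parse(expr, mode="eval")  # SyntaxError on malformed input

    def walk(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, etc. are all refused.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(tree)
```

Parentheses need no special handling because they only shape the parse tree.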

chocolatechipcats commented 3 years ago

> I'm a fanfiction author too, and while I haven't pulled my stories off FFNet, I've switched over to AO3 years ago because I prefer their policies.
>
> I still cannot believe that there are some people who actually write their stories in the comment boxes of their browsers. That idea (if there are really such people) makes me really idea, because FFnet decided that they are owners of those works not only stewards of somebody else’s content.
>
> However, we should really stop this discussion here, it doesn’t help the Python program at all.

Understood. Here's some more thoughts on it, though, if you have a MR account and would like to continue the discussion: https://www.mobileread.com/forums/showpost.php?p=4079355&postcount=5139

VeNoMouS commented 3 years ago

To clear things up...

The reason you are getting the following error from cloudscraper

Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

is because the site you are requesting against is serving a Cloudflare version 2 challenge (which is a multi-stage, multi-challenge... challenge)... not the simple version 1 challenge known as jsfuck...

As for the reason for the subscription model: I got completely sick of all the people using my open source work to profit off, and constantly demanding I look into their personal issues with not being able to scrape something, e.g. sneakers, etc...

So... I went the subscription model route in order to solve version 2 challenges, where the user pays for support and the ability to solve the challenges... and said subscription also pays for improvements to the library itself. That way I get ROI on my time, effort, and stress.

Honestly, I don't care if you don't like that I went the paid subscription model. I wrote the product, I own the code, it's my decision, I don't owe the community anything, and I've gotten enough abuse from people whinging that I should give my work away for free so they can profit off it without giving anything back in return.

so sorry, but not sorry...


That said, anyone is welcome to come contact me via discord and pay for a subscription.

chocolatechipcats commented 3 years ago

> Honestly I don't care if you don't like that I went the paid subscription model, I wrote the product, I own the code, its my decision , I don't owe the community anything, and I've gotten enough abuse from people whinging that I should give my work away for free so they can profit off it without giving anything back in return.

tbf, the whinging is more "But where the hell is the paid version?" There's zero mention of it anywhere in the Cloudscraper documentation.

themaster567 commented 3 years ago

Exactly. The error is so ambiguous that I thought that it was something in FanFicFare. If you're going to charge for it, whatever, but at least make it more obvious.

JimmXinu commented 3 years ago

@VeNoMouS, thank you for the explanation.

I started using cloudscraper in FFF because it solved the problem and is publicly posted to GitHub under the MIT license. It is, of course, your right not to provide public updates.

Now that we know that, I encourage FFF users to not badger @VeNoMouS about it further.

chocolatechipcats commented 3 years ago

So now we have clarification on what's going on with Cloudscraper.

FFnet bumping its Cloudflare security level up from medium to high, for instance, or even turning on "under attack" mode, would match the sudden increase in version 2 challenges. There's no indication of whether this will remain in place or not, but FictionPress posted yesterday that they recently completed a migration.

FanFictionDownloader (the standalone, not the old version of FFF) was able to fix it, but the method seems different from FFF's and might be difficult to implement.

I'm unsure of what other downloaders are available or if they've been able to resolve it either.

JimmXinu commented 3 years ago

Correct.

It is possible that these levels may come down and FFF will work with ffnet again.

OTOH, it is also possible that they will not; or even that FFF may not be able to download from ffnet in future.

I will continue to investigate and consider options, but it may be a while before we know anything for sure.

chocolatechipcats commented 3 years ago

Some users on Twitter have mentioned CAPTCHA issues. It's possible that they turned on "under attack" mode, though they made no mention on Twitter of any more DDoS issues since last month.

FFNet's also been kind of breaking down recently. From what I've heard PM notifications are still broken, and they also managed to recently break story stats and the forums. My RSS feeds are glitching out too. Sometimes I wonder if the people running that site even have any idea what they're doing.

VeNoMouS commented 3 years ago

> Exactly. The error is so ambiguous that I thought that it was something in FanFicFare. If you're going to charge for it, whatever, but at least make it more obvious.

How would you word the raised exception... without making it too large?

Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

^^ is a short, and to the point error message.

chocolatechipcats commented 3 years ago

Here's probably what an average user would do:

  1. Google for a paid version of Cloudscraper. Find nothing (or perhaps, in the future, this thread....and be put off by your response above.)
  2. Go to the Cloudscraper's Github page and search for the error. There is no mention of a paid or subscription version on it.
  3. Try to raise an issue...if they weren't closed.

No mention of a paid version. A user might eventually go back to your profile, and they might notice the Discord username there...if they even have Discord in the first place. Other than that, how would you expect them to interpret a "not available in the free version" when the paid version doesn't seem to exist?

If you don't believe me, read the comments to this Stack Overflow thread: https://stackoverflow.com/questions/64433684/cloudscraper-issue-cloud-flare-version-2-in-scraping-website

This is the last I'm going to say on it.

VeNoMouS commented 3 years ago

One of the core reasons I have it set up that way, so that Joe Public can't just access it, is that I got really sick of the arms race with Cloudflare. They were actively monitoring my repo, and constantly battling with them to counter their anti-bot techniques was getting tiresome... the CEO of Cloudflare directly linked my repo on Twitter... so I went dark.

sidney commented 3 years ago

> how would you word the raised exception... without making it too large?

This one is just two characters longer and fixes the main problem: the user thinks the message is from Calibre or the FanFicFare plugin (or whatever application embeds the library), implying that there is a non-free version of Calibre or FanFicFare that solves the problem. Once it is clear that the error is from cloudscraper, the user is more likely to understand that a free open source application isn't going to include a paid version of a required library. I can see why you don't want to link to the paid version of the library, but not having that link is not a problem for end users of a free application.

Detected Cloudflare ver 2 challenge, not supported by free cloudscraper library used by this application.
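On the application side, the same attribution problem can be reduced by wrapping errors from a dependency so the message names the library they came from. A minimal Python sketch follows; `UpstreamLibraryError` and `call_wrapped` are hypothetical names, not anything FanFicFare or cloudscraper provides.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


class UpstreamLibraryError(Exception):
    """Application-level error that names the third-party library at fault."""

    def __init__(self, library: str, original: BaseException) -> None:
        self.library = library
        self.original = original
        super().__init__(f"{library}: {original} "
                         f"(raised by the {library} library, not by this application)")


def call_wrapped(library: str, func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Call func, re-raising any Exception as UpstreamLibraryError."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # "from exc" keeps the original exception as __cause__ for tracebacks.
        raise UpstreamLibraryError(library, exc) from exc
```

Chaining with `raise ... from` means the original exception and its traceback still appear in bug reports, while the first line a user sees says which library failed.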

Getting back to the issue here on FanFicFare, I think it is now pretty clear that there is not going to be a satisfactory solution to automated access to fanfiction.net as long as they prefer a policy of not allowing it and are buying Cloudflare's services to prevent it. The version 2 challenges are in the realm of arbitrary javascript code designed to beat workarounds like cloudscraper and under ongoing development in an arms race. I took a look at what Cloudflare sends for the challenge and it won't be as simple as running a full Javascript interpreter even with local storage. And if it were, it might not be tomorrow. That's not an arms race developers of a free open source program would want to be in.

kido5217 commented 3 years ago

It sounds like overkill, but maybe we could use Selenium?

This code works for me:

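```python
#!/usr/bin/env python3
"""Fetch a fanfiction.net page through headless Firefox so that Cloudflare's
JavaScript challenge runs in a real browser engine."""

from time import sleep

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

FF_URL = 'https://www.fanfiction.net/s/13586946/1/Sons-and-Daughters-of-Sineya'

opts = Options()
# Options.set_headless() was deprecated and later removed from Selenium;
# passing the -headless argument works with current versions.
opts.add_argument('-headless')

browser = Firefox(options=opts)
try:
    browser.get(FF_URL)
    # Give the Cloudflare challenge time to finish and redirect to the story.
    sleep(5)
    print(browser.page_source)
finally:
    # Always shut down Firefox and geckodriver, even if something fails.
    browser.quit()
```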