Closed Twilight666 closed 3 years ago
Yeah, it happens quite often now (NOT always). I have a script to update all my books (and I am very careful not to overload the ffn server, so I even have -o slow_down_sleep_time=6 to be on the safe side), but quite a few of them end with this error.
I think the CloudFlare service is to blame.
I regularly use a little Python script to check my private messages on FFnet, and today it started to throw 403 errors back at me. I managed to get it to work again after I imported the cloudscraper module and replaced the r = requests.get(url, cookies=cookies) call with scraper = cloudscraper.create_scraper() and r = scraper.get(url, cookies=cookies).
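For reference, the swap described above can be sketched roughly as follows. This is a minimal sketch, not FanFicFare code: `make_session` and `fetch` are hypothetical helper names, and it assumes the third-party `cloudscraper` package (which builds on `requests`) is installed.

```python
def make_session(use_cloudscraper=True):
    """Return an HTTP session. Hypothetical helper, not part of FFF."""
    if use_cloudscraper:
        # Third-party package (pip install cloudscraper), imported lazily so
        # the plain-requests path still works without it.
        import cloudscraper
        return cloudscraper.create_scraper()
    import requests
    return requests.Session()


def fetch(session, url, cookies=None):
    """GET a page through the given session and return its text."""
    response = session.get(url, cookies=cookies)
    response.raise_for_status()  # a 403 or 503 is raised as an exception
    return response.text
```

Calling `fetch(make_session(), url, cookies=cookies)` would then replace the original `requests.get(url, cookies=cookies)` call. Whether this gets past Cloudflare depends on the challenge level the site has configured.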
I'm seeing this too, with the small difference that I'm seeing HTTP Error 503: Service Temporarily Unavailable from the Calibre plugin instead of 403.
In the plugin, I can sometimes get the first chapter URL successfully before failing on the second (with check_next_chapter:false).
Setting user_agent doesn't help.
At this point, I don't have any bright ideas of my own. I will be looking at @Wasuregusa's suggestion.
I am having the same problem Error Unknown Unknown HTTP Error 403: Forbidden https://www.fanfiction.net/s/13576158/1/
There is no problem with updating AO3
Mine was working fine up until today, and a lot of the stories I have are getting this 403 error.
FYI, the cloudscraper module relies on requests, which is well known and widely used... but not by Calibre, which bundles mechanize instead. And FFF uses Python's urllib directly to avoid needing to require either.
If it really is a cloudflare issue, it's possible that it may go away after a while. We've seen that happen with other sites behind cloudflare.
ownedbycats from MobileRead - I looked. FictionPress seems to be migrating at the moment: https://twitter.com/FictionPress
Hopefully the issues will clear up when they are finished.
EDIT: They also were ddosed a few days ago.
In that case, I'm not going to try and quick fix this while it's a moving target.
For one thing, while testing to see if cloudscraper is feasible, it looks like there will be adapter-breaking changes in the site HTML in addition to connection issues.
An extremely quick and ugly change to use cloudscraper appears to work in the CLI--except when it bumps into unexpected (and inconsistent) HTML changes.
This is not a solution I'm thrilled with because:
cloudscraper is Python3 only, so only Calibre 5+ could be supported.
Not sure if this helps, but loading the mobile pages of ff.net allows copy-paste for chapters, and it did not screw up when copying a chapter from https://m.fanfiction.net/s/13497453/59/God-Slaying-Blade-Works-The-King-of-Wrought-Iron. This isn't ideal, but it'll work until FFF's current kinks with ff.net can be managed.
@kyoam I tried using the mobile version of the url to download a story and it still gives the same result
I believe kyoam is trying to say that the mobile version lets you copy-paste and manually add the chapter. The main site disallows selecting text (though there are userscripts to bypass it).
setting user_agent:Mozilla/5.0 allowed me to successfully update a fic!
Tried it:
[fanfiction.net]
never_make_cover: true
user_agent:Mozilla/5.0
Didn't work either in CLI or Calibre
Can confirm UA spoofing fixed things.
@hseg can you tell me what you did differently than me?
Just setting a user_agent doesn't work for me either. That it does work for some users is encouraging, however.
Setting the user agent doesn't work fully for me. It allowed me to update a single story, and allows downloading to begin instead of failing instantly, but adding a new story failed and now I can't update further stories either. Looks like CloudFlare may be detecting the spoofing.
Nope, user_agent doesn't help. I am afraid that the other people didn't do proper testing. When trying to update a directory with 122 EPubs (with -o slow_down_sleep_time=6 firmly in place), it updated many, but at some point it started throwing 503: Service Temporarily Unavailable errors at me.
Just set -o user_agent=Mozilla/5.0, though I only had a dozen or so ffn epubs. Possibly this spoofing doesn't scale.
This is very weird. With the user agent changed on my personal.ini I get a 403, but forcing it with the -o parameter it gives a 503. The exact same user agent and the exact same url, and it is consistent on every single attempt.
Does the user agent need to be added to a specific category for it to work?
[defaults]
is_adult:true
always_overwrite:true
slow_down_sleep_time:10
user_agent:Mozilla/5.0
defaults.ini already has:
[www.fanfiction.net]
user_agent:
...which overrides personal.ini [defaults]. -o is equivalent to putting it in [overrides].
See the precedence rules
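To illustrate why the [defaults] setting above never takes effect, here is a simplified sketch of section precedence. It assumes a lookup order of [overrides], then the site section, then [defaults], which is inferred only from the explanation above. `load` and `lookup` are hypothetical helpers, not FanFicFare functions, and the real rules have more cases.

```python
import configparser

# Trimmed stand-ins for the two files discussed above.
DEFAULTS_INI = """
[www.fanfiction.net]
user_agent:
"""

PERSONAL_INI = """
[defaults]
user_agent:Mozilla/5.0
slow_down_sleep_time:10
"""


def load(*texts):
    """Merge INI texts in order; later texts win within the same section."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in texts:
        parser.read_string(text)
    return parser


def lookup(parser, site, key):
    """Assumed order: [overrides] first, then the site section, then [defaults]."""
    for section in ("overrides", site, "defaults"):
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None


config = load(DEFAULTS_INI, PERSONAL_INI)
# The empty site-level value from defaults.ini shadows personal.ini [defaults].
print(repr(lookup(config, "www.fanfiction.net", "user_agent")))
```

Under this model, putting user_agent under [www.fanfiction.net] in personal.ini, or passing it with -o, is what makes the setting take effect.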
I might be inferring things that aren't there, but this tweet makes me think that they intend to leave the current Cloudflare level in place: https://twitter.com/FictionPress/status/1340504620343250946
uh guys https://fanfictiondownloader.net the fanfiction downloader program just released an update that gets dl access back. just tried a link. it works.
That's good to know. I don't think it's open-sourced though so finding out what changes were made might be tricky.
Here's the issue on their bug-tracker, at least: https://bug.fanfictiondownloader.net/view.php?id=61
okay quick update on ffdl: it does occasionally come back with no result or missed chapters for fics with chapter counts over 50 when outputting epub, but retrying eventually gets it to dl in full. rtf outputs seem to have fewer problems no matter the ch count.
FYI: Other than definitely confirming that it is CloudFlare getting in the way, I'm at the same point I was yesterday. That is, needing cloudscraper to get past CloudFlare, which does appear to work when ugly-hacked into the CLI but presents large difficulties for the calibre plugin.
I may put up a CLI test version tonight or tomorrow.
And BTW, discussion of the other fanfictiondownloader (that I changed name of this project to avoid) is off-topic here. A mention is fine; for further discussion, they have their own board.
You don't edit defaults.ini - you edit personal.ini from FFF menu > Configure FanFicFare > personal.ini. I also bumped up the sleep time a bit.
[www.fanfiction.net]
user_agent:Mozilla/5.0
slow_down_sleep_time:8
Hmm.. did that (got an error message about some bad section headings, so just deleted those since they reference sites I don't use anyway). I retried and it didn't help. I restarted Calibre after making the change, but it didn't make any difference. I get a 503 error.
Yes, that is because ffnet changed and we haven't addressed the changes yet.
There isn't anything you can do right now to make FanFicFare reliably work with ffnet.
I have a suspicion that the differences in results as to why some people are getting errors and others aren't are due to ISP differences. From what I remember, Cloudflare can tamp down more on the more problematic ISPs/IP ranges that are known for botnets, though I'm a little fuzzy on the details.
I've seen posts on twitter about users from certain countries having more trouble accessing the site, so I suspect this might be in place.
We have seen intermittent Cloudflare issues before with other sites, for example: #488
So it is at least possible that the issue will go away on its own after a while.
This is also the first time I've heard of a reasonable solution (cloudscraper).
Could just be timing or could add weight to the ISP theory.
I tried several of the mentioned fixes this morning and was getting 503 errors. I tried later in the afternoon using my hotspot instead of DSL and it worked.
I have the user agent set to the full version of chrome I am using now and slow down at 10.
Opening a story page in a text browser such as w3m or lynx shows a Cloudflare captcha prompt. This is pretty clearly Cloudflare itself monkeying with connections. Redacted output follows.
Please enable cookies.
One more step
Please complete the security check to access www.fanfiction.net
Please stand by, while we are checking your browser...
Please turn JavaScript on and reload the page.
Please enable Cookies and reload the page.
Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
What can I do to prevent this in the future?
If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.
If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or
infected devices.
Cloudflare Ray ID: deadbeef10101 • Your IP: 127.0.0.1 • Performance & security by Cloudflare
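A script could check for this kind of page before treating a response as story HTML. The following is only a heuristic sketch: the marker strings are copied from the output above, `looks_like_cloudflare_challenge` is a hypothetical helper, and Cloudflare may change this wording at any time.

```python
# Phrases taken from the challenge page output quoted above.
CHALLENGE_MARKERS = (
    "Please complete the security check",
    "Please turn JavaScript on and reload the page",
    "Cloudflare Ray ID",
)


def looks_like_cloudflare_challenge(page_text):
    """Return True if the page text contains any known challenge phrase."""
    return any(marker in page_text for marker in CHALLENGE_MARKERS)
```

A story page that happens to quote one of these phrases would be misclassified, so this is best used only as a hint alongside the 403/503 status code.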
Today I've managed to update over 30 stories that had piled up in my mailbox, without touching the settings I had yesterday which gave 503 errors.
By the way, I've yet to see a Cloudflare loading page when checking the website.
I don't know if it's any help but yesterday (I'm in Europe so like 15 to 19 hours ago) I tried to change my user agent to mozilla. It worked, mostly and I downloaded some stories (without setting up the slow down code), I still had one or two 403, but that's all. But some time later it stopped working, so I tried to change the user agent again this time to chrome, it didn't work, I ended up with 403 errors and a few 503. Since then I haven't been able to download anything from ffnet (even with the slow down code).
A new-ish tweet from FictionPress - do they regard FanFicFare as "white-hat"?
API News: We will be working with a few curated white-hat bot makers that have created tools to assist users to find & follow interesting stories published on our sites. An API program will be announced soon and an alpha version will be ready by first week of 2021.
FFF returned 503 errors for me when I tried updating two of my fics.
Also, I noticed that the RSS feed is having some problems too (displays updates, but with the wrong dates and chapter counts). The updates appear on the site proper, just the RSS feed is wrong.
Re: whether FictionPress regards FanFicFare as "white-hat": I think that downloaders such as FFF could be considered against FFnet's terms of service (section 4.C). So I doubt it.
I have uploaded a test version of the CLI only that requires and uses cloudscraper to download from ffnet. It works for me while the old code does not.
This is a configurable change using a new INI setting, use_cloudscraper. defaults.ini for CLI in this test version includes use_cloudscraper:true under [www.fanfiction.net], so you don't need to add it yourself to personal.ini. But you can set it false in personal.ini to use the old code for comparison. And yes, it can also be used with other sites, but I haven't tested that very extensively.
Again, at this point, it's CLI only. It works for me, but I haven't run any longer download tests yet.
Install using:
pip install --extra-index-url https://testpypi.python.org/pypi --upgrade FanFicFare
I installed the latest test version (Windows 10, Python3)
I am getting an error
DEPRECATION: The OpenSSL being used by this python install (OpenSSL 1.1.0j 20 Nov 2018) does not meet the minimum supported version (>= OpenSSL 1.1.1) in order to support TLS 1.3 required by Cloudflare, You may encounter an unexpected reCaptcha or cloudflare 1020 blocks.
And when I tried pip.exe install --upgrade OpenSSL
I get
ERROR: Could not find a version that satisfies the requirement OpenSSL ERROR: No matching distribution found for OpenSSL
Any ideas?
EDIT: Actually it is downloading the stories anyway so.... EDIT2: My version of Python was 3.7.2. Downloaded 3.7.8 and upgraded it. Message is gone
With latest test CLI got
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.
error for two tested links - https://www.fanfiction.net/s/12515214/113/An-Essence-of-Silver-and-Steel and https://www.fanfiction.net/s/13510736/16/Spells-in-Silence
Checked using wired and mobile networks, and also with and without VPN. All with the same result.
UPDATE: Adding Chrome/Mozilla as user agent instead of default one and some minor delay between chapters solved error above for me - all backlog FF fiction was successfully downloaded.
Re: the CloudflareChallengeError reported above: odd, this error indicates that there's a paid version of the scraper somewhere, but it doesn't seem to exist: https://stackoverflow.com/questions/64433684/cloudscraper-issue-cloud-flare-version-2-in-scraping-website
I upgraded to the test CLI release on Debian 10 (Python 3.7.3) and with cloudscraper it appears to be working fine, I've done a couple of stories, one of which is quite long (over 100 chapters) and no issues at all. It's back to performing exactly how it used to. Looks like cloudscraper did it, at least for me.
Interesting that, judging by the comments above, some people are getting a full captcha. I tried the stories linked in the comment above and they worked. It looks like that may be a deliberate configuration on the part of FFnet, if the CloudFlare behavior is as described here: https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Challenge-Passage-Captcha-
Having trouble testing, can you upload to a branch here instead? In particular, uploaded source seems very different from HEAD, so I can't really compare the two. Github doesn't seem to be aware of another clone with all these changes, too.
Re: some people getting a full captcha: note that "suspicious IP activity" can happen if your ISP assigns you an address previously used by a bad actor. Happened to me once; it was rather hellish getting CAPTCHAs on nearly every site until I finally reset the modem.
Re: "this error indicates that there's a paid version of the scraper somewhere": it probably means a paid captcha solver instead of a paid scraper. There are options listed on the site for those who want to try it out: https://github.com/VeNoMouS/cloudscraper#3rd-party-captcha-solvers
Any chance we can get an option to select one of those and enter our credentials for them too in the off-chance that this error becomes more frequent?
Re: Captchas, I was aware that that code was in cloudscraper, but I haven't given it any further thought since I wasn't seeing it. To my understanding, Cloudflare inflicts varying levels of obstruction depending on a number of factors--I suspect the site gets to set at least some of those factors.
@hseg, I haven't pushed the branch I'm working on into GitHub. Considering pip downloads it as source anyway, it's not a priority as far as I'm concerned. And the commit history is embarrassingly ugly.
@darthShadow, I didn't even know 'paid captcha solver' was a thing. I will consider it if it becomes a large enough issue. Ditto using different JS engines, maybe? Not sure it matters...
For now, v3.26.2 seems to be working for most people trying it, and my next priority will be seeing if I can shoe-horn it into the Calibre plugin.
The testing package for my distro (Archlinux) fetches from git HEAD. I can easily retarget it to fetch from a branch elsewhere, but making it fetch from pip is annoying enough not to bother with at this hour. Am willing to accept history-instability of a testing branch. But if you don't want your dirty code out there, that's fine too.
I also noticed this from the cloudscraper page:
It's easy to integrate cloudscraper with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make. To retrieve just the cookies (as a dictionary), use cloudscraper.get_tokens(). To retrieve them as a full Cookie HTTP header, use cloudscraper.get_cookie_string(). get_tokens and get_cookie_string both accept Requests' usual keyword arguments (like get_tokens(url, proxies={"http": "socks5://localhost:9050"})). Please read Requests' documentation on request arguments for more information.
Could this be used? Token could possibly be copied from a browser session where ffnet was successfully accessed - I did something similar once with wget to get some stuff from behind a login.
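As a rough sketch of that idea using only the standard library: the cookie names and values below are placeholders, `build_request` is a hypothetical helper, and whether Cloudflare would accept cookies copied from a browser into a different client is untested.

```python
import urllib.request


def build_request(url, cookies, user_agent):
    """Build a urllib Request that sends the given cookies and user agent.

    The user agent must match the one used when the cookies were issued,
    per the cloudscraper notes quoted above.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


# Placeholder cookie names and values, for illustration only.
request = build_request(
    "https://www.fanfiction.net/s/13576158/1/",
    {"clearance_cookie": "VALUE1", "session_cookie": "VALUE2"},
    "Mozilla/5.0",
)
# urllib.request.urlopen(request) would then send the request.
```

The request is only constructed here, not sent, so the sketch can be inspected without touching the site.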
Required the update to cloudscraper and spoofing the user agent to Mozilla/5.0 in order to work for me.
@chocolatechipcats, to do what? The cookies have to be obtained first. And when cloudscraper says 'with the appropriate user-agent', keep in mind that cloudscraper has several thousand user-agent strings it uses.
I'm also going to look at not explicitly setting user-agent and letting cloudscraper do its thing with user-agent.
I tried to download FFnet stories and I had a 403 error (AO3 and RoyalRoad work)
I also tried with Calibre and the error I get is HTTP Error 403: Forbidden, so basically the same issue. I can view the story with my browser.
Some examples