JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

FFnet stories don't work #614

Closed Twilight666 closed 3 years ago

Twilight666 commented 3 years ago

I tried to download FFnet stories and I had a 403 error (AO3 and RoyalRoad work)

Traceback (most recent call last):
  File "c:\python3.7\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python3.7\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Python3.7\Scripts\fanficfare.exe\__main__.py", line 7, in <module>
  File "c:\python3.7\lib\site-packages\fanficfare\cli.py", line 327, in main
    passed_personalini)
  File "c:\python3.7\lib\site-packages\fanficfare\cli.py", line 421, in do_download
    adapter.getStoryMetadataOnly()
  File "c:\python3.7\lib\site-packages\fanficfare\adapters\base_adapter.py", line 297, in getStoryMetadataOnly
    self.doExtractChapterUrlsAndMetadata(get_cover=get_cover)
  File "c:\python3.7\lib\site-packages\fanficfare\adapters\adapter_fanfictionnet.py", line 111, in doExtractChapterUrlsAndMetadata
    raise e
  File "c:\python3.7\lib\site-packages\fanficfare\adapters\adapter_fanfictionnet.py", line 104, in doExtractChapterUrlsAndMetadata
    data = self._fetchUrl(url)
  File "c:\python3.7\lib\site-packages\fanficfare\adapters\adapter_fanfictionnet.py", line 85, in _fetchUrl
    usecache=usecache)
  File "c:\python3.7\lib\site-packages\fanficfare\configurable.py", line 1359, in _fetchUrl
    referer=referer)[0]
  File "c:\python3.7\lib\site-packages\fanficfare\configurable.py", line 1348, in _fetchUrlOpened
    referer=referer)
  File "c:\python3.7\lib\site-packages\fanficfare\configurable.py", line 1249, in _fetchUrlOpened
    raise(excpt)
  File "c:\python3.7\lib\site-packages\fanficfare\configurable.py", line 1229, in _fetchUrlOpened
    referer=referer)
  File "c:\python3.7\lib\site-packages\fanficfare\configurable.py", line 1188, in _fetchUrlRawOpened
    float(self.getConfig('connect_timeout',30.0)))
  File "c:\python3.7\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\python3.7\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\python3.7\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\python3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\python3.7\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I also tried with Calibre and the error I get is HTTP Error 403: Forbidden, so basically the same issue.

I can view the story with my browser

Some examples

https://www.fanfiction.net/s/12863738/[134-]
https://www.fanfiction.net/s/13716496/
https://www.fanfiction.net/s/13771105/10/Soul-of-Struggle-Soul-of-Defiance-sequel

mcepl commented 3 years ago

Yeah, it happens quite often now (NOT always). I have a script to update all my books (and I am very careful not to overload the ffn server, so I even have -o slow_down_sleep_time=6 to be on the safe side), but quite a few of them end with this error.
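A batch update like the one described above could be sketched as follows. This is only an illustration, not mcepl's actual script: `build_update_command` and `build_all` are hypothetical helper names, and the `-u` (update an existing epub) flag is assumed from the FanFicFare CLI rather than taken from this thread. The `-o key=value` form is the one quoted above.

```python
from pathlib import Path


def build_update_command(epub_path, sleep_seconds=6, extra_options=None):
    """Build the argv list for updating one epub with a polite delay."""
    options = {"slow_down_sleep_time": sleep_seconds}
    options.update(extra_options or {})
    cmd = ["fanficfare", "-u"]  # -u: update an existing epub (assumed flag)
    for key, value in options.items():
        cmd += ["-o", f"{key}={value}"]  # each -o overrides one INI setting
    cmd.append(str(epub_path))
    return cmd


def build_all(directory, **kwargs):
    """Build one command per .epub file found in a directory."""
    return [build_update_command(p, **kwargs)
            for p in sorted(Path(directory).glob("*.epub"))]
```

Each command can then be passed to `subprocess.run(cmd)` one at a time, so that a 403 or 503 on one book does not stop the rest of the run.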

Wasuregusa commented 3 years ago

I think the CloudFlare service is to blame.

I regularly use a little Python script to check my private messages on FFnet, and today it started to throw 403 errors back at me. I managed to get it to work again after I imported the cloudscraper module and replaced the r = requests.get(url, cookies=cookies) call with scraper = cloudscraper.create_scraper() and r = scraper.get(url, cookies=cookies)
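The swap described above might look roughly like this. It is a minimal sketch, not the actual script: `make_session` and `fetch_page` are hypothetical helper names, and the URL and cookies in the usage comment are placeholders. cloudscraper and requests are both third-party packages, and the sketch relies on `cloudscraper.create_scraper()` returning an object with the same `get()` interface as a requests session.

```python
def make_session(use_cloudscraper=True):
    """Return an object with a requests-style get() method."""
    if use_cloudscraper:
        import cloudscraper  # third-party: pip install cloudscraper
        return cloudscraper.create_scraper()
    import requests  # third-party: pip install requests
    return requests.Session()


def fetch_page(session, url, cookies=None):
    """Fetch url with the given cookies and return the response body."""
    r = session.get(url, cookies=cookies)
    r.raise_for_status()  # raise on 403/503 instead of returning an error page
    return r.text


# Usage (placeholder URL and cookies, substitute your own):
#   session = make_session(use_cloudscraper=True)
#   html = fetch_page(session, "https://www.fanfiction.net/", cookies={})
```

Passing the session in as an argument keeps the calling code identical whichever library ends up creating it.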

JimmXinu commented 3 years ago

I'm seeing this too. With the small difference that I'm seeing HTTP Error 503: Service Temporarily Unavailable from the Calibre plugin instead of 403.

In plugin, I can sometimes get the first chapter URL successfully before failing on the second. (with check_next_chapter:false).

Setting user_agent doesn't help.

At this point, I don't have any bright ideas of my own. I will be looking at @Wasuregusa's suggestion.

roon0 commented 3 years ago

I am having the same problem:

Error Unknown Unknown HTTP Error 403: Forbidden https://www.fanfiction.net/s/13576158/1/

There is no problem with updating AO3

jklly12 commented 3 years ago

Mine was working fine up until today, and a lot of the stories I have are getting this 403 error.

JimmXinu commented 3 years ago

FYI, the cloudscraper module relies on requests, which is well known and widely used... but not by Calibre, which bundles mechanize instead. And FFF uses Python's urllib directly to avoid needing to require either.

If it really is a cloudflare issue, it's possible that it may go away after a while. We've seen that happen with other sites behind cloudflare.

chocolatechipcats commented 3 years ago

ownedbycats from MobileRead - I looked. FictionPress seems to be migrating at the moment: https://twitter.com/FictionPress

Hopefully the issues will clear up when they are finished.

EDIT: They also were ddosed a few days ago.

JimmXinu commented 3 years ago

In that case, I'm not going to try and quick fix this while it's a moving target.

For one thing, while testing to see if cloudscraper is feasible, it looks like there will be adapter-breaking changes in the site HTML in addition to connection issues.

An extremely quick and ugly change to use cloudscraper appears to work in the CLI--except when it bumps into unexpected (and inconsistent) HTML changes.

This is not a solution I'm thrilled with because:

  1. I'd have to package at least 3 and probably more additional modules into the plugin, and;
  2. cloudscraper is Python3 only, so only Calibre 5+ could be supported.

kyoam commented 3 years ago

Not sure if this helps, but loading the mobile pages of ff.net allows copy-paste for chapters. It did not screw up when copying a chapter from https://m.fanfiction.net/s/13497453/59/God-Slaying-Blade-Works-The-King-of-Wrought-Iron so this isn't ideal, but it'll work until FFF's kinks regarding ff.net can be managed.

Twilight666 commented 3 years ago

@kyoam I tried using the mobile version of the url to download a story and it still gives the same result

chocolatechipcats commented 3 years ago

I believe kyoam is trying to say that the mobile version lets you copy-paste and manually add the chapter. The main site disallows selecting text (though there are userscripts to bypass it).

chocolatechipcats commented 3 years ago

setting user_agent:Mozilla/5.0 allowed me to successfully update a fic!

Twilight666 commented 3 years ago

Tried it:

[fanfiction.net]
never_make_cover: true
user_agent:Mozilla/5.0

Didn't work either in CLI or Calibre

hseg commented 3 years ago

Can confirm UA spoofing fixed things.

Twilight666 commented 3 years ago

@hseg can you tell me what you did differently than me

JimmXinu commented 3 years ago

Just setting a user_agent doesn't work for me either. That it does work for some users is encouraging, however.

atroly commented 3 years ago

Setting the user agent doesn't work fully for me. It allowed me to update a single story, and allows downloading to begin instead of failing instantly, but adding a new story failed and now I can't update further stories either. Looks like CloudFlare may be detecting the spoofing.

mcepl commented 3 years ago

Nope, user_agent doesn’t help. I am afraid that the other people didn’t do proper testing. When trying to update a directory with 122 EPubs (with -o slow_down_sleep_time=6 firmly in place), it updated many, but at some point it started throwing 503: Service Temporarily Unavailable errors at me.

hseg commented 3 years ago

Just set -o user_agent=Mozilla/5.0, though only had a dozen or so ffn epubs. Possibly this spoofing doesn't scale.

On December 20, 2020 4:43:00 PM GMT+02:00, Twilight666 notifications@github.com wrote:

@hseg can you tell me what you did differently than me

Edocsil commented 3 years ago

This is very weird. With the user agent changed in my personal.ini I get a 403, but forcing it with the -o parameter gives a 503. The exact same user agent and the exact same URL, and it is consistent on every single attempt.

Does the user agent need to be added to a specific category for it to work?


[defaults] 
is_adult:true
always_overwrite:true
slow_down_sleep_time:10
user_agent:Mozilla/5.0

JimmXinu commented 3 years ago

defaults.ini already has:

[www.fanfiction.net]
user_agent:

...which overrides personal.ini [defaults]. -o is equivalent to putting it in [overrides].

See the precedence rules
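The lookup order described above can be modelled with a short sketch. This is a simplified illustration and not FanFicFare's actual code: `resolve` is a hypothetical helper, it only considers the [overrides], site and [defaults] sections, and it assumes that within the same section a later file (personal.ini) wins over an earlier one (defaults.ini).

```python
import configparser

# Fragments taken from the comments above.
DEFAULTS_INI = """
[www.fanfiction.net]
user_agent:
"""

PERSONAL_INI = """
[defaults]
is_adult:true
always_overwrite:true
slow_down_sleep_time:10
user_agent:Mozilla/5.0
"""


def resolve(key, site, ini_texts, overrides=None):
    """Return the effective value of key for site, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in ini_texts:
        parser.read_string(text)  # later files win within the same section
    if overrides and key in overrides:
        return overrides[key]  # -o on the CLI behaves like [overrides]
    for section in ("overrides", site, "defaults"):
        if parser.has_section(section) and parser.has_option(section, key):
            return parser.get(section, key)  # first matching section wins
    return None
```

With these two fragments, `resolve("user_agent", "www.fanfiction.net", [DEFAULTS_INI, PERSONAL_INI])` returns the empty string, because the site section from defaults.ini is consulted before [defaults] from personal.ini. Putting `user_agent` under `[www.fanfiction.net]` in personal.ini, or passing it with -o, changes the result.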

chocolatechipcats commented 3 years ago

I might be inferring things that aren't there, but this tweet makes me think that they intend to leave the current Cloudflare level in place: https://twitter.com/FictionPress/status/1340504620343250946

kyoam commented 3 years ago

uh guys https://fanfictiondownloader.net the fanfiction downloader program just released an update that gets dl access back. just tried a link. it works.

chocolatechipcats commented 3 years ago

That's good to know. I don't think it's open-sourced though so finding out what changes were made might be tricky.

Here's the issue on their bug-tracker, at least: https://bug.fanfictiondownloader.net/view.php?id=61

kyoam commented 3 years ago

Okay, quick update on ffdl: it does occasionally come back with no result or missed chapters for fics with chapter counts over 50 for epub output files, but retrying eventually gets it to download in full. RTF outputs seem to have fewer problems no matter the chapter count.

JimmXinu commented 3 years ago

FYI: Other than definitely confirming that it is CloudFlare getting in the way, I'm at the same point I was yesterday. That is, needing cloudscraper to get past CloudFlare. Which does appear to work when ugly-hacked into CLI. But presents large difficulties for calibre plugin.

I may put up a CLI test version tonight or tomorrow.

And BTW, discussion of the other fanfictiondownloader (which I changed the name of this project to avoid) is off-topic here. A mention is fine; for further discussion, they have their own board.

chocolatechipcats commented 3 years ago

You don't edit defaults.ini - you edit personal.ini from FFF menu > Configure FanFicFare > personal.ini. I also bumped up the sleep time a bit.

[www.fanfiction.net]
user_agent:Mozilla/5.0
slow_down_sleep_time:8

LoisGNS commented 3 years ago

Hmm.. did that (got an error message about some bad section headings, so just deleted those since they reference sites I don't use anyway). I retried and it didn't help. I restarted Calibre after making the change, but it didn't make any difference. I get a 503 error.

JimmXinu commented 3 years ago

Yes, that is because ffnet changed and we haven't addressed the changes yet.

There isn't anything you can do right now to make FanFicFare reliably work with ffnet.

chocolatechipcats commented 3 years ago

I have a suspicion that the differences in results as to why some people are getting errors and others aren't are due to ISP differences. From what I remember, Cloudflare can tamp down more on the more problematic ISPs/IP ranges that are known for botnets, though I'm a little fuzzy on the details.

I've seen posts on twitter about users from certain countries having more trouble accessing the site, so I suspect this might be in place.

JimmXinu commented 3 years ago

We have seen intermittent Cloudflare issues before with other sites, for example: #488

So it is at least possible that the issue will go away on its own after a while.

This is also the first time I've heard of a reasonable solution (cloudscraper).

dlehman83 commented 3 years ago

Could just be timing or could add weight to the ISP theory.
I tried several of the mentioned fixes this morning and was getting 503 errors. I tried later in the afternoon using my hotspot instead of DSL and it worked.
I have the user agent set to the full version of chrome I am using now and slow down at 10.

tidux commented 3 years ago

Opening a story page in a text browser such as w3m or lynx shows a Cloudflare captcha prompt. This is pretty clearly Cloudflare itself monkeying with connections. Redacted output follows.

Please enable cookies.

One more step

Please complete the security check to access www.fanfiction.net

Please stand by, while we are checking your browser...

Please turn JavaScript on and reload the page.

Please enable Cookies and reload the page.

Why do I have to complete a CAPTCHA?

Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.

What can I do to prevent this in the future?

If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.

If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or
infected devices.

Cloudflare Ray ID: deadbeef10101 • Your IP: 127.0.0.1 • Performance & security by Cloudflare
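A script could use output like the above to tell a Cloudflare challenge page apart from an ordinary error. The following is a rough heuristic sketch only: `looks_like_cloudflare_challenge` is a hypothetical helper, the marker strings are copied from the redacted output above, and the 403/503 status codes are the ones reported in this thread. Cloudflare may change the wording at any time.

```python
# Phrases copied from the challenge page quoted above.
CHALLENGE_MARKERS = (
    "Please complete the security check",
    "Please stand by, while we are checking your browser",
    "Cloudflare Ray ID",
)


def looks_like_cloudflare_challenge(status_code, body):
    """Guess whether an HTTP response is a Cloudflare challenge page."""
    if status_code not in (403, 503):
        return False  # only the codes seen in this thread are considered
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

A check like this is useful mainly for error messages: it lets a tool report "blocked by Cloudflare" rather than a bare 403 or 503.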

Edocsil commented 3 years ago

Today I've managed to update over 30 stories that had piled up in my mailbox, without touching the settings I had yesterday which gave 503 errors.

By the way, I've yet to see a Cloudflare loading page when checking the website.

Clem2605 commented 3 years ago

I don't know if it's any help, but yesterday (I'm in Europe, so like 15 to 19 hours ago) I tried to change my user agent to Mozilla. It worked, mostly, and I downloaded some stories (without setting up the slow down code). I still had one or two 403s, but that's all. But some time later it stopped working, so I tried to change the user agent again, this time to Chrome. It didn't work; I ended up with 403 errors and a few 503s. Since then I haven't been able to download anything from ffnet (even with the slow down code).

atroly commented 3 years ago

A new-ish tweet from FictionPress - do they regard FanFicFare as "white-hat"?

API News: We will be working with a few curated white-hat bot makers that have created tools to assist users to find & follow interesting stories published on our sites. An API program will be announced soon and an alpha version will be ready by first week of 2021.

chocolatechipcats commented 3 years ago

FFF returned 503 errors for me when I tried updating two of my fics.

Also, I noticed that the RSS feed is having some problems too (displays updates, but with the wrong dates and chapter counts). The updates appear on the site proper, just the RSS feed is wrong.

A new-ish tweet from FictionPress - do they regard FanFicFare as "white-hat"?

API News: We will be working with a few curated white-hat bot makers that have created tools to assist users to find & follow interesting stories published on our sites. An API program will be announced soon and an alpha version will be ready by first week of 2021.

I think that downloaders such as FFF could be considered against FFnet's terms of service (section 4.C). So I doubt it.

JimmXinu commented 3 years ago

I have uploaded a test version of the CLI only that requires and uses cloudscraper to download from ffnet. It works for me while the old code does not.

This is a configurable change using a new INI setting, use_cloudscraper. defaults.ini for CLI in this test version includes use_cloudscraper:true under [www.fanfiction.net], so you don't need to add it yourself to personal.ini. But you can set it false in personal.ini to use the old code for comparison. And yes, it can also be used with other sites, but I haven't tested that very extensively.

Again, at this point, it's CLI only. It works for me, but I haven't run any longer download tests yet.

Install using: pip install --extra-index-url https://testpypi.python.org/pypi --upgrade FanFicFare

Twilight666 commented 3 years ago

I installed the latest test version (Windows 10, Python3)

I am getting an error

DEPRECATION: The OpenSSL being used by this python install (OpenSSL 1.1.0j 20 Nov 2018) does not meet the minimum supported version (>= OpenSSL 1.1.1) in order to support TLS 1.3 required by Cloudflare, You may encounter an unexpected reCaptcha or cloudflare 1020 blocks.

And when I tried pip.exe install --upgrade OpenSSL I get

ERROR: Could not find a version that satisfies the requirement OpenSSL
ERROR: No matching distribution found for OpenSSL

Any ideas?

EDIT: Actually it is downloading the stories anyway so....

EDIT2: My version of Python was 3.7.2. Downloaded 3.7.8 and upgraded it. Message is gone
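OpenSSL is linked into the Python interpreter rather than installed with pip, which would be consistent with the warning disappearing after the Python upgrade. A small sketch for checking what a given interpreter was built against, using only the standard `ssl` module; `meets_minimum` and `describe_tls_support` are hypothetical helper names, and the 1.1.1 minimum is taken from the warning text above.

```python
import ssl

MINIMUM_OPENSSL = (1, 1, 1)  # minimum version named in the warning above


def meets_minimum(version_info, minimum=MINIMUM_OPENSSL):
    """Compare the (major, minor, fix) part of an OpenSSL version tuple."""
    return tuple(version_info[:3]) >= minimum


def describe_tls_support():
    """Summarise the OpenSSL build this interpreter is linked against."""
    return {
        "openssl": ssl.OPENSSL_VERSION,  # human-readable version string
        "meets_minimum": meets_minimum(ssl.OPENSSL_VERSION_INFO),
        "has_tls13": ssl.HAS_TLSv1_3,  # True if TLS 1.3 is available
    }
```

Running `print(describe_tls_support())` before and after an upgrade shows whether the interpreter itself changed, independent of any pip packages.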

dakswiggin commented 3 years ago

With latest test CLI got cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version. error for two tested links - https://www.fanfiction.net/s/12515214/113/An-Essence-of-Silver-and-Steel and https://www.fanfiction.net/s/13510736/16/Spells-in-Silence. Checked using wired and mobile networks, and also with and without VPN. All with the same result.

UPDATE: Adding Chrome/Mozilla as user agent instead of default one and some minor delay between chapters solved error above for me - all backlog FF fiction was successfully downloaded.

chocolatechipcats commented 3 years ago

With latest test CLI got cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

Odd, this error indicates that there's a paid version of the scraper somewhere, but it doesn't seem to exist: https://stackoverflow.com/questions/64433684/cloudscraper-issue-cloud-flare-version-2-in-scraping-website

taskvalanche commented 3 years ago

I upgraded to the test CLI release on Debian 10 (Python 3.7.3) and with cloudscraper it appears to be working fine, I've done a couple of stories, one of which is quite long (over 100 chapters) and no issues at all. It's back to performing exactly how it used to. Looks like cloudscraper did it, at least for me.

Interesting that judging by the comments above some people are getting a full captcha. I tried those stories linked in the comment above and they worked. It looks like that may be a deliberate configuration on the part of FFnet if we can possibly determine the CloudFlare behavior is as described here: https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Challenge-Passage-Captcha-

hseg commented 3 years ago

Having trouble testing, can you upload to a branch here instead? In particular, uploaded source seems very different from HEAD, so I can't really compare the two. Github doesn't seem to be aware of another clone with all these changes, too.

chocolatechipcats commented 3 years ago

Interesting that judging by the comments above some people are getting a full captcha. I tried those stories linked in the comment above and they worked. It looks like that may be a deliberate configuration on the part of FFnet if we can possibly determine the CloudFlare behavior is as described here: https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Challenge-Passage-Captcha-

Note that "suspicious IP activity" can happen if your ISP assigns you an address previously used by a bad actor. Happened for me once, was rather hellish getting CAPTCHAs on near every site until I finally reset the modem.

darthShadow commented 3 years ago

Odd, this error indicates that there's a paid version of the scraper somewhere

It probably means a paid captcha solver instead of a paid scraper. There are options listed on the site for those who want to try it out: https://github.com/VeNoMouS/cloudscraper#3rd-party-captcha-solvers


Any chance we can get an option to select one of those and enter our credentials for them too in the off-chance that this error becomes more frequent?

JimmXinu commented 3 years ago

Re: Captchas, I was aware that that code was in cloudscraper, but I haven't given it any further thought since I wasn't seeing it. To my understanding, Cloudflare inflicts varying levels of obstruction depending on a number of factors--I suspect the site gets to set at least some of those factors.

@hseg, I haven't pushed the branch I'm working on into GitHub. Considering pip downloads it as source anyway, it's not a priority as far as I'm concerned. And the commit history is embarrassingly ugly.

@darthShadow, I didn't even know 'paid captcha solver' was a thing. I will consider it if it becomes a large enough issue. Ditto using different JS engines, maybe? Not sure it matters...

For now, v3.26.2 seems to be working for most people trying it, and my next priority will be seeing if I can shoe-horn it into the Calibre plugin.

hseg commented 3 years ago

The testing package for my distro (Archlinux) fetches from git HEAD. I can easily retarget it to fetch from a branch elsewhere, but making it fetch from pip is annoying enough not to bother with at this hour. Am willing to accept history-instability of a testing branch. But if you don't want your dirty code out there, that's fine too.

chocolatechipcats commented 3 years ago

I also noticed this from the cloudscraper page:

It's easy to integrate cloudscraper with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make. To retrieve just the cookies (as a dictionary), use cloudscraper.get_tokens(). To retrieve them as a full Cookie HTTP header, use cloudscraper.get_cookie_string(). get_tokens and get_cookie_string both accept Requests' usual keyword arguments (like get_tokens(url, proxies={"http": "socks5://localhost:9050"})). Please read Requests' documentation on request arguments for more information.

Could this be used? Token could possibly be copied from a browser session where ffnet was successfully accessed - I did something similar once with wget to get some stuff from behind a login.
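The idea of reusing the tokens outside of cloudscraper could be sketched like this for urllib. It is a hedged illustration only: `build_request` is a hypothetical helper, and the commented usage assumes `cloudscraper.get_tokens()` returns a `(tokens, user_agent)` pair, as the quoted README suggests. Whether cookies copied from a browser session would be accepted is not established here, since the README says they must be sent with the matching user-agent.

```python
import urllib.request


def build_request(url, tokens, user_agent):
    """Build a urllib Request carrying the given cookies and user-agent."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in tokens.items())
    req = urllib.request.Request(url)
    req.add_header("Cookie", cookie_header)
    req.add_header("User-Agent", user_agent)  # must match the one the tokens were issued to
    return req


# Usage (needs the third-party cloudscraper package and network access):
#   import cloudscraper
#   tokens, user_agent = cloudscraper.get_tokens("https://www.fanfiction.net/")
#   req = build_request("https://www.fanfiction.net/s/13716496/", tokens, user_agent)
#   html = urllib.request.urlopen(req).read()
```

The tokens still have to come from somewhere first, so this only moves the Cloudflare step rather than removing it.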

MrTyton commented 3 years ago

Required the update to cloudscraper and spoofing the user agent to Mozilla/5.0 in order to work for me.

On Mon, Dec 21, 2020 at 8:52 PM chocolatechipcats notifications@github.com wrote:

I also noticed this from the cloudscraper page:

It's easy to integrate cloudscraper with other applications and tools. Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make. To retrieve just the cookies (as a dictionary), use cloudscraper.get_tokens(). To retrieve them as a full Cookie HTTP header, use cloudscraper.get_cookie_string(). get_tokens and get_cookie_string both accept Requests' usual keyword arguments (like get_tokens(url, proxies={"http": "socks5://localhost:9050"})). Please read Requests' documentation on request arguments for more information.

Could this be used?

JimmXinu commented 3 years ago

@chocolatechipcats, To do what? The cookies have to be obtained first. And when cloudscraper says 'with the appropriate user-agent', keep in mind that cloudscraper has several thousand user-agent strings it uses.

I'm also going to look at not explicitly setting user-agent and letting cloudscraper do its thing with user-agent.