JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Issues with AO3 DDoS #975

Closed Twilight666 closed 12 months ago

Twilight666 commented 12 months ago

AO3 has been hit by a DDoS attack. From what I see when I try to access it now, it looks like they have bought a Cloudflare subscription for protection, so odds are we will have to use BrowserCache like on FFnet.

A couple of questions (and I know we can't verify things until AO3 is back, but I figure I should ask just in case):

1) Is BrowserCache supported for AO3? The FAQ says

Sites that require HTTP POST for login or don't allow caching of pages will [not work with this feature]

and then at the end it says

Here are websites known to not work with the browser cache: archiveofourown.org (and possibly other OTW archives such as Squidgeworld) ...

So it looks like it is not supported for AO3.

2) AO3 allows loading the entire story on one page, and I believe FFF uses that by default to get it faster. As a result, a story (this one has 16 chapters; I found the URLs in my emails) could be

http://archiveofourown.org/works/42224136

But chapter 15 is

http://archiveofourown.org/works/42224136/chapters/121718962

and the entire story is

https://archiveofourown.org/works/42224136?view_full_work=true

If, say, I wanted to download it, would the last one be enough to get everything into the cache? Also, I usually download chapter ranges. If I wanted to download chapters 15 and 16 (the last two), what URL should I use? Until now, http://archiveofourown.org/works/42224136[15-] and http://archiveofourown.org/works/42224136/chapters/121718962[15-] would work fine. What about now?

Considering how things are on FFnet would I have to use https://archiveofourown.org/works/42224136?view_full_work=true[15-]?
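
For illustration only: a rough sketch of how a trailing chapter-range suffix like the `[15-]` in the URLs above could be split off the story URL. The function name `split_chapter_range` and the assumption that the suffix is always `[begin-end]` with either bound optional are hypothetical; this is not FFF's own parsing code.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a URL followed by an optional "[begin-end]" suffix,
# where either bound may be omitted (e.g. "[15-]" means "chapter 15 onward").
_RANGE_RE = re.compile(r"^(?P<url>.*?)\[(?P<begin>\d*)-(?P<end>\d*)\]$")


def split_chapter_range(text: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split 'URL[begin-end]' into (url, begin, end); missing bounds are None."""
    text = text.strip()
    match = _RANGE_RE.match(text)
    if not match:
        # No range suffix: treat it as "download every chapter".
        return text, None, None
    begin = int(match.group("begin")) if match.group("begin") else None
    end = int(match.group("end")) if match.group("end") else None
    return match.group("url"), begin, end


if __name__ == "__main__":
    print(split_chapter_range("http://archiveofourown.org/works/42224136[15-]"))
    # ('http://archiveofourown.org/works/42224136', 15, None)
```

The same split works whether the base URL is the work page, a chapter page, or the `?view_full_work=true` page, since the range is only stripped from the end.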

I am guessing none of the above will be decided/solved/checked until AO3 is back, but I figured I should open this so that you have more time to think it over.

JimmXinu commented 12 months ago

I'm not spending any significant time on this until it's been longer; I'm still seeing server failure pages even behind Cloudflare.

What I remember offhand: AO3/OTW didn't work with the browser cache during development of the feature. It wasn't needed, so I didn't figure out why. AO3 login requires a POST, which isn't cacheable.

JimmXinu commented 12 months ago

Title changed so visitors don't think that's a working solution.

inklesspen commented 12 months ago

I just want to note that using the browser cache feature with AO3 did work for me just now (tested on only one fic, though).

Twilight666 commented 12 months ago

It looks like it works for me too. I am not 100% sure how well it works, but for a story like https://archiveofourown.org/works/45475381 I had to open 3 pages (I went to the pages FFF told me were missing from the cache):

https://archiveofourown.org/works/45475381/navigate?view_adult=true (TOC)
https://archiveofourown.org/works/45475381?view_adult=true (the story URL I used, although FFF added the view_adult=true on its own)
https://archiveofourown.org/works/45475381?view_full_work=true&view_adult=true (the full story)

I don't know if that can be cut down, especially since the 2nd one is redundant if you have the 3rd. Also, I use the 1st one to add the dates to my TOC, but maybe others could skip it.

Similarly on a story with only one chapter (https://archiveofourown.org/works/48093637) I had the following:

https://archiveofourown.org/works/48093637/navigate?view_adult=true
https://archiveofourown.org/works/48093637?view_adult=true
https://archiveofourown.org/works/48093637/chapters/121271518?view_adult=true (since there is no full-story view)
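
A hypothetical sketch that builds the list of pages reported missing from the cache in the two examples above, so they could be visited ahead of time. The helper `pages_to_cache` and its rules (TOC + story page, then either the full-work view or the single chapter page) only mirror what was observed in this thread; they are not taken from FFF's adapter code.

```python
from typing import List, Sequence
from urllib.parse import urlencode

BASE = "https://archiveofourown.org/works"


def pages_to_cache(work_id: int, chapter_ids: Sequence[int]) -> List[str]:
    """Hypothetical list of AO3 pages to visit so they land in the browser cache."""
    adult = urlencode({"view_adult": "true"})
    pages = [
        f"{BASE}/{work_id}/navigate?{adult}",  # TOC (chapter dates)
        f"{BASE}/{work_id}?{adult}",           # story/metadata page
    ]
    if len(chapter_ids) > 1:
        # Multi-chapter works: the full-work view holds every chapter.
        full = urlencode({"view_full_work": "true", "view_adult": "true"})
        pages.append(f"{BASE}/{work_id}?{full}")
    else:
        # Single-chapter works: fetch the one chapter page instead.
        for chapter_id in chapter_ids:
            pages.append(f"{BASE}/{work_id}/chapters/{chapter_id}?{adult}")
    return pages


if __name__ == "__main__":
    for url in pages_to_cache(48093637, [121271518]):
        print(url)
```

For the one-chapter example this prints the same three URLs listed above.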

Finally, when I used open_pages_in_browser:true for https://archiveofourown.org/works/36326725, it opened a similar number of pages as above, but it looks like it works.

In conclusion, it looks like BrowserCache is working properly... unless you want to fix up a couple of weird things (like the whole ?view_adult=true thing)

jcotton42 commented 12 months ago

Has anyone tested this with fics that require logging in to view?

chocolatechipcats commented 12 months ago

I copied my cache settings over from the ffnet section:

use_browser_cache:true
use_browser_cache_only:true
open_pages_in_browser:true

Got a curious error. I'm not sure what it means.

Browser Cache Failed to Load with error ''NoneType' object has no attribute 'headers

Speaking of, depending on whether CF remains in use after the attack is over, the CF-bypassing methods may be needed later on. But they won't work in under-attack mode regardless.

inklesspen commented 12 months ago

I had to set both use_browser_cache and use_browser_cache_only to true, but it did successfully work with a fic that requires logging in.

chocolatechipcats commented 12 months ago

It worked after I cleared the cache and re-cached the pages. 😃

mcepl commented 12 months ago

It certainly doesn’t work for me:

fun/tmp$ fanficfare -c /dev/null -o use_browser_cache=true -o use_browser_cache_only=true 'https://archiveofourown.org/works/23267857/chapters/55720960#workskin'
Traceback (most recent call last):
  File "/home/matej/.bin/fanficfare", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 344, in main
    dispatch(options, urls, passed_defaultsini, passed_personalini, warn, fail)
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 320, in dispatch
    do_download(url,
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 435, in do_download
    adapter.getStoryMetadataOnly()
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/adapters/base_adapter.py", line 327, in getStoryMetadataOnly
    self.doExtractChapterUrlsAndMetadata(get_cover=get_cover)
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/adapters/base_adapter.py", line 431, in doExtractChapterUrlsAndMetadata
    return self.extractChapterUrlsAndMetadata()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/adapters/adapter_archiveofourownorg.py", line 166, in extractChapterUrlsAndMetadata
    data = self.get_request(url)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/requestable.py", line 119, in get_request
    return self.get_request_redirected(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/requestable.py", line 111, in get_request_redirected
    (data,rurl) = self.configuration.get_fetcher().get_request_redirected(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/base_fetcher.py", line 133, in get_request_redirected
    fetchresp = self.do_request('GET',
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/decorators.py", line 68, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/cache_basic.py", line 122, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/decorators.py", line 102, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/cache_browser.py", line 97, in fetcher_do_request
    raise exceptions.HTTPErrorFFF(
fanficfare.exceptions.HTTPErrorFFF: HTTP Error in FFF 'Page not found or expired in Browser Cache (see FFF setting browser_cache_age_limit)'(428) URL:'https://archiveofourown.org/works/23267857/navigate?view_adult=true'
fun/tmp$ fanficfare --version
Version: 4.24.0
fun/tmp$ 
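
The error above ("Page not found or expired in Browser Cache (see FFF setting browser_cache_age_limit)") covers two cases: the page was never cached, or its cache entry is too old. A minimal sketch of that check, assuming the limit is measured in hours; the function `cache_entry_usable` is hypothetical and not FFF's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cache_entry_usable(
    cached_at: Optional[datetime],
    age_limit_hours: float,
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical check: is a cached page present and younger than the limit?

    Assumes the limit is in hours. A missing entry (cached_at=None) is the
    "not found" case; an entry older than the limit is the "expired" case.
    """
    if cached_at is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - cached_at <= timedelta(hours=age_limit_hours)
```

Either way the fix is the same: (re)visit the page named in the error in the browser, then retry.
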

chocolatechipcats commented 12 months ago

I'd clear the cache, visit the page listed in the error, and try again.

inklesspen commented 12 months ago

@mcepl Have you also set browser_cache_path? I think most of us are using the Calibre plugin and have set all these options in the ini file.

mcepl commented 12 months ago

Yup, /home/matej/.var/app/org.mozilla.firefox/cache/mozilla/firefox/g8r0rybx.default-release/cache2 from about:cache (Firefox 115.0.1 from Flatpak on Linux/openSUSE).

inklesspen commented 12 months ago

You also have to either set open_pages_in_browser true (and use the cache from your system default browser), or else manually visit all the necessary pages in the browser…

chocolatechipcats commented 12 months ago

If using open_pages_in_browser, I would suggest stepping away from the computer for a bit. It's a little intrusive, lol.
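
A hypothetical sketch of what open_pages_in_browser amounts to from the user's side: each page that isn't already cached gets opened in the default browser with Python's standard `webbrowser` module, and visiting it is what fills the cache. The helper name and the injectable `opener` parameter are assumptions for illustration; FFF's actual implementation isn't shown in this thread.

```python
import webbrowser
from typing import Callable, Container, Iterable, List


def open_missing_pages(
    urls: Iterable[str],
    cached: Container[str],
    opener: Callable[[str], object] = webbrowser.open,
) -> List[str]:
    """Hypothetical helper: open each uncached URL in the default browser.

    `opener` is injectable so the logic can be exercised without launching
    a real browser. Returns the URLs that were opened.
    """
    opened = []
    for url in urls:
        if url not in cached:
            opener(url)
            opened.append(url)
    return opened
```

Since every uncached page becomes a new browser tab, this also shows why it feels intrusive while a download is running.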

pouppe55 commented 12 months ago

@mcepl Have you also set browser_cache_path? I think most of us are using the Calibre plugin and have set all these options in the ini file.

Excuse me, could you tell me how that is done? I already modified the section with use_browser_cache:true, use_browser_cache_only:true and open_pages_in_browser:true, but it keeps giving me an error; it does not recognize titles or authors.

inklesspen commented 12 months ago

Here are the relevant settings from my personal.ini file:

[defaults]
browser_cache_path:/Users/rose/Library/Caches/Google/Chrome/Default/Cache/Cache_Data
browser_cache_age_limit:4.0
open_pages_in_browser:true

[archiveofourown.org]
username:XXXXXXXX
password:XXXXXXXX
is_adult:true
use_browser_cache:true
use_browser_cache_only:true
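
A minimal sketch, using Python's standard `configparser`, of how the settings above combine: values in the site section override those in `[defaults]`. The helper `effective_settings` is hypothetical and simplified (FFF's real lookup also considers other sections, such as output-format ones), and the username/password lines are left out.

```python
import configparser

PERSONAL_INI = """\
[defaults]
browser_cache_path:/Users/rose/Library/Caches/Google/Chrome/Default/Cache/Cache_Data
browser_cache_age_limit:4.0
open_pages_in_browser:true

[archiveofourown.org]
is_adult:true
use_browser_cache:true
use_browser_cache_only:true
"""


def effective_settings(ini_text: str, site: str) -> dict:
    """Hypothetical merge: site-section values override [defaults]."""
    # interpolation=None so '%' in paths is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    merged = dict(parser["defaults"]) if parser.has_section("defaults") else {}
    if parser.has_section(site):
        merged.update(parser[site])
    return merged


if __name__ == "__main__":
    settings = effective_settings(PERSONAL_INI, "archiveofourown.org")
    for key in ("use_browser_cache", "use_browser_cache_only", "open_pages_in_browser"):
        print(key, settings.get(key))
```

Note that `configparser` accepts `key:value` lines like the ones in personal.ini, so the file can be checked as-is.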

Twilight666 commented 12 months ago

Found a weird one. I downloaded a few stories, searching for possible errors, and found this one: http://archiveofourown.org/works/48429163

The issue is that the author had some embedded pictures, and it looks like open_pages_in_browser:true keeps re-opening them.

The story downloads fine. open_pages_in_browser:true opened the other 3 pages that I described in my previous comment. But if I retry, it re-opens the images. Just the images.

JimmXinu commented 12 months ago

Login through FFF is not going to work. It uses an HTTP POST, which is never cached. However, if you are using open_pages_in_browser and are logged in through your browser, you might not need to.
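
A toy model of why an FFF-side login can't be served from the browser cache: the hypothetical `ToyBrowserCache` class below keeps only GET responses keyed by URL, so a login POST never produces a reusable entry. It illustrates the idea only; it is not how Chrome, Firefox, or FFF actually store cache entries.

```python
from typing import Dict, Optional


class ToyBrowserCache:
    """Toy model (not FFF or browser code): only GET responses are stored."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def store(self, method: str, url: str, body: bytes) -> bool:
        # In practice browsers don't keep POST responses (such as a login
        # form submission) where a later lookup by URL could reuse them.
        if method.upper() != "GET":
            return False
        self._entries[url] = body
        return True

    def lookup(self, url: str) -> Optional[bytes]:
        return self._entries.get(url)
```

Pages fetched while already logged in are plain GETs, which is why browsing logged in and reading from the cache can still work.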

I haven't tested it in detail and I don't plan to until the situation has stabilized more.

I will also say the same thing I did on MR:

I would suggest not trying too much with FFF right now. We don't want to be categorized as part of the problem.

Lagicrus commented 12 months ago

I just tried without any bypass, proxy, cache, etc. for AO3, and got it the first time with zero issues. So it looks like they might no longer have CF set to "I'm Under Attack" mode?

mcepl commented 12 months ago

Plain FanFicFare now works with AO3.

HappyFaceSpider commented 12 months ago

Does it work? Because I keep getting "HTTP Error in FFF '403 Client Error: Forbidden for url: " while trying to use "plain" FFF with ao3.

kyoam commented 12 months ago

FFF was working for a while, then it stopped.

This showed up when I tried to update an anthology from AO3 using FFF. I hope it can help (I was using FFF to update anthology series, and having FFF unable to do that is going to be a pain in the neck):

calibre, version 6.22.0
ERROR: Unhandled exception: HTTPErrorFFF:HTTP Error in FFF '403 Client Error: Forbidden for url: https://archiveofourown.org/series/2998248'(403)

calibre 6.22 embedded-python: True
Windows-10-10.0.19045-SP0 Windows ('64bit', 'WindowsPE')
('Windows', '10', '10.0.19045')
Python 3.10.1
Windows: ('10', '10.0.19045', 'SP0', 'Multiprocessor Free')
Interface language: None
Successfully initialized third party plugins: EpubMerge (2, 15, 0) && FanFicFare (4, 25, 0)
Traceback (most recent call last):
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\fetcher_requests.py", line 128, in request
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\requests\models.py", line 943, in raise_for_status
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://archiveofourown.org/series/2998248

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "calibre_plugins.fanficfare_plugin.fff_plugin", line 944, in update_anthology
  File "calibre_plugins.fanficfare_plugin.fff_plugin", line 686, in get_urls_from_page
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\geturls.py", line 44, in get_urls_from_page
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\adapters\base_adapter.py", line 462, in get_urls_from_page
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\requestable.py", line 119, in get_request
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\requestable.py", line 111, in get_request_redirected
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\base_fetcher.py", line 133, in get_request_redirected
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\cache_basic.py", line 122, in fetcher_do_request
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\decorators.py", line 102, in fetcher_do_request
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\base_fetcher.py", line 106, in do_request
  File "C:\Users\Owner\AppData\Roaming\calibre\plugins\FanFicFare.zip\fanficfare\fetchers\fetcher_requests.py", line 149, in request
fanficfare.exceptions.HTTPErrorFFF: HTTP Error in FFF '403 Client Error: Forbidden for url: https://archiveofourown.org/series/2998248'(403)

JimmXinu commented 12 months ago

Since copy/paste is easy, I'll repeat another post I just made over on MR:

I reiterate: I'm not putting significant effort into finding/creating/documenting solutions for AO3 on FFF until the situation has stabilized. I expect that to be a matter of days, not hours.

My personal opinion (based only on what I've seen publicly) is that donation-funded AO3 is much more likely to return to a point where downloaders can work than ad-funded ffnet was.

But obviously it's not going to be a priority when basic services are still being impacted.

I urge restraint for all users: now is not the time to be hammering the AO3 servers.

kyoam commented 12 months ago

No problem, but I thought the data might be helpful at that point. I would have forgotten about it by then.

HappyFaceSpider commented 12 months ago

I have no issue with waiting, I was just surprised that most recent posts claimed it worked when it didn't for me. I thought that maybe I'm doing something wrong. :)

tag0 commented 12 months ago

You also have to either set open_pages_in_browser true (and use the cache from your system default browser), or else manually visit all the necessary pages in the browser…

Worked for me (after clearing the cache and making sure all the browser cache settings were set to true). A nuisance in that it opens two copies of the story in my browser, but better that than not being able to pull them directly into calibre at all!

MrTyton commented 12 months ago

It failed a lot, and then it started working again (unsure why; I couldn't get the errors since it's automated). Could sending the requests through FlareSolverr, similar to the config for ffnet, help with this? Is that supported right now for any site, or only for ffnet?

chocolatechipcats commented 12 months ago

Additionally, an AO3 support member on Reddit said that they're "getting Cloudflare used to what our 'good' traffic looks like," so I would expect it to be a bit unreliable for a few days. (For another example, the RSS feed came back up for a bit, then died again.)

chocolatechipcats commented 12 months ago

and then at the end it says

Here are websites known to not work with the browser cache: archiveofourown.org (and possibly other OTW archives such as Squidgeworld) ...

I was the one 'maintaining' the "sites known not to work" section (well, mostly just adding new entries as I learned about them), so have updated it with new information.

kyoam commented 12 months ago

Okay, I was able to get about 42 fics on AO3 updated early this morning, at about 6am EDT, with no changes or workarounds to FFF in my personal.ini. For AO3 going forward, it may be more about when to update than about changing how FFF updates from AO3.

JimmXinu commented 12 months ago

Working normally again now for me.

Anybody still failing?

kyoam commented 12 months ago

Haven't had problems as of yet today. Hopefully that will still be the case later.

Twilight666 commented 12 months ago

They had switched to Cloudflare again when I tried around 6 hours ago. I switched back to BrowserCache and it worked.

chocolatechipcats commented 12 months ago

They had switched to cloudflare again when I tried around 6 hours ago. I switched to browsercache again and it worked

I posted on MR about it. Not going to try to copy the formatting, so here's a link: https://www.mobileread.com/forums/showpost.php?p=4339543&postcount=8644

JimmXinu commented 12 months ago

Everything I've seen in last day or two says we're back to normal.