JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Downloading from Web Archive through retro-proxy #873

Closed: mcepl closed this issue 2 years ago

mcepl commented 2 years ago

I am trying to download http://archiveofourown.org/works/8741551, which is fully available on the Wayback Machine at https://web.archive.org/web/20170214224728/archiveofourown.org/works/8741551 but is no longer available on the live AO3. I tried two retro proxies, https://github.com/remino/timeprox and https://github.com/richardg867/WaybackProxy, and both end with the error below (this run uses timeprox, with line 42 of server.js modified to download from https://web.archive.org/web/20170214224728/${url}):

stitny~/K/f/tmp$ fanficfare -d -o http_proxy=http://127.0.0.1:3000 -o https_proxy=http://127.0.0.1:3000 http://archiveofourown.org/works/8741551
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(230):     OS Version:Linux-5.18.15-1-default-x86_64-with-glibc2.35
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(231): Python Version:3.10.6 (main, Aug 02 2022, 17:22:31) [GCC]
FFF: DEBUG: 2022-08-13 18:20:14,224: cli.py(232):    FFF Version:4.14.3
FFF: DEBUG: 2022-08-13 18:20:14,239: configurable.py(1044): use_browser_cache:
FFF: DEBUG: 2022-08-13 18:20:14,239: configurable.py(1058): use_basic_cache:true
FFF: INFO: 2022-08-13 18:20:14,246: adapter_archiveofourownorg.py(163): url: https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: INFO: 2022-08-13 18:20:14,246: adapter_archiveofourownorg.py(164): metaurl: https://archiveofourown.org/works/8741551?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,247: fetcher.py(234): 
========== MISS (GET) BasicCache
https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,247: fetcher.py(469): 
---------- REQ (GET) RequestsFetcher
https://archiveofourown.org/works/8741551/navigate?view_adult=true
FFF: DEBUG: 2022-08-13 18:20:14,248: fetcher.py(450): Session Proxies After INI:{'http': 'http://127.0.0.1:3000', 'https': 'http://127.0.0.1:3000'}
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "/usr/lib/python3.10/site-packages/urllib3/connection.py", line 369, in connect
    self._tunnel()
  File "/usr/lib64/python3.10/http/client.py", line 920, in _tunnel
    (version, code, message) = response._read_status()
  File "/usr/lib64/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='archiveofourown.org', port=443): Max retries exceeded with url: /works/8741551/navigate?view_adult=true (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))

During handling of the above exception, another exception occurred:
stitny~/K/f/tmp$ 

(using urllib3 1.26.11 from the openSUSE package, with no patches to the main code)
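The timeprox change described in the first paragraph (line 42 of server.js rewritten to fetch https://web.archive.org/web/20170214224728/${url}) amounts to prefixing every requested URL with a snapshot path. A minimal Python sketch of that mapping, assuming the fixed timestamp from the link above; wayback_url is a hypothetical name, not part of timeprox, WaybackProxy, or FFF:

```python
# Hypothetical helper (not part of timeprox, WaybackProxy, or FFF): mirrors the
# modified timeprox rewrite that prefixes every requested URL with a snapshot path.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(url: str, timestamp: str = "20170214224728") -> str:
    """Return the Wayback Machine URL for `url` at the given 14-digit timestamp."""
    # Snapshot URLs have the form <prefix><YYYYMMDDhhmmss>/<original URL>.
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


print(wayback_url("archiveofourown.org/works/8741551"))
# prints https://web.archive.org/web/20170214224728/archiveofourown.org/works/8741551
```

A proxy can only apply this rewrite if it actually sees the full URL of each request.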

Do you have any idea what's going on?

JimmXinu commented 2 years ago

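I would speculate that neither proxy supports https; both describe themselves as an "HTTP proxy". The traceback fits that: it fails inside urllib3's _tunnel() call, i.e. while asking the proxy to open a CONNECT tunnel for the https:// URL, and the proxy simply closes the connection without answering.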
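To see why an https:// URL defeats a URL-rewriting proxy, compare what a client sends to a forward HTTP proxy in each case. This is a simplified sketch (GET only, headers omitted), and proxy_request_line is a hypothetical helper rather than anything in FFF, requests, or urllib3:

```python
from urllib.parse import urlsplit


def proxy_request_line(url: str) -> str:
    """Return the first request line a client sends to a forward HTTP proxy for `url`."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # HTTPS: the client only asks the proxy to open a raw TCP tunnel to host:port;
        # the path and query travel later inside TLS, invisible to the proxy.
        return f"CONNECT {parts.hostname}:{parts.port or 443} HTTP/1.1"
    # Plain HTTP: the full absolute URL is sent, so a rewriting proxy can see and change it.
    return f"GET {url} HTTP/1.1"


for u in ("http://archiveofourown.org/works/8741551",
          "https://archiveofourown.org/works/8741551/navigate?view_adult=true"):
    print(proxy_request_line(u))
```

With the https:// URL, a proxy such as timeprox would only learn "archiveofourown.org:443", so it has nothing to map onto a Wayback snapshot even if it did accept CONNECT.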

You could try changing adapter_archiveofourownorg.py in FFF to use http instead of https. A find-and-replace of https: with http: (leaving https?: alone) might work. I'd try the story's http:// URL in a browser through the proxy first, though.
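The find-and-replace suggested above can be sketched in Python as follows. This is a minimal sketch: downgrade_scheme and patch_file are hypothetical helpers, and the sample lines only imitate the kind of URL strings and regex patterns an adapter might contain; they are not copied from adapter_archiveofourownorg.py. A plain substring replace is enough because a pattern written as https?: never contains the substring https: in the first place.

```python
from pathlib import Path


def downgrade_scheme(source: str) -> str:
    """Replace every literal "https:" with "http:".

    Regex fragments written as "https?:" do not contain the substring "https:"
    (a '?' sits between the 's' and the ':'), so they are left untouched.
    """
    return source.replace("https:", "http:")


def patch_file(path: Path) -> None:
    """Rewrite a source file in place (hypothetical helper; back the file up first)."""
    text = path.read_text(encoding="utf-8")
    path.write_text(downgrade_scheme(text), encoding="utf-8")


# Illustrative sample only, not the real adapter source.
sample = '''url = "https://archiveofourown.org/works/8741551"
pattern = r"https?://archiveofourown.org/works/"'''
print(downgrade_scheme(sample))
```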

This is very much an 'off label' use of FFF. If you get it working, great. Feel free to report that here. But I'm not going to support it in general.

mcepl commented 2 years ago

Tried it, but it doesn't work. I'm giving up; this story isn't important enough to suffer for.