Open Melonadev opened 4 years ago
Interesting error! I've never seen this one before.
I would guess it's a single URL from your list that has some weird script redirection, or something else like that. I'll rerun with logging and see what happens.
I haven't been able to reproduce this. Can you upgrade to the newest version of archiver (1.9.0) and run:
archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG > out.log 2>&1
That's what it would be on Linux; I'm not sure about Windows. The > out.log 2>&1 part just saves the debug and error output to a file, so you can also leave it out and copy/paste the output here instead.
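If the shell redirection is a problem on a given platform, the same capture can be done from Python. This is only a minimal sketch: run_and_log is a hypothetical helper name, not part of archiver, and it simply sends both stdout and stderr of any command to one file, which is what > out.log 2>&1 does.

```python
import subprocess


def run_and_log(command, log_path):
    """Run `command` (a list of arguments) and write both its stdout and
    stderr to `log_path`. Returns the command's exit code."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        # stderr=subprocess.STDOUT merges stderr into stdout, like 2>&1.
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode


# Example call, using the flags from the command above (assumes archiver
# is installed and on PATH):
# run_and_log(
#     ["archiver", "--file", "./fest.txt", "--rate-limit-wait", "30",
#      "--log", "DEBUG"],
#     "out.log",
# )
```

The command is passed as a list so no shell is involved, which avoids quoting differences between Linux and Windows.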
EDIT: This is still for version 1.8.1. I have since updated archiver to 1.9.0 and tried it again.
It appears to run successfully without errors, but the last entry of the log file says otherwise:
DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 
'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
pool.map(partial_call, archive_urls)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020
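The traceback above shows the HTTPError raised in call_archiver being re-raised by pool.map in main, so a single failing URL ends the whole run. One way to keep going past a bad URL is to catch the exception per URL and report failures at the end. The sketch below illustrates that idea only; it is not how archiver handles errors. safe_call and map_collecting_errors are hypothetical names, and a thread pool is used instead of a process pool to keep the example self-contained.

```python
from multiprocessing.pool import ThreadPool


def safe_call(func, url):
    """Call func(url) and return (url, result, error) instead of raising."""
    try:
        return url, func(url), None
    except Exception as exc:  # keep the error so it can be reported later
        return url, None, exc


def map_collecting_errors(func, urls, jobs=1):
    """Apply func to every URL, collecting failures rather than stopping
    at the first one. Returns (succeeded, failed)."""
    with ThreadPool(jobs) as pool:
        results = pool.map(lambda url: safe_call(func, url), urls)
    succeeded = [(url, value) for url, value, error in results if error is None]
    failed = [(url, error) for url, value, error in results if error is not None]
    return succeeded, failed
```

With this shape, every URL is attempted once and the caller decides afterwards whether to retry or just log the entries in failed.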
Can you make sure you're running 1.9.0? I added more logging that prints the version number, and I don't see it in the above log. 1.9.0 should fix the 520 issue (or at least it should as long as the 520s don't show up more than 5 times in a row).
Run:
archiver --version
And when you're running archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG, make sure it says "Version 1.9.0" at the top of the log.
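For readers wondering what "more than 5 times in a row" means in practice, here is a hand-written sketch of retrying on a 520 status. It is not archiver's actual code (the log suggests the real retries are configured through urllib3's Retry with total=5). fetch_with_retries, RetriesExhausted and the fetch callable are hypothetical names; fetch is assumed to take a URL and return an HTTP status code.

```python
import time


class RetriesExhausted(Exception):
    """Raised when a retryable status keeps coming back."""


def fetch_with_retries(fetch, url, max_retries=5, retry_statuses=(520,),
                       wait_seconds=0.0):
    """Call fetch(url) until it returns a status outside retry_statuses.

    Returns (status, attempts). Raises RetriesExhausted once the first
    attempt plus max_retries retries have all returned a retryable status.
    """
    attempts = 0
    while True:
        status = fetch(url)
        attempts += 1
        if status not in retry_statuses:
            return status, attempts
        if attempts > max_retries:
            raise RetriesExhausted(
                f"{status} for url: {url} after {attempts} attempts"
            )
        time.sleep(wait_seconds)  # pause before trying again
```

With max_retries=5 a URL is tried at most six times, so an occasional 520 is absorbed but a URL that returns 520 every time still ends in an error.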
I updated archiver to 1.9.0 and ran archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG. It has been sitting like this for more than 6 hours and doesn't seem to have finished:
C:\Users\yewhe\Downloads\archiver files>archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG
DEBUG:root:Archiver Version: 1.9.0
DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 
'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 
'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/
Traceback (most recent call last):
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/
:-/
I haven't been able to reproduce the "Too Many Redirects" issue, and I haven't got a 520 error since updating the retry logic to cover 520s.
I tried again and this time it seems to be a 520 error. Strange.
Microsoft Windows [Version 10.0.18363.1139]
(c) 2019 Microsoft Corporation. All rights reserved.
C:\Users\yewhe\Downloads\archiver files>archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG
DEBUG:root:Archiver Version: 1.9.0
DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
DEBUG:requests.packages.urllib3.util.retry:Converted retries value: Retry(total=5, connect=None, read=None, redirect=None, status=None) -> Retry(total=Retry(total=5, connect=None, read=None, redirect=None, status=None), connect=None, read=None, redirect=None, status=None)
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup'}
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
Traceback (most recent call last):
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
Traceback (most recent call last):
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
pool.map(partial_call, archive_urls)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
Archiving with an XML sitemap also gives the 'Exceeded 30 redirects' error after about 30-40 minutes. I converted the XML file to a txt file because GitHub doesn't support XML: ragina.txt
C:\Users\yewhe\Downloads\archiver files>archiver --sitemaps file://ragina.xml --rate-limit-wait 30
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 35, in call_archiver
r = session.head(request_url, allow_redirects=True)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 553, in head
return self.request('HEAD', url, **kwargs)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in send
history = [resp for resp in gen] if allow_redirects else []
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 137, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
pool.map(partial_call, archive_urls)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
The XML sitemap vs. list of URLs shouldn't make a difference: both are processed offline (with good test coverage) into the same internal format and then pass through the same logic. I'll give this list a try.
Do you know which URL caused the redirect error?
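When pool.map re-raises a worker's exception, the final traceback does not say which URL was being processed. A minimal sketch of one way to record that per URL follows. The names call_safely, archive_all and fake_archive are hypothetical and not part of archiver, and a ThreadPool is swapped in here only to keep the example simple:

```python
from multiprocessing.pool import ThreadPool


def call_safely(func, url):
    """Run func(url); return (url, None) on success or (url, exception) on failure."""
    try:
        func(url)
        return url, None
    except Exception as exc:  # broad on purpose: record the failure and keep going
        return url, exc


def archive_all(func, urls, jobs=1):
    """Apply func to every URL and return a dict of {failed_url: exception}."""
    with ThreadPool(processes=jobs) as pool:
        results = pool.starmap(call_safely, [(func, url) for url in urls])
    return {url: exc for url, exc in results if exc is not None}


def fake_archive(url):
    """Hypothetical stand-in for the real archive request (no network access)."""
    if "bad" in url:
        raise RuntimeError("Exceeded 30 redirects.")


failed = archive_all(
    fake_archive,
    ["https://example.com/ok", "https://example.com/bad"],
)
for url, exc in failed.items():
    print(f"{url}: {exc}")
```

Because each failure is returned rather than raised, one bad URL does not abort the rest of the run, and the offending URL is always known.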
For the XML one (ragina.txt in my previous comment), no idea. This time it doesn't say which URL, unlike the txt one. EDIT: see my comment right below (nyman.txt)
BUT this time it did display the problematic link for this file (also converted to txt because of GitHub): nyman.txt
Microsoft Windows [Version 10.0.18363.1171] (c) 2019 Microsoft Corporation. All rights reserved.
C:\Users\yewhe\Downloads\archiver files>archiver --sitemaps file://nyman.xml --rate-limit-wait 30
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/http://www.michaelnyman.com/shop/soundtracks
Traceback (most recent call last):
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
r.raise_for_status()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/http://www.michaelnyman.com/shop/soundtracks
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 35, in call_archiver
r = session.head(request_url, allow_redirects=True)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 553, in head
return self.request('HEAD', url, **kwargs)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in send
history = [resp for resp in gen] if allow_redirects else []
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 661, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\sessions.py", line 137, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
File "C:\Users\yewhe\AppData\Roaming\Python\Python38\site-packages\wayback_machine_archiver\archiver.py", line 244, in main
pool.map(partial_call, archive_urls)
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
@Melonadev I've pushed v1.9.1, which increases the redirect limit from 30 to 100.
I don't expect this to fix the issue (if you have 30, I suspect it is actually an infinite redirect), but I don't know what else to try!
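For reference, requests keeps the redirect limit on the Session as max_redirects (default 30). The sketch below shows one way to raise it and to report the last hop when the limit is still exceeded. head_following_redirects is a hypothetical helper for illustration, not the code that went into v1.9.1:

```python
import requests
from requests.exceptions import TooManyRedirects

session = requests.Session()
session.max_redirects = 100  # requests defaults to 30


def head_following_redirects(session, url):
    """Return the final response, or None if the redirect limit is exceeded."""
    try:
        return session.head(url, allow_redirects=True)
    except TooManyRedirects as exc:
        # exc.response, when present, is the last response received before giving up
        last = exc.response.url if exc.response is not None else "unknown"
        print(f"Gave up on {url} after {session.max_redirects} redirects; last hop: {last}")
        return None
```

If the redirect really is infinite, a higher limit only delays the failure, but the last hop printed here may hint at where the loop is.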
So, interesting that you noticed a failure with a higher wait time in #21 ...
Normally these infinite redirects happen because the site uses a cookie to know that it has already redirected you and should stop. I wonder if, with long wait times, the cookie expires and so the loop never breaks.
I'll see if I can find some time over the holiday to try that out.
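One way to check the cookie theory would be to follow the redirects one hop at a time and record which cookies each response sets. This is a rough sketch, assuming a requests.Session-like object is passed in. trace_redirects and max_hops are made-up names for illustration:

```python
from urllib.parse import urljoin


def trace_redirects(session, url, max_hops=10):
    """Follow redirects manually; return a list of (status, url, cookie_names)."""
    hops = []
    for _ in range(max_hops):
        # allow_redirects=False makes the session return each hop instead of following it
        resp = session.head(url, allow_redirects=False)
        hops.append((resp.status_code, url, sorted(resp.cookies.keys())))
        location = resp.headers.get("Location")
        if not resp.is_redirect or location is None:
            break  # reached a non-redirect response
        url = urljoin(url, location)  # Location may be relative
    return hops


# Example use (needs the requests package and network access):
#   import requests
#   for hop in trace_redirects(requests.Session(), "https://example.com/"):
#       print(hop)
```

If the same URL keeps reappearing in the output and the cookie list never changes, that would be consistent with a loop that the cookie is failing to break, though it would not prove the cause.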
This could be an issue with Wayback Machine itself and not your archiver, but I'm not sure.
Something strange that has been occurring for the past few days: Any of my attempts to save pages directly to the Wayback Machine website using its online form returns 'Job failed.'
To be clear, it's not the fault of archiver, but a rather troubling issue with Wayback Machine itself. This occurs on Microsoft Edge 87.0.664.47 and Firefox 83.0, both of which are already up-to-date.
I too have been having issues, but through my scheduled runs of my script:
🤷 Let me know if you figure anything out!
I have contacted the Internet Archive team about this issue: info@archive.org Hopefully they can provide an explanation or fix it promptly.
The online form's working now!
Glad to hear it!
I am now running into this or a similar issue consistently. I'm not able to use the API to save URLs. It was working, but suddenly I'm running into this issue!
Name: waybackpy
Version: 3.0.6
Summary: Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.
Home-page: https://akamhy.github.io/waybackpy/
Author: Akash Mahanty
Author-email: akamhy@yahoo.com
License: MIT
Location: /home/nat/.local/lib/python3.8/site-packages
Requires: click, requests, urllib3
Required-by:
Note: you may need to restart the kernel to use updated packages.
<ipython-input-4-9a60b9f17f1c> in save_url(save_url)
2 user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
3 save_api = WaybackMachineSaveAPI(save_url)
----> 4 save_api.save()
~/.local/lib/python3.8/site-packages/waybackpy/save_api.py in save(self)
208 self.sleep(tries)
209
--> 210 self.get_save_request_headers()
211 self.saved_archive = self.archive_url_parser()
212
~/.local/lib/python3.8/site-packages/waybackpy/save_api.py in get_save_request_headers(self)
87 )
88 session.mount("https://", HTTPAdapter(max_retries=retries))
---> 89 self.response = session.get(self.request_url, headers=self.request_headers)
90 # requests.response.headers is requests.structures.CaseInsensitiveDict
91 self.headers = self.response.headers
~/.local/lib/python3.8/site-packages/requests/sessions.py in get(self, url, **kwargs)
540
541 kwargs.setdefault('allow_redirects', True)
--> 542 return self.request('GET', url, **kwargs)
543
544 def options(self, url, **kwargs):
~/.local/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
527 }
528 send_kwargs.update(settings)
--> 529 resp = self.send(prep, **send_kwargs)
530
531 return resp
~/.local/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
665 # Redirect resolving generator.
666 gen = self.resolve_redirects(r, request, **kwargs)
--> 667 history = [resp for resp in gen]
668 else:
669 history = []
~/.local/lib/python3.8/site-packages/requests/sessions.py in <listcomp>(.0)
665 # Redirect resolving generator.
666 gen = self.resolve_redirects(r, request, **kwargs)
--> 667 history = [resp for resp in gen]
668 else:
669 history = []
~/.local/lib/python3.8/site-packages/requests/sessions.py in resolve_redirects(self, resp, req, stream, timeout, verify, cert, proxies, yield_requests, **adapter_kwargs)
164
165 if len(resp.history) >= self.max_redirects:
--> 166 raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
167
168 # Release the connection back into the pool.
TooManyRedirects: Exceeded 30 redirects.
failed_links
I followed your suggestion on the Server Error issue:
I tried 30 and this happens: