ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Add support for Amazon S3 Bucket list .xml files #527

Closed: upintheairsheep closed 9 months ago

upintheairsheep commented 2 years ago

http://cds.p6v7v3q9.hwcdn.net/

JustAnotherArchivist commented 2 years ago

That's an S3-style bucket listing, not really a sitemap. It's not supported by ArchiveBot or wpull.

upintheairsheep commented 2 years ago

Can you add support for it, or at least make a script to turn this into a list of URLs?

upintheairsheep commented 2 years ago

https://github.com/scottcorgan/bucket-list

JustAnotherArchivist commented 2 years ago

https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/s3-bucket-list
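
For readers who want to see the general idea, here is a minimal sketch of turning one page of an S3-style listing into URLs. It is not the script linked above. It assumes the server returns the usual `ListBucketResult` XML with `Contents`/`Key` and `IsTruncated` elements, and the names `parse_listing` and `listing_to_urls` are made up for this illustration. Fetching the XML is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def _local(tag):
    # "{http://s3.amazonaws.com/doc/2006-03-01/}Key" -> "Key"
    return tag.rsplit("}", 1)[-1]


def parse_listing(xml_text):
    """Return (keys, is_truncated) for one page of a ListBucketResult document.

    Assumption: object names live in <Contents><Key>...</Key></Contents>.
    """
    root = ET.fromstring(xml_text)
    keys = []
    is_truncated = False
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "Contents":
            for child in elem:
                if _local(child.tag) == "Key" and child.text:
                    keys.append(child.text)
        elif name == "IsTruncated":
            is_truncated = (elem.text or "").strip().lower() == "true"
    return keys, is_truncated


def listing_to_urls(base_url, xml_text):
    """Build one URL per key by appending the percent-encoded key to base_url."""
    base = base_url.rstrip("/") + "/"
    keys, is_truncated = parse_listing(xml_text)
    # Keep "/" unescaped so keys that look like paths stay readable.
    urls = [base + urllib.parse.quote(key, safe="/") for key in keys]
    return urls, is_truncated
```

If `is_truncated` comes back true, the listing has more pages. With the original S3 listing API, the next page is normally requested by adding `?marker=` followed by the last key of the current page; whether a given host honours that is something to check against the actual server.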