ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Ignore Littlstar video files #458

Closed. JustAnotherArchivist closed this issue 4 years ago

JustAnotherArchivist commented 4 years ago

Some of these files are gigantic, especially the ones in the S3 bucket (4K VR etc.; the largest file is 620 GB as of right now), and they have caused a number of crashes recently by filling disks.

This ignore would not cover the video files that are actually used on the site or in the embed when accessed with a browser. For example, https://littlstar.com/videos/633a96a1 and https://embed.littlstar.com/videos/633a96a1 play back the video files https://littlstar.com/embed/proxy/videos/3186eccb-e9d2-4735-bc41-c71a3e2b29cb/web.mp4 and https://embed.littlstar.com/proxy/videos/3186eccb-e9d2-4735-bc41-c71a3e2b29cb/web.mp4, respectively. wpull does not currently extract these URLs, though.
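As a rough illustration of the scoping described above, here is a minimal Python sketch of a regex-based ignore check. The actual ignore pattern and the S3 bucket hostname are not stated in this issue, so `example-bucket.s3.amazonaws.com`, the extension list, and the `is_ignored` helper are all assumptions made up for the example. It only shows how a pattern anchored to a bucket host leaves the two playback URLs untouched.

```python
import re

# ASSUMPTION: placeholder hostname and extensions; the real bucket name and
# ignore pattern are not given in this issue.
IGNORE_PATTERN = re.compile(
    r"^https?://example-bucket\.s3\.amazonaws\.com/.*\.(?:mp4|webm|mov)(?:\?.*)?$"
)


def is_ignored(url: str) -> bool:
    """Return True if the URL matches the (hypothetical) ignore pattern."""
    return IGNORE_PATTERN.search(url) is not None


# Playback URLs quoted in the issue; these are served from the site's own
# proxy paths, so a bucket-scoped pattern does not match them.
playback_urls = [
    "https://littlstar.com/embed/proxy/videos/3186eccb-e9d2-4735-bc41-c71a3e2b29cb/web.mp4",
    "https://embed.littlstar.com/proxy/videos/3186eccb-e9d2-4735-bc41-c71a3e2b29cb/web.mp4",
]

if __name__ == "__main__":
    # Hypothetical bucket object, used only to exercise the pattern.
    bucket_url = "https://example-bucket.s3.amazonaws.com/videos/4k-vr.mp4"
    print(bucket_url, is_ignored(bucket_url))
    for url in playback_urls:
        print(url, is_ignored(url))
```

Anchoring the pattern to the bucket host rather than to the `.mp4` extension alone is what keeps the smaller `web.mp4` playback files in scope, should they ever be extracted.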