ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Bypass Cloudflare #216

Open Sanqui opened 8 years ago

Sanqui commented 8 years ago

Cloudflare blocks ArchiveBot at the moment.

It doesn't seem difficult to bypass. https://github.com/Anorov/cloudflare-scrape claims to be able to simply grab a cookie, which could then be reused for a regular scrape. This could be enabled by passing a --cloudflare option.
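A rough sketch of the "grab a cookie, then scrape normally" idea, assuming the cookies have already been obtained somehow (for example by cloudflare-scrape). The function names `cookie_header` and `scrape_headers` and the sample cookie values are made up for illustration and are not part of ArchiveBot or cloudflare-scrape. It only shows how the cookies could be turned into headers for an ordinary HTTP client:

```python
def cookie_header(tokens):
    """Join cookie name/value pairs into one Cookie header value.

    `tokens` is assumed to be a plain dict of cookie names to values,
    however they were obtained (hypothetical input).
    """
    return "; ".join(f"{name}={value}" for name, value in tokens.items())


def scrape_headers(tokens, user_agent):
    """Build the request headers a regular scrape would send.

    Assumption: the cookie is only honoured when the scrape uses the same
    User-Agent that solved the challenge, so both are kept together.
    """
    return {
        "Cookie": cookie_header(tokens),
        "User-Agent": user_agent,
    }


if __name__ == "__main__":
    # Placeholder values, not real cookies.
    example_tokens = {"cf_clearance": "abc123", "__cfduid": "def456"}
    for name, value in scrape_headers(example_tokens, "ExampleAgent/1.0").items():
        print(f"{name}: {value}")
```

Keeping the cookie step separate from the scrape itself would mean only the cookie-grabbing part needs the extra dependency, and the crawler just receives two more headers.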

Could possibly share some code with #101.

hannahwhy commented 8 years ago

cloudflare-scrape looks fine. We'll probably need to install node.js on every pipeline that wants to use it, but that shouldn't be a problem.