ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Support cookie-jar manipulation #101

Open hannahwhy opened 10 years ago

hannahwhy commented 10 years ago

Many subreddits on Reddit have an age gate. Clicking "Yes" sets an "over18=1" cookie for reddit.com. ArchiveBot cannot currently click buttons (even in PhantomJS mode), so it cannot archive any subreddit behind an age gate.

Reddit is not the only site that does this; other sites have similar cookie-based checks. We should be able to specify cookies up front. Perhaps a syntax like this:

!a http://cookies.example.com/ --cookie=.example.com:foo=1

which would add a non-secure cookie named foo with value 1 for the domain .example.com.
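As a rough illustration of how such an argument might be interpreted, here is a minimal sketch that parses the proposed DOMAIN:NAME=VALUE form into a standard-library cookie object. The function name parse_cookie_spec is hypothetical, and the defaults (path "/", session cookie, not secure) are assumptions; none of this is ArchiveBot's actual implementation.

```python
from http.cookiejar import Cookie, CookieJar


def parse_cookie_spec(spec: str) -> Cookie:
    """Parse a 'DOMAIN:NAME=VALUE' string (the syntax proposed above).

    Hypothetical helper: the path, expiry, and secure defaults are assumptions.
    """
    # Split on the first ':' so the value may itself contain ':' or '='.
    domain, colon, pair = spec.partition(":")
    name, equals, value = pair.partition("=")
    if not (domain and colon and name and equals):
        raise ValueError(f"expected DOMAIN:NAME=VALUE, got {spec!r}")
    return Cookie(
        version=0,
        name=name,
        value=value,
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/",
        path_specified=True,
        secure=False,   # "non-secure" as described in the proposal
        expires=None,   # session cookie (assumption)
        discard=True,
        comment=None,
        comment_url=None,
        rest={},
    )


jar = CookieJar()
jar.set_cookie(parse_cookie_spec(".example.com:foo=1"))
```

Splitting on the first colon keeps the domain separate from the name/value pair, so a value such as a URL would still survive intact.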

More extensive customization is possible:

!a http://cookies.example.com/ --cookie-jar=https://gist.github.com/abcdefgh

This would download a cookie jar file from https://gist.github.com/abcdefgh and pass it to wpull. The file would have to be in a format that wpull can load directly.
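A downloaded jar would presumably need a sanity check before being handed to the crawler. The sketch below assumes the jar uses the Netscape cookies.txt format (the format read by wget-style --load-cookies options); whether ArchiveBot would settle on that format is an assumption, and load_cookie_jar_text is a hypothetical name. The download step is left out; only the validation of already-fetched text is shown, using the over18 cookie from the original report as sample data.

```python
import os
import tempfile
from http.cookiejar import LoadError, MozillaCookieJar


def load_cookie_jar_text(text: str) -> MozillaCookieJar:
    """Parse Netscape-format cookie jar text; raises LoadError if malformed.

    Hypothetical helper: MozillaCookieJar only loads from a file path,
    so the text is written to a temporary file first.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        jar = MozillaCookieJar()
        # Keep session and expired cookies so the caller sees everything.
        jar.load(path, ignore_discard=True, ignore_expires=True)
        return jar
    finally:
        os.unlink(path)


# Fields: domain, include-subdomains flag, path, secure flag, expiry, name, value.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".reddit.com\tTRUE\t/\tFALSE\t4102444800\tover18\t1\n"
)

for cookie in load_cookie_jar_text(SAMPLE):
    print(cookie.domain, cookie.name, cookie.value)
```

Rejecting a jar that fails to parse before the job starts would give the requester an error in IRC instead of a crawl that silently runs without cookies.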

hannahwhy commented 10 years ago

Another possibility, one that's less flexible but probably more useful, is to roll this into a more general site-specific customizations framework that's triggered on demand for given sites:

!a http://www.reddit.com/r/oculusnsfw/ --site-specific-settings  # or --sss I guess

(obviously, that link is NSFW)

This would also be a way to address #80.
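One way to picture the site-specific idea is a small registry keyed by domain, consulted when the flag is given. This is only a sketch: SITE_SETTINGS, settings_for, and the shape of the settings dictionary are all hypothetical, and the single entry simply reuses the over18 cookie described in the original report.

```python
from urllib.parse import urlsplit

# Hypothetical registry: domain suffix -> settings applied on request.
SITE_SETTINGS = {
    "reddit.com": {"cookies": {"over18": "1"}},
}


def settings_for(url: str) -> dict:
    """Return the preset settings whose domain matches the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix, settings in SITE_SETTINGS.items():
        # Match the domain itself or any subdomain, but not lookalikes
        # such as "notreddit.com".
        if host == suffix or host.endswith("." + suffix):
            return settings
    return {}


print(settings_for("http://www.reddit.com/r/oculusnsfw/"))
```

The trade-off matches the one named above: a fixed table cannot express arbitrary cookies, but it spares requesters from having to know which cookie a given site expects.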

hannahwhy commented 10 years ago

The immediate use case has been solved by https://github.com/ArchiveTeam/ArchiveBot/tree/reddit-over18.

chfoo commented 9 years ago

Just some ideas for on-the-fly commands: