ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Domain/pattern-based user agent overrides #463

Open JustAnotherArchivist opened 4 years ago

JustAnotherArchivist commented 4 years ago

While annoying, this can be handled for jobs explicitly targeting those sites by using --useragent. However, outlinks to those sites from jobs on other sites would still be requested with the job's normal user agent and might break. I propose adding a mechanism that overrides the user agent on a per-request basis: if the request matches a regular expression, the UA is overridden for that request only. As an optimisation, plain hostname matching could also be supported, possibly with wildcards for any subdomain, so that expensive regex matching can be skipped where it isn't needed. This may require changes in wpull.
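
To make the proposal a bit more concrete, here is a rough sketch of the lookup order I have in mind: exact hostname first, then subdomain wildcards, then regexes as the slow path. This is not wpull or ArchiveBot code. The function name `pick_user_agent`, the three rule tables, and all hostnames, patterns, and UA strings are placeholders made up for illustration, and how this would actually hook into wpull's request handling is left open.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule tables; every hostname, pattern, and UA string is a placeholder.
EXACT_HOSTS = {
    "example.com": "ExampleUA/1.0",
}
# Wildcard rules stored as suffixes: ".example.org" stands for "*.example.org".
SUBDOMAIN_SUFFIXES = [
    (".example.org", "ExampleUA/2.0"),
]
# Regex rules are checked last because they are the most expensive.
REGEX_RULES = [
    (re.compile(r"^https?://[^/]+/api/"), "ExampleUA/3.0"),
]


def pick_user_agent(url, default_ua):
    """Return the user agent to send for this one request."""
    # urlsplit().hostname is already lowercased; it is None for odd URLs.
    host = urlsplit(url).hostname or ""

    # 1. Cheap exact hostname lookup.
    ua = EXACT_HOSTS.get(host)
    if ua is not None:
        return ua

    # 2. Subdomain wildcards (matches sub.example.org, not example.org itself).
    for suffix, ua in SUBDOMAIN_SUFFIXES:
        if host.endswith(suffix):
            return ua

    # 3. Fall back to regex matching against the full URL.
    for regex, ua in REGEX_RULES:
        if regex.search(url):
            return ua

    # No rule matched: keep the job's normal user agent.
    return default_ua
```

Calling `pick_user_agent("https://cdn.example.org/x", "Default")` would hit the wildcard rule, while a URL on an unlisted host with no matching regex falls through to the default. Whether the first matching rule should win, or whether rules need priorities, is something I haven't decided.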