ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

SSL/TLS mismatches may cause retrieval to fail #424

Closed JustAnotherArchivist closed 4 years ago

JustAnotherArchivist commented 5 years ago

Different pipelines use different SSL libraries with different configurations. While this is usually not a problem, it can make some web servers incompatible with some pipelines. Most commonly, this shows up as the error [Errno 1] Operation not permitted.
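To compare pipelines, it could help to dump what each one's Python is actually linked against. The following is only a rough sketch using the standard library ssl module; the function name describe_tls_environment and the dictionary keys are made up for illustration and are not part of ArchiveBot or wpull.

```python
import ssl


def describe_tls_environment():
    """Summarise the SSL/TLS library this Python interpreter is linked against.

    Hypothetical helper for comparing pipelines; not an ArchiveBot or wpull API.
    """
    # The default context reflects the library's (and distribution's) defaults.
    ctx = ssl.create_default_context()
    return {
        # Version string of the linked library, e.g. an OpenSSL release.
        "library": ssl.OPENSSL_VERSION,
        # Compile-time protocol support flags exposed by the ssl module.
        "has_sslv2": ssl.HAS_SSLv2,
        "has_sslv3": ssl.HAS_SSLv3,
        "has_tlsv1_3": ssl.HAS_TLSv1_3,
        # Cipher suites enabled in the default client context.
        "default_ciphers": sorted(c["name"] for c in ctx.get_ciphers()),
    }


if __name__ == "__main__":
    for key, value in describe_tls_environment().items():
        print(key, value)
```

Running this on two pipelines and diffing the output would show differences in the library version, protocol support, and enabled cipher suites. It does not show signature algorithm settings, so it may not explain every mismatch.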

For example, job 8x2mrrmgydi9f5y32y6j0iahv for https://drivemode.com/ failed with that error on pipeline:32a4f537eb76a08dd989f6a5193fa459 and pipeline:8d776e7deadde6cb66ff9ffeeb343baf (both based on Debian Buster). In this particular case, curl produced: * error:1414D172:SSL routines:tls12_check_peer_sigalg:wrong signature type. Job bt7292w39o6plzbkogis6ivz2 seems to be hitting the same issue right now when retrieving from https://www.hkpl.gov.hk/, though I haven't verified that.

Ideally, we'd want to support all SSL/TLS configurations, even insecure ones. This might require changes in wpull, and it might also require a particular version or build of the SSL/TLS library, e.g. to still support SSLv2 (which is commonly disabled in standard builds nowadays). All of those details should preferably be recorded in the WARC as well, but that's a separate issue.
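As a rough illustration of what "as permissive as the library allows" could look like on the Python side, here is a minimal sketch using only the standard library ssl module. The function name make_permissive_context is hypothetical, and how (or whether) such a context could be passed into wpull is not covered here. I also haven't checked that this resolves the wrong signature type error above.

```python
import ssl


def make_permissive_context():
    """Build a client context that accepts as much as the linked library permits.

    Hypothetical helper for archival use only; this disables the usual
    security checks on purpose.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Hostname checking must be turned off before verification can be disabled.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Allow the oldest protocol version the library still supports.
    ctx.minimum_version = ssl.TLSVersion.MINIMUM_SUPPORTED
    try:
        # OpenSSL 1.1.0+ syntax: all ciphers at the lowest security level.
        ctx.set_ciphers("ALL:@SECLEVEL=0")
    except ssl.SSLError:
        # Assumed fallback for libraries that reject the @SECLEVEL keyword.
        ctx.set_ciphers("ALL")
    return ctx


if __name__ == "__main__":
    context = make_permissive_context()
    print(len(context.get_ciphers()), "cipher suites enabled")
```

Note that this can only loosen runtime settings. A protocol that was compiled out of the library, such as SSLv2 in most standard builds, cannot be turned back on this way, which is why a specific build might still be needed.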