ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Random hangs on Akamai-fronted websites #564

Open JustAnotherArchivist opened 10 months ago

JustAnotherArchivist commented 10 months ago

For a while now, various websites have frequently been causing 6-hour stalls. It's unclear exactly when this started, but it became very noticeable (i.e. multiple jobs hanging every few hours) in October, I think. What they have in common is that they're all fronted by Akamai's CDN, i.e. each hostname has a CNAME record pointing to something.edgekey.net. Some examples:

Not every request to every Akamai-fronted website, or even to every domain listed here, stalls. Sometimes connections instead time out after a long time (15-20 minutes). There may be some relation to the request headers: in one case (investors.biontech.de), it appeared that omitting the Connection header caused the stalls over HTTP/1.1, but that may not be the whole explanation.
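One way to test the Connection-header theory is to send the same HTTP/1.1 request with and without that header and check whether the first response bytes arrive within a short deadline. What follows is a minimal Python sketch, not anything ArchiveBot itself does. It rests on several assumptions that aren't in the report: HTTPS on port 443, a 60-second per-operation timeout, and a hypothetical User-Agent string and helper names (build_request, probe). The request is assembled by hand because HTTP client libraries may add headers on their own. The probe only detects stalls before the first response byte, not stalls partway through a body.

```python
import socket
import ssl
import time
from typing import Optional


def build_request(host: str, path: str = "/",
                  connection: Optional[str] = None) -> bytes:
    """Build a raw HTTP/1.1 GET request, with a Connection header only if given.

    Assembling the bytes by hand ensures no header is added implicitly.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: stall-probe",  # hypothetical UA string, not ArchiveBot's
    ]
    if connection is not None:
        lines.append(f"Connection: {connection}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def probe(host: str, path: str = "/", connection: Optional[str] = None,
          timeout: float = 60.0) -> str:
    """Send one request over TLS and report how (or whether) the server answered.

    `timeout` applies to each socket operation (connect, TLS handshake, recv),
    not to the exchange as a whole. Port 443 is an assumption.
    """
    ctx = ssl.create_default_context()
    start = time.monotonic()
    try:
        with socket.create_connection((host, 443), timeout=timeout) as raw:
            # The wrapped TLS socket inherits the raw socket's timeout.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(build_request(host, path, connection))
                first_chunk = tls.recv(4096)
    except socket.timeout:
        return f"stalled: no data within {timeout:.0f}s"
    elapsed = time.monotonic() - start
    if not first_chunk:
        return f"closed without a response after {elapsed:.1f}s"
    # Only the status line is reported; the rest of the response is ignored.
    status_line = first_chunk.split(b"\r\n", 1)[0].decode("latin-1")
    return f"{status_line} after {elapsed:.1f}s"
```

For example, comparing probe("investors.biontech.de") with probe("investors.biontech.de", connection="keep-alive") might show whether the header makes a difference for that host at the time of testing. Since not all requests stall, several attempts per variant are probably needed before drawing conclusions.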