ArchiveTeam / ArchiveBot

ArchiveBot, an IRC bot for archiving websites
http://www.archiveteam.org/index.php?title=ArchiveBot
MIT License

Dashboard WebSocket server crashing with `asyncio.streams.LimitOverrunError` #549

Open JustAnotherArchivist opened 1 year ago

JustAnotherArchivist commented 1 year ago

This crash happened twice today:

Traceback (most recent call last):
  File ".../python3.6/asyncio/streams.py", line 488, in readline
    line = yield from self.readuntil(sep)
  File ".../python3.6/asyncio/streams.py", line 569, in readuntil
    offset)
asyncio.streams.LimitOverrunError: Separator is not found, and chunk exceed the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dashboard/websocket.py", line 85, in <module>
    main()
  File "dashboard/websocket.py", line 81, in main
    loop.run_until_complete(asyncio.gather(stdin_to_amplifier(amplifier, loop), print_status(amplifier)))
  File ".../python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "dashboard/websocket.py", line 28, in stdin_to_amplifier
    amplifier.send((await reader.readline()).decode('utf-8').strip())
  File ".../python3.6/asyncio/streams.py", line 497, in readline
    raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit

Sounds like overlong lines from the firehose cause this crash, potentially from a job with very long URLs.

JustAnotherArchivist commented 1 year ago

That is indeed what causes these crashes. One job in particular produced lines of up to 1.7 MiB, while the buffer is only 1 MiB. The fix here is probably to drop lines that exceed some limit. I'm not sure whether that limit should be 1 MiB or larger, but 1 MiB really ought to be sufficient.
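
A rough sketch of what "drop lines that exceed some limit" could look like, not the actual patch. It assumes the dashboard keeps reading from an `asyncio.StreamReader`; the helper name `read_lines_dropping_overlong` is made up for illustration. It uses `readuntil()` instead of `readline()`, because after `readline()` raises `ValueError` the rest of the overlong line is still on its way and would be returned as a bogus line by the next call.

```python
import asyncio


async def read_lines_dropping_overlong(reader, sep=b'\n'):
    """Yield complete lines from `reader`, silently skipping any line
    longer than the reader's limit (hypothetical helper, not ArchiveBot code).
    """
    discarding = False  # True while we are inside an overlong line
    while True:
        try:
            line = await reader.readuntil(sep)
        except asyncio.LimitOverrunError as e:
            # More than `limit` bytes without a complete line. Throw away
            # what is buffered and remember that the remainder of this
            # line (up to the next separator) must be dropped as well.
            await reader.readexactly(e.consumed)
            discarding = True
            continue
        except asyncio.IncompleteReadError as e:
            # EOF before a separator: pass on a trailing partial line
            # unless it is the tail of an overlong one.
            if e.partial and not discarding:
                yield e.partial
            return
        if discarding:
            # This is the tail end of the overlong line; drop it and
            # resume normal operation with the next line.
            discarding = False
            continue
        yield line
```

In `stdin_to_amplifier`, the loop could then presumably become something like `async for line in read_lines_dropping_overlong(reader): amplifier.send(line.decode('utf-8').strip())`. The cutoff is whatever `limit` the `StreamReader` was created with, so no second size check is needed. It would probably also be worth logging when a line gets dropped.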