Concordium / concordium-node

The main concordium node implementation.
GNU Affero General Public License v3.0

Out-of-band catch up is aborted due to timeout #642

Closed: vikt0r0 closed this issue 1 year ago

vikt0r0 commented 1 year ago

Bug Description

When performing an out-of-band catch-up from an exported database specified by a URL, the node aborts the catch-up before it completes.

This happens because a TCP stream for the file blocks.idx is opened at the beginning of the catch-up and is expected to stay open until the blocks.idx transfer completes. However, the client receives the file in batches of roughly 40-60 lines. Each line corresponds to a chunk, and the chunks in a batch are downloaded and processed before the next batch of lines is read from the stream. When processing the chunks in a batch takes longer than the request timeout, the stream times out and the out-of-band catch-up is aborted (see import_missing_blocks in cli.rs for details).
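The failure condition described above can be illustrated with a small, simplified model. This is only a sketch under stated assumptions, not the code in cli.rs: the function `simulate_streaming`, the `Outcome` type, and all parameter names are hypothetical, and the model assumes every chunk takes the same time to download and process and that the stream times out whenever it sits unread for longer than the request timeout.

```rust
use std::time::Duration;

/// Result of a simulated catch-up run (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Every chunk listed in the index was processed.
    Completed { chunks_processed: usize },
    /// The index stream timed out before the next batch of lines could be read.
    TimedOut { chunks_processed: usize },
}

/// Models reading the index as a stream that stays open for the whole catch-up.
///
/// Assumptions (not taken from the real implementation):
/// - index lines arrive in batches of `batch_size` lines, one chunk per line;
/// - each chunk takes `per_chunk` to download and process;
/// - the stream is not read while a batch is being processed, and it times out
///   if that idle period exceeds `timeout`.
fn simulate_streaming(
    total_chunks: usize,
    batch_size: usize,
    per_chunk: Duration,
    timeout: Duration,
) -> Outcome {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut processed = 0;
    while processed < total_chunks {
        // Take the next batch of index lines (the last batch may be smaller).
        let batch = batch_size.min(total_chunks - processed);
        processed += batch;
        // Time the stream sat idle while this batch was being processed.
        let idle = per_chunk * batch as u32;
        // A timeout only matters if more lines still have to be read.
        if idle > timeout && processed < total_chunks {
            return Outcome::TimedOut { chunks_processed: processed };
        }
    }
    Outcome::Completed { chunks_processed: processed }
}
```

In this model, with 100 chunks, batches of 50 lines, and a 5-minute timeout, a per-chunk time of 10 seconds leaves the stream idle for 500 seconds and the run stops after the first batch, while a per-chunk time of 5 seconds (250 seconds idle per batch) lets the run finish. Note that each individual chunk is well within the timeout in both cases, which is why the abort is unexpected.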

Steps to Reproduce

Perform an out-of-band catch-up on mainnet with the default timeout.

Expected Result

The out-of-band catch-up should succeed whenever the node can download each chunk within the request timeout (the default is 5 minutes).

Actual Result

The out-of-band catch-up stops abruptly after processing the first batch of lines of blocks.idx received in the TCP stream.

Versions

vikt0r0 commented 1 year ago

The following program simulates the bug:

testbug.zip

abizjak commented 1 year ago

Discovered by @mh-concordium