This test (and possibly others) can fail in ways that are hard to debug. For example, in testing for #1607 this test did not fail or produce a noticeable message in the 200MB log file from CI. Instead it runs forever, counting views until CI times out, like so:
2024-06-14T23:51:50.4143933Z 2024-06-14T23:51:50.414097Z WARN hotshot_task_impls::transactions: Couldn't get a block: no available blocks
2024-06-14T23:51:50.4145539Z at /home/runner/.cargo/git/checkouts/hotshot-0ac703037ea89b98/828f718/crates/task-impls/src/transactions.rs:290
2024-06-14T23:51:50.4146966Z in hotshot_task_impls::transactions::wait_for_block with id: 1, view: 1408
2024-06-14T23:51:50.4148270Z in hotshot_task_impls::transactions::Transaction task with id: 1, view: 1407
2024-06-14T23:51:50.4148937Z
2024-06-14T23:51:50.4402287Z ##[error]The action 'Test' has timed out after 40 minutes.
In this example the test submitted a transaction that triggered a bug in the sequencer whereby the new block containing the transaction got erroneously rejected, so the test just waits forever for the tx to get included. I saw no evidence of this in the logs.
Test
submit_test_with_query_module
is defined here https://github.com/EspressoSystems/espresso-sequencer/blob/e186aab116e2c7dfcd689f75c63b40379ba227d3/sequencer/src/api.rs#L784-L788 and instantiated in several places, for example here https://github.com/EspressoSystems/espresso-sequencer/blob/e186aab116e2c7dfcd689f75c63b40379ba227d3/sequencer/src/api/fs.rs#L62 asapi::fs::generic_tests::submit_test_with_query_module
.This test (and possibly others) can fail in ways that are hard to debug. For example, in testing for #1607 this test did not fail or produce a noticeable message in the 200MB log file from CI. Instead it runs forever, counting views until CI times out, like so:
In this example the test submitted a transaction that triggered a bug in the sequencer whereby the new block containing the transaction got erroneously rejected, so the test just waits forever for the tx to get included. I saw no evidence of this in the logs.
Is there an easy way to make this test fail gracefully? At a glance it seems we could tweak
wait_for_decide_on_handle
to wait for a fixed number of events before failing: https://github.com/EspressoSystems/espresso-sequencer/blob/8ce87d2803a7436ff837936ceb76a6a64aed4c54/sequencer/src/lib.rs#L835-L840If this proposal sounds good then I can take assignment of this issue.