Open tbro opened 3 months ago
A related discussion on Zulip about improving the readability and reliability of the slow_dev_node_multiple_lc_providers_test() function:
I'd update this part of the logic with heavier code comments and modify it like this:
for AltChainInfo {
provider_url,
light_client_address,
chain_id,
..
} in dev_info.alt_chains
{
tracing::info!("checking hotshot commitment for {chain_id}");
let signer = init_signer(&provider_url, TEST_MNEMONIC, 0).await.unwrap();
let light_client = LightClient::new(light_client_address, Arc::new(signer.clone()));
// The light client provers are running and updating `newFinalizedState()` in the light client contract;
// the next call ensures those updates are visible in a sliding window of historical HotShot blocks.
while light_client
.get_hot_shot_commitment(U256::from(1))
.call()
.await
.is_err()
{
tracing::info!("waiting for commitment");
sleep(Duration::from_secs(3)).await;
}
let liveness_failure_height = signer.get_block_number().await.unwrap().as_u64();
let (_, l1_height_of_last_hotshot_block) = light_client
.state_history_commitments(light_client.get_state_history_count().await? - 1)
.await?;
// *Simulate* a HotShot liveness failure by toggling the flag in the mock light client contract;
// under the hood, both L1 and HotShot keep progressing: `stateHistoryCommitments` in the contract
// is appended with new HotShot block commitments and new L1 block heights,
// BUT `lag_over_escape_hatch_threshold()` will compute against the frozen `l1_height_of_last_hotshot_block`.
dev_node_client
.post::<()>("api/set-hotshot-down")
.body_json(&SetHotshotDownReqBody {
chain_id: Some(chain_id),
height: liveness_failure_height,
})
.unwrap()
.send()
.await
.unwrap();
// sanity check
assert!(liveness_failure_height >= l1_height_of_last_hotshot_block);
assert!(
!light_client
.lag_over_escape_hatch_threshold(
U256::from(liveness_failure_height + 1),
U256::from(
liveness_failure_height - l1_height_of_last_hotshot_block
),
)
.call()
.await?
);
assert!(
light_client
.lag_over_escape_hatch_threshold(
U256::from(liveness_failure_height),
U256::from(liveness_failure_height - l1_height_of_last_hotshot_block),
)
.call()
.await?
);
// To detect that HotShot is down, we check that L1 made progress but the light client contract didn't;
// the while-loop condition evaluates to false once the L1 height increases beyond `liveness_failure_height`.
// TODO: maybe send a dummy tx here to artificially advance the L1 block (see the sketch after this snippet);
// otherwise we are waiting for the light client prover to generate proofs, which takes 2 min in CI.
while !light_client
.lag_over_escape_hatch_threshold(
U256::from(signer.get_block_number().await?.as_u64()), // current L1 block height
U256::from(liveness_failure_height - l1_height_of_last_hotshot_block),
)
.call()
.await
.unwrap_or(false)
{
tracing::info!("waiting for setting hotshot down");
sleep(Duration::from_secs(3)).await;
}
// *Simulate* HotShot regaining liveness by toggling the flag in the mocked light client contract.
// During all the steps above, `newFinalizedState` was still being updated, since the HotShot outage is only simulated.
dev_node_client
.post::<()>("api/set-hotshot-up")
.body_json(&SetHotshotUpReqBody { chain_id })
.unwrap()
.send()
.await
.unwrap();
// Detect HotShot restoring liveness by ensuring the gap between `liveness_failure_height` and
// the (back-online, thus updated) `l1_height_of_last_hotshot_block` has decreased.
// Note that in the sanity check above, right after we shut HotShot down, the same expression used as the
// while-loop condition evaluated to true; only when the light client receives new updates will `lagOver()` return false.
while light_client
.lag_over_escape_hatch_threshold(
U256::from(liveness_failure_height),
U256::from(liveness_failure_height - l1_height_of_last_hotshot_block),
)
.call()
.await
.unwrap_or(true)
{
tracing::info!("waiting for setting hotshot up");
sleep(Duration::from_secs(3)).await;
}
}
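For the TODO above about forcing L1 progress, here is a minimal, untested sketch of a helper that sends a zero-value self-transfer so the dev L1 node mines a new block, rather than waiting for the light client prover. It assumes the ethers-rs SignerMiddleware returned by init_signer above; bump_l1_block is a hypothetical name:

use anyhow::{anyhow, Result};
use ethers::prelude::*;

/// Hypothetical helper: advance the dev node's L1 height by one block by
/// sending a zero-value self-transfer, so the lag check can trip without
/// waiting for the light client prover to post a new state.
async fn bump_l1_block<M: Middleware>(signer: &M) -> Result<()> {
    let from = signer
        .default_sender()
        .ok_or_else(|| anyhow!("signer has no default sender"))?;
    // A 0-wei transfer to ourselves is enough for the dev node to mine a block.
    let tx = TransactionRequest::pay(from, U256::zero()).from(from);
    let pending = signer
        .send_transaction(tx, None)
        .await
        .map_err(|e| anyhow!("failed to send dummy tx: {e}"))?;
    // Wait until it is mined so `get_block_number()` reflects the new height.
    pending
        .await
        .map_err(|e| anyhow!("dummy tx was not mined: {e}"))?;
    Ok(())
}

In the "waiting for setting hotshot down" loop, one could then call bump_l1_block(&signer).await? before (or instead of) the sleep, so each iteration observes a strictly larger L1 block number.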
then modify LightClientMock.sol:
function setHotShotDown() public {
hotShotDown = true;
// Freeze the L1 height of the latest state update so that lag computations performed while
// HotShot is (simulated as) down use this value, avoiding a potential underflow.
frozenL1Height = stateHistoryCommitments[stateHistoryCommitments.length - 1].l1BlockHeight;
}
^^ I have made minor changes to the suggested snippet based on @alysiahuggins's point about a potential underflow in my original Zulip post. But again, this code has not been tested and might need at least some small tweaks.
cc @imabdulbasit @ImJeremyHe
Review the tests identified as slow, and see if there is a way to lower their runtime.
You can see which ones are slow in CI. For example: https://github.com/EspressoSystems/espresso-sequencer/actions/runs/10387105099/job/28759858781