IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.
https://cardano.org
Apache License 2.0

[BUG] - cardano-testnet sometimes hangs indefinitely #5762

Open carbolymer opened 5 months ago

carbolymer commented 5 months ago

Internal/External: Internal if an IOHK staff member.

Area: Other (any other topic: Delegation, Ranking, ...).

Summary: A cardano-testnet test suite sometimes hangs indefinitely. The nodes appear to take longer than expected to produce blocks. This may be related to what @james-iohk described here: https://github.com/IntersectMBO/cardano-node/pull/5679#issuecomment-1959419678

The issue is more visible on slower machines, such as the macOS runner in GitHub Actions (GHA) or Darwin cross-compilation in Hydra.

Steps to reproduce:

  1. Rerun the cardano-testnet test suite multiple times. Some of the tests should either get stuck or fail a condition check in byDeadlineM.

The issue seems to occur more frequently when testnet test suites are run in parallel.

[!NOTE] Testnet tests can be executed in parallel by setting the PARALLEL_TESTNETS=1 environment variable or by passing --test-options '--num-threads 8' to cabal test cardano-testnet (after that PR gets merged).
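
The reproduction step above can be scripted. The following is a minimal sketch and not part of the repository: `rerun_until_failure` is a hypothetical helper that reruns a command up to a given number of times under the GNU coreutils `timeout` command, so that a hang surfaces as exit status 124 instead of blocking forever. The run count and the per-run limit are illustrative assumptions, and `timeout` may need to be installed separately on macOS.

```shell
# Hypothetical helper (not part of cardano-node): rerun a command until it
# fails or hangs. Assumes GNU coreutils `timeout` is on PATH.
#   $1 = maximum number of runs
#   $2 = per-run time limit in any form `timeout` accepts (e.g. 30m)
#   remaining arguments = the command to run
rerun_until_failure() {
  local max_runs=$1 limit=$2
  shift 2
  local i=1 status
  while [ "$i" -le "$max_runs" ]; do
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
      # `timeout` exits with 124 when it had to stop the command.
      echo "run $i: timed out after $limit (possible hang)" >&2
      return 124
    elif [ "$status" -ne 0 ]; then
      echo "run $i: failed with exit status $status" >&2
      return "$status"
    fi
    i=$((i + 1))
  done
  echo "all $max_runs runs passed" >&2
  return 0
}
```

For example, `rerun_until_failure 20 30m env PARALLEL_TESTNETS=1 cabal test cardano-testnet` would rerun the suite with the environment variable from the note set; the values 20 and 30m are placeholders to tune per machine.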

Sample log of a failure: babbagetransaction.txt (taken from: https://github.com/IntersectMBO/cardano-node/pull/5695/checks?check_run_id=22357754517)

Expected behavior: cardano-testnet does not hang; if it cannot proceed, it retries or reports the failure with a message explaining what happened.

carbolymer commented 5 months ago

Initially, the use of byDeadlineM was suspected to be the problem, and it was partially removed in https://github.com/IntersectMBO/cardano-node/pull/5707. However, instead of test failures we started getting cardano-testnet freezes. The current suspicion is that the test network is not advancing, i.e. no new blocks are being produced.
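
One way to check the "network is not advancing" suspicion from outside the test suite is to poll the block number against a deadline. The sketch below is an assumption-laden illustration, not the byDeadlineM implementation used in the Haskell test code: `wait_for_advance` is a hypothetical helper, and the command that prints the current block number (for instance, a wrapper around a node tip query) is left as a parameter because its exact form depends on the setup.

```shell
# Hypothetical helper: poll a command that prints a single integer (for
# example the current block number) and fail with a message if the value
# has not increased before the deadline.
#   $1 = deadline in seconds
#   $2 = polling interval in seconds
#   remaining arguments = command that prints one integer
wait_for_advance() {
  local deadline=$1 interval=$2
  shift 2
  local start initial current elapsed
  initial=$("$@") || return 2   # exit status 2: the probe command itself failed
  start=$(date +%s)
  while :; do
    current=$("$@") || return 2
    if [ "$current" -gt "$initial" ]; then
      echo "advanced from $initial to $current" >&2
      return 0
    fi
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$deadline" ]; then
      echo "no progress after ${elapsed}s: still at $current" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

A non-zero exit with the "no progress" message would support the suspicion that blocks are not being produced, as opposed to the test harness itself being stuck.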

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.
