Sanity check testground

evan-forbes commented 8 months ago

We should add sanity checks in testground to ensure that testground is not creating a bottleneck in throughput.

These sanity checks could involve deep diving into resource usage, such as:

bandwidth
cpu
ram

To rule out a more complex problem with testground, we should run similar, if not identical, experiments on a different backend such as knuu. Depending on the outcome, the results of the experiments that we've run so far might need to be revisited. If the two backends produce similar results, we can be more confident in the results from both.
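As a rough sketch of what "similar" could mean when comparing the two backends, the snippet below checks each metric's relative difference against a tolerance. The function names, the metric keys, and the 10% tolerance are all assumptions made for illustration; nothing here is taken from testground or knuu.

```python
def relative_difference(a: float, b: float) -> float:
    """Relative difference between two measurements, in the range [0, 1] for non-negative inputs."""
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 0.0  # both measurements are zero, so they agree
    return abs(a - b) / denom


def backends_agree(testground: dict, knuu: dict, tolerance: float = 0.10) -> dict:
    """Map each metric present in both result sets to True if it is within tolerance.

    The 10% default tolerance is a placeholder, not an agreed threshold.
    """
    shared = testground.keys() & knuu.keys()
    return {
        name: relative_difference(testground[name], knuu[name]) <= tolerance
        for name in shared
    }
```

A metric that comes back False would be a candidate for the deeper resource usage investigation described above.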

Acceptance Criteria

Conduct the following experiments in both knuu and testground:

Test with network sizes of 2 and 50 validators, with each validator flooded with transactions submitted by a dedicated txsim.
Evaluate with block sizes of 8 MiB, 32 MiB, and 64 MiB.
Run each configuration with and without network latency.
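The parameters above multiply out to 12 configurations per backend, or 24 runs in total. A minimal sketch that enumerates them is below; the dictionary keys and the `experiment_matrix` helper are hypothetical names and do not correspond to any existing testground or knuu configuration format.

```python
from itertools import product

# Parameter values taken from the acceptance criteria.
BACKENDS = ["knuu", "testground"]
VALIDATOR_COUNTS = [2, 50]
BLOCK_SIZES_MIB = [8, 32, 64]
NETWORK_LATENCY = [False, True]


def experiment_matrix() -> list:
    """Return one dict per (backend, validators, block size, latency) combination."""
    return [
        {
            "backend": backend,
            "validators": validators,
            "block_size_mib": block_size,
            "latency": latency,
        }
        for backend, validators, block_size, latency in product(
            BACKENDS, VALIDATOR_COUNTS, BLOCK_SIZES_MIB, NETWORK_LATENCY
        )
    ]


experiments = experiment_matrix()
print(len(experiments))  # 2 backends * 2 sizes * 3 block sizes * 2 latency settings
```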

Assess the sanity of the testground results using the following metrics:

Block time
Block size
Consensus throughput (block size / block time)
Per node bandwidth utilization (data received and sent)
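For reference, consensus throughput as defined above is just block size divided by block time. A small sketch follows, assuming sizes in bytes and times in seconds; the function names and the sample numbers in the usage line are made up for illustration and are not measured results.

```python
MIB = 1024 * 1024


def consensus_throughput(block_size_bytes: float, block_time_seconds: float) -> float:
    """Throughput in bytes per second for a single block."""
    if block_time_seconds <= 0:
        raise ValueError("block time must be positive")
    return block_size_bytes / block_time_seconds


def mean_throughput(blocks: list) -> float:
    """Throughput over a run: total bytes divided by total time.

    `blocks` is a list of (block_size_bytes, block_time_seconds) pairs.
    """
    total_bytes = sum(size for size, _ in blocks)
    total_time = sum(time for _, time in blocks)
    return consensus_throughput(total_bytes, total_time)


# Illustrative only: an 8 MiB block produced in 4 seconds.
print(consensus_throughput(8 * MIB, 4.0) / MIB, "MiB/s")
```

Averaging over total bytes and total time, rather than averaging per-block throughputs, keeps a few unusually fast or slow blocks from skewing the figure.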

Bonus / Next

We can additionally or as the next step address https://github.com/celestiaorg/celestia-core/issues/1262.

evan-forbes commented 7 months ago

After adding more precise tracing for bandwidth, we can see that the 2 node experiment in testground is not using its entire allocated bandwidth. Roughly 75% (~75 MB/s of the 100 MB/s allocated) can be observed.

[Image: bandwidth-2-val]
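The utilization figure quoted above is observed bandwidth divided by allocated bandwidth. A sketch of that calculation from raw byte counters is below; the function name and the 10 second window are assumptions, and the byte count is chosen only to reproduce the ~75 MB/s out of 100 MB/s reported here.

```python
MB = 1_000_000  # decimal megabytes, matching the MB/s figures in this comment


def bandwidth_utilization(
    bytes_transferred: float,
    window_seconds: float,
    allocated_bytes_per_second: float,
) -> float:
    """Fraction of the allocated bandwidth actually used over a time window."""
    if window_seconds <= 0 or allocated_bytes_per_second <= 0:
        raise ValueError("window and allocation must be positive")
    observed_rate = bytes_transferred / window_seconds
    return observed_rate / allocated_bytes_per_second


# 750 MB moved in a 10 s window is 75 MB/s against a 100 MB/s allocation.
utilization = bandwidth_utilization(750 * MB, 10.0, 100 * MB)
print(f"{utilization:.0%}")
```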

We should compare the tracing data seen in knuu with these results. We expect them to be similar if not identical. We should also compare the results of the same experiment with less allocated bandwidth. More information on the parameters / procedure can be found in the two node follow-up write-up.

evan-forbes commented 5 months ago

To update this: we are starting on the sanity test analysis using the two node data. Beyond the two node experiments, we are blocked on getting knuu capable of running 100 nodes: https://github.com/celestiaorg/celestia-app/issues/3488.

celestiaorg / celestia-app

Sanity check testground #3147
