HarmonicLabs / plutus-machine

Apache License 2.0

Proposal: Cross-Implementation Benchmarking Dataset for Plutus Performance #1

Open sierkov opened 2 weeks ago

sierkov commented 2 weeks ago

I'm working on a C++ implementation of Plutus aimed at optimizing batch synchronization. We'd like to benchmark our implementation against existing open-source Plutus implementations to foster cross-learning and understand their relative performance. This issue is a request for feedback on the proposed benchmark dataset, as well as for approved code samples representing your implementation to include in our benchmarks. Detailed information is provided below.

The proposed benchmark dataset is driven by the following considerations:

  1. Predictive Power: Benchmark results should allow us to predict the time required for a given implementation to validate all script witnesses on Cardano’s mainnet.
  2. Efficient Runtime: The benchmark should complete quickly to enable rapid experimentation and performance evaluation.
  3. Parallelization Awareness: It must assess both single-threaded and multi-threaded performance to identify implementation approaches that influence the parallel efficiency of script witness validation.
  4. Sufficient Sample Size: The dataset should contain enough samples to allow computing reasonable sub-splits for further analysis, such as by Plutus version or by Cardano era.
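To make the predictive-power goal concrete, here is a back-of-the-envelope extrapolation from a random sample to the full chain. All variable names and numbers below are illustrative assumptions, not measured values:

```python
# Extrapolate full-chain validation time from a random benchmark sample.
# Every figure here is a hypothetical placeholder, not a measured result.
sample_size = 256_000            # transactions in the benchmark sample
sample_seconds = 120.0           # assumed wall-clock time to run the sample
total_script_txs = 10_000_000    # assumed mainnet count of script-witness txs

mean_per_tx = sample_seconds / sample_size
estimated_total_seconds = mean_per_tx * total_script_txs
print(f"~{estimated_total_seconds / 3600:.1f} hours to validate all script witnesses")
```

Because the sample is drawn uniformly at random, the per-transaction mean is an unbiased estimator of the population mean, which is what makes this extrapolation statistically sound.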

The procedure for creating the proposed benchmark dataset is as follows:

  1. Transaction Sampling: Randomly select, without replacement, 256,000 mainnet transactions containing Plutus script witnesses. This sample size balances benchmark runtime, sufficient data for analysis, and compatibility with high-end server hardware offering up to 256 execution threads. Random sampling allows generalizable predictions of the validation time of all transactions with script witnesses.
  2. Script Preparation: For each script witness in the selected transactions, prepare the required arguments and script context data. Save each as a Plutus script in Flat format, with all arguments pre-applied.
  3. File Organization: For easier debugging, organize all extracted scripts using the following filename pattern: <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
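The file layout from step 3 can be sketched as a small helper. The function name and the example field values are hypothetical placeholders for whatever metadata the extraction pipeline provides:

```python
from pathlib import Path

def script_path(root: Path, epoch: int, tx_id: str,
                script_hash: str, redeemer_idx: int) -> Path:
    """Build <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat."""
    return root / str(epoch) / f"{tx_id}-{script_hash}-{redeemer_idx}.flat"

# Hypothetical example values for illustration only:
p = script_path(Path("dataset"), 365, "txid0123", "scripthash4567", 0)
print(p.as_posix())  # dataset/365/txid0123-scripthash4567-0.flat
```

Grouping by epoch keeps directories small and makes it easy to compute sub-splits such as per-era performance later.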

To gather performance data across open-source Plutus implementations, I am reaching out to the projects listed below. If there are other implementations not listed here, please let me know; I'd be happy to include them in the benchmark analysis. The known Plutus implementations are:

  1. https://github.com/IntersectMBO/plutus
  2. https://github.com/pragma-org/uplc
  3. https://github.com/nau/scalus
  4. https://github.com/OpShin/uplc
  5. https://github.com/HeliosLang/uplc
  6. https://github.com/HarmonicLabs/plutus-machine

I look forward to your feedback on the proposed benchmark dataset and to your support in providing code that can represent your project in this benchmark.

sierkov commented 16 hours ago

Here are the links to the dataset:

The README includes detailed information, such as the latest performance results of the C++ Plutus implementation and step-by-step instructions for reproducing the transaction sampling and dataset creation.

Also, there is some additional information and discussion of the dataset in a related issue in the main Plutus repository: https://github.com/IntersectMBO/plutus/issues/6626

Let me know if you have questions or need support in preparing implementation-specific scripts.