Proposal: Cross-Implementation Benchmarking Dataset for Plutus Performance

sierkov commented 3 weeks ago

I'm working on a C++ implementation of Plutus aimed at optimizing batch synchronization. We'd like to benchmark our implementation against existing open-source Plutus implementations to foster cross-learning and understand their relative performance. This issue is a request for feedback on the proposed benchmark dataset, as well as for approved code samples representing your implementation to include in our benchmarks. Detailed information is provided below.

The proposed benchmark dataset is driven by the following considerations:

Predictive Power: Benchmark results should allow us to predict the time required for a given implementation to validate all script witnesses on Cardano’s mainnet.
Efficient Runtime: The benchmark should complete quickly to enable rapid experimentation and performance evaluation.
Parallelization Awareness: It must assess both single-threaded and multi-threaded performance to identify implementation approaches that influence the parallel efficiency of script witness validation.
Sufficient Sample Size: The dataset should contain enough samples to allow computing reasonable sub-splits for further analysis, such as by Plutus version or by Cardano era.
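
To make the first and third considerations concrete, here is a minimal sketch of the arithmetic involved. It assumes that a uniformly random sample lets total validation time be estimated by linear scaling, and that parallel efficiency is measured as speedup divided by thread count. The function names and the numbers in the usage lines are hypothetical and are not measurements from any implementation.

```python
def estimate_full_time(sample_seconds, sample_size, population_size):
    """Scale the measured time for a random sample up to the full population.

    Assumes the sample is uniformly random, so the mean per-transaction
    cost in the sample approximates the mean over all transactions.
    """
    return sample_seconds * population_size / sample_size


def parallel_efficiency(single_thread_seconds, multi_thread_seconds, threads):
    """Return speedup divided by thread count (1.0 means perfect scaling)."""
    speedup = single_thread_seconds / multi_thread_seconds
    return speedup / threads


# Hypothetical numbers, for illustration only.
full_estimate = estimate_full_time(10.0, 256_000, 2_560_000)
efficiency = parallel_efficiency(64.0, 2.0, 64)
```

An efficiency well below 1.0 at high thread counts would point to contention or allocation overhead in the implementation rather than to the cost of the scripts themselves.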

The procedure for creating the proposed benchmark dataset is as follows:

Transaction Sampling: Randomly select, without replacement, a sample of 256,000 mainnet transactions containing Plutus script witnesses. This sample size is chosen as a balance between speed, sufficient data for analysis, and compatibility with high-end server hardware with up to 256 execution threads. Because the sample is random, the results generalize to predictions of the validation time for all transactions with script witnesses.
Script Preparation: For each script witness in the selected transactions, prepare the required arguments and script context data. Save each as a Plutus script in Flat format, with all arguments pre-applied.
File Organization: For easier debugging, organize all extracted scripts using the following filename pattern: <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
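
The sampling and file-naming steps above could be sketched roughly as follows. This is only an illustration, not the actual dataset tooling: the function names, the fixed seed, and the example identifiers are all assumptions, and the step that extracts and serializes scripts to Flat is omitted.

```python
import random
from pathlib import PurePosixPath

SAMPLE_SIZE = 256_000  # sample size stated in the proposal


def sample_transactions(tx_ids, k=SAMPLE_SIZE, seed=0):
    """Pick k distinct transaction ids uniformly at random.

    random.sample draws without replacement; the fixed seed (an
    assumption here) makes the selection reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(tx_ids), k)


def script_path(epoch, tx_id, script_hash, redeemer_idx):
    """Build <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat."""
    name = f"{tx_id}-{script_hash}-{redeemer_idx}.flat"
    return PurePosixPath(str(epoch)) / name


# Hypothetical identifiers, for illustration only.
example_ids = [f"tx{i}" for i in range(1000)]
picked = sample_transactions(example_ids, k=100)
example_path = script_path(450, "abc", "def", 0)
```

Keying each file by transaction id, script hash, and redeemer index keeps names unique even when one transaction carries several script witnesses.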

To gather performance data across open-source Plutus implementations, I am reaching out to the projects listed below. If there are any other implementations not listed here, please let me know, as I’d be happy to include them in the benchmark analysis. The known Plutus implementations:

I look forward to your feedback on the proposed benchmark dataset and to your support in providing code that can represent your project in this benchmark.

nielstron commented 3 weeks ago

Hi, thanks for reaching out. I am very open to this (though note that the Python uplc implementation was never meant to be used in a node). Please provide a link to the dataset; I could not find it in the post.

nielstron commented 3 weeks ago

And yes, there are two more implementations in Helios (https://github.com/hyperion-bt/helios) and Plu-Ts (https://github.com/HarmonicLabs/plu-ts)

sierkov commented 3 weeks ago

@nielstron, thank you for the quick response. I’m currently awaiting feedback from Plutus implementations on the benchmarking methodology to ensure the dataset reflects their input. Tentatively, I expect to have this feedback incorporated within a week or two, and I’ll keep you posted on the timeline as things progress.

In the meantime, if you have any specific suggestions regarding the methodology, I'd love to learn about them. I’d like this dataset to provide practical value to participating projects, so if there are particular requirements that could make it more useful for your development activities, please let me know.

Thank you also for pointing me to the two implementations; I’ll reach out to them as well.

sierkov commented 1 week ago

@nielstron, here are the links:

The dataset’s README file with further details and download links.
The reference benchmarking code.
The raw outputs of the benchmarking code.

The README includes detailed information, such as the latest performance results of the C++ Plutus implementation and step-by-step instructions for reproducing the transaction sampling and dataset creation.

Also, there is some additional information and discussion of the dataset in a related issue in the main Plutus repository: https://github.com/IntersectMBO/plutus/issues/6626

Let me know if you have questions or need support in preparing implementation-specific scripts.

OpShin / uplc

Proposal: Cross-Implementation Benchmarking Dataset for Plutus Performance #38