OpenFusionist / BountyBoard

This is a simple bounty board where you can exchange PRs for money.

Turning Green #1

Open · CharlesFus opened this issue 1 month ago

CharlesFus commented 1 month ago

Brief Description: This is our first Bounty, targeting performance optimization of the red items in the "Are We Fast Yet?" benchmark[^2].

[^2]: More background? Here: https://devlog.fusionist.io/posts/are-we-fast-yet/

Bounty Target: Based on today's result snapshot (which I've converted into a single local HTML file), any client + testcase combination with a score greater than 10.0 is eligible.

Bounty Details:

| # | Testcase | Bounty Range (USD[^1]) | Clients |
|---|----------|------------------------|---------|
| 1 | Burnt Pix | 200~2000 | Besu |
| 2 | EcMul with (1, 2) and scalar 2 | 100~500 | Erigon, Geth |
| 3 | EcMul with 32-byte coordinates and scalar 2 | 100~500 | Erigon, Geth |
| 4 | EcMul with (0, 0) and scalar 2 | 100~500 | Besu, Erigon, Geth |
| 5 | EcMul with (0, 0) and 32-byte scalar | 100~500 | Besu, Erigon, Geth, Reth |
| 6 | Maximum memory usage on the second launch using a 1000MB Genesis file | 1000~1500 | Besu, Nethermind, Reth |
| 7 | Time required to launch the client with a 1000MB Genesis file on the second startup | 1000~1500 | Besu, Nethermind, Reth |

Theoretically, the maximum prize distribution for this event is: 2000 × 1 + 500 × 2 + 500 × 2 + 500 × 3 + 500 × 3 + 1500 × 3 + 1500 × 3 = 16,000 USD

[^1]: The bounty will be paid in ACE, the native token of the Endurance network, equivalent to the USD amount, and transferred to the wallet address provided by the recipient.
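For reference, a quick sketch (plain Python, purely illustrative) that recomputes the stated maximum term by term:

```python
# Re-derive the theoretical maximum by summing, per testcase, the top of
# the bounty range times the number of payouts counted in the expression above.
terms = [
    (2000, 1),  # 1: Burnt Pix
    (500, 2),   # 2: EcMul (1, 2), scalar 2
    (500, 2),   # 3: EcMul 32-byte coordinates, scalar 2
    (500, 3),   # 4: EcMul (0, 0), scalar 2
    (500, 3),   # 5: EcMul (0, 0), 32-byte scalar
    (1500, 3),  # 6: max memory on second launch, 1000MB Genesis file
    (1500, 3),  # 7: launch time on second startup, 1000MB Genesis file
]
total = sum(bounty * payouts for bounty, payouts in terms)
print(total)  # 16000
```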


Example Scenario:

Bob improved Besu's "Burnt Pix" performance to a score of 5.0.

Bounty Calculation:

Result: Bob earns 1000 USD.


How to Prove Completion?

To claim a bounty, contributors must follow these steps:

  1. Submit a PR: When submitting a Pull Request to the target project repository (Besu/Erigon/Geth/Nethermind/Reth), clearly state that the optimization is intended for a specific testcase in this bounty. This ensures the changes are targeted and not incidental.

  2. Notify After PR Merge: Once the PR is merged and included in the latest release version of the client, notify us in this issue.

  3. Verify Performance: I will review the latest results on arewefastyet to verify whether the score has dropped below 10.0.

Edge Cases:


Issue Closure:

This issue will close after six months or once all 7 testcases are optimized, whichever occurs first.

As this is our first time hosting a bounty, there may be unforeseen oversights. Therefore, the final interpretation of the rules and decisions related to this bounty remains at our discretion.

garyschulte commented 4 weeks ago

Thanks for your benchmarking efforts. And thanks for highlighting the recent ECMUL regression - we were aware of the improvement in the typical use case, but were not aware of the regression on those particular cases.

We have a few efforts underway to fix the regression. One is directly in besu:

Another is in besu-native:

As I am sure you are aware, benchmarks don't tell a complete or clear picture, but at least these tweaks should see besu perform better in the suites you have chosen.

It will be at least a release cycle before the first two improvements are reflected in the benchmarks, but running locally these are the metrics I gathered for the ECMUL suite:

edit: I removed the reference to parallel tx processing, since the nethermind gas-benchmark suite appears to use a single transaction per block

CharlesFus commented 3 weeks ago

Damn, Nethermind just took the performance of the Precompiles series to a whole new level in version 1.28.0!!!

I noticed that the performance of Blake2f 1M saw a huge leap, while Blake2f 1 remained relatively stable. I feel like the performance boost isn't coming from the precompile implementation itself but more from caching. For example, maybe Nethermind caches the result of a specific precompile call during contract execution if it's called repeatedly? Maybe @LukaszRozmej can give us some insight?

This makes optimizing the EcMul part of the Bounty a lot more challenging! 😭 But in real contracts there could indeed be multiple calls to a specific precompile with the same inputs, so I think it's better to still follow the rules of the game.

LukaszRozmej commented 3 weeks ago

@CharlesFus everything is in the release notes: it comes from the https://github.com/NethermindEth/nethermind/pull/7106 PR, which introduces a small cache for precompiles, so if you run a precompile with the same inputs, you should get the result almost instantly. This is especially useful when you start doing parallel execution, as one thread might have already executed the transaction. However, it still needs to be re-executed due to changed state dependencies, and there is a good chance that the precompile calls will be the same.
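Purely as an illustration of that idea (this is not Nethermind's actual code from PR 7106; the class and names below are hypothetical), a minimal sketch of a bounded result cache keyed on (precompile address, input bytes) could look like this:

```python
from collections import OrderedDict

class PrecompileCache:
    """Illustrative bounded cache: (precompile address, input bytes) -> output bytes.

    Shows why repeated identical precompile calls become nearly free once the
    first execution has been cached; the real implementation differs.
    """

    def __init__(self, max_entries: int = 1024):
        self._entries: OrderedDict[tuple[int, bytes], bytes] = OrderedDict()
        self._max_entries = max_entries

    def call(self, address: int, data: bytes, execute) -> bytes:
        key = (address, data)
        if key in self._entries:
            self._entries.move_to_end(key)          # cache hit: skip execution
            return self._entries[key]
        output = execute(address, data)             # cache miss: run the precompile
        self._entries[key] = output
        if len(self._entries) > self._max_entries:  # evict the oldest entry
            self._entries.popitem(last=False)
        return output
```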

If you want a good benchmark, you can either disable that caching (`--Blocks.PreWarmStateOnBlockProcessing false` - a bit of an unintuitive flag, but it is connected to other caches) or run with a slightly changed input for every call.
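For the second option, here is a minimal sketch of generating a slightly different input for every call, assuming the standard EcMul precompile layout from EIP-196 (address 0x07, 96-byte input: 32-byte x, 32-byte y, 32-byte scalar, big-endian); the helper name is just illustrative:

```python
def ecmul_input(x: int, y: int, scalar: int) -> bytes:
    # 96-byte calldata for the EcMul precompile at address 0x07 (EIP-196):
    # 32-byte x, 32-byte y, 32-byte scalar, all big-endian.
    return x.to_bytes(32, "big") + y.to_bytes(32, "big") + scalar.to_bytes(32, "big")

# Varying the scalar on every call means a cache keyed on exact input
# bytes can never hit; (1, 2) is the point used in testcase #2.
calls = [ecmul_input(1, 2, 2 + i) for i in range(1000)]
```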