TritonVM / triton-vm

Triton is a virtual machine that comes with Algebraic Execution Tables (AET) and Arithmetic Intermediate Representations (AIR) for use in combination with a STARK proof system.
https://triton-vm.org
Apache License 2.0
240 stars 36 forks source link

u32-table row count grew a lot with 0.42-alpha 6 #312

Open Sword-Smith opened 1 month ago

Sword-Smith commented 1 month ago

The u32 table row count has grown a lot with the latest release, v0.42-alpha6. The new u32 check in merkle_step seems to be the culprit.

Here's the diff for the benchmark result of the tasm-lib snippet tasmlib_hashing_merkle_verify. Notice that the u32-table row count is increased by a factor of 10 in the "worst-case" benchmark where a leaf in a Merkle tree of height 20 is verified. It seems to me that we're paying full cost for each u32 check in merkle_step, every time this instruction is executed. I though, because of the look-up nature of the u32 table, that only the 1st iteration of a merkle_step recurse_or_return loop would pay a price in terms of u32-table rows.

Also in tasm-lib's

     "benchmark_result": {
       "clock_cycle_count": 43,
       "hash_table_height": 72,
-      "u32_table_height": 12,
+      "u32_table_height": 45,
       "op_stack_table_height": 26,
       "ram_table_height": 0
     },
    "case": "WorstCase"
@@ -15,7 +15,7 @@
     "benchmark_result": {
       "clock_cycle_count": 71,
       "hash_table_height": 156,
-      "u32_table_height": 28,
+      "u32_table_height": 278,
       "op_stack_table_height": 26,
       "ram_table_height": 0
     },
     "case": "WorstCase"

The u32-table row count is now so high that it's not too far off being the bottleneck in the TASM implementation of the verifier. We have:

Processor Opstack RAM Hash u32
EF4, alpha-5 299.477 246.104 282.946 236.275 76.396
EF4, alpha-6 285.185 235.126 281.599 234.025 211.091

So the u32-table row count almost tripled with alpha-6, because of the u32 check in each iteration of merkle_step. Still not the bottleneck, but I don't think this performance was what we expected when adding this check.

jan-ferdinand commented 1 month ago

The underlying issue is an oversight of mine when adding a range check for the node index to instruction merkle_step. Each execution of merkle_step adds an (or updates the existing) entry to the U32 Table. Crucially, walking two different paths even for the same Merkle tree generates a new U32 Table entry for every distinct node in the union of both paths.

A potential way to address this is to allow U32 Table lookups of intermediate results. Crucially, not all instructions can get this new power, as they might have invalid intermediate results (like lt).