u32-table row count grew a lot with 0.42-alpha 6

The u32 table row count has grown a lot with the latest release, v0.42-alpha6. The new u32 check in merkle_step seems to be the culprit.

Here's the diff for the benchmark result of the tasm-lib snippet tasmlib_hashing_merkle_verify. Notice that the u32-table row count is increased by a factor of 10 in the "worst-case" benchmark where a leaf in a Merkle tree of height 20 is verified. It seems to me that we're paying full cost for each u32 check in merkle_step, every time this instruction is executed. I though, because of the look-up nature of the u32 table, that only the 1st iteration of a merkle_step recurse_or_return loop would pay a price in terms of u32-table rows.

Also in tasm-lib's

     "benchmark_result": {
       "clock_cycle_count": 43,
       "hash_table_height": 72,
-      "u32_table_height": 12,
+      "u32_table_height": 45,
       "op_stack_table_height": 26,
       "ram_table_height": 0
     },
    "case": "WorstCase"
@@ -15,7 +15,7 @@
     "benchmark_result": {
       "clock_cycle_count": 71,
       "hash_table_height": 156,
-      "u32_table_height": 28,
+      "u32_table_height": 278,
       "op_stack_table_height": 26,
       "ram_table_height": 0
     },
     "case": "WorstCase"

The u32-table row count is now so high that it's not too far off being the bottleneck in the TASM implementation of the verifier. We have:

	Processor	Opstack	RAM	Hash	u32
EF4, alpha-5	299.477	246.104	282.946	236.275	76.396
EF4, alpha-6	285.185	235.126	281.599	234.025	211.091

So the u32-table row count almost tripled with alpha-6, because of the u32 check in each iteration of merkle_step. Still not the bottleneck, but I don't think this performance was what we expected when adding this check.

TritonVM / triton-vm

u32-table row count grew a lot with 0.42-alpha 6 #312