ChainSafe / ssz

Typescript implementation of Simple Serialize (SSZ)
https://simpleserialize.com/

feat: SIMD implementation for as-sha256 #367

Closed twoeths closed 4 months ago

twoeths commented 5 months ago

Motivation

SIMD is available in AssemblyScript. It supports the 128-bit `v128` vector type, which holds four 32-bit lanes, which means we can hash 4 inputs in parallel.
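To illustrate the idea, here is a minimal sketch of the SHA-256 message-schedule expansion done for four independent 64-byte blocks at once. It is plain TypeScript that only emulates the four 32-bit lanes of a `v128` with an interleaved `Uint32Array`; it is not the as-sha256 code from this PR, and the names `expandScalar` and `expandBatch4` are hypothetical. In a real SIMD build, the inner per-lane loop would presumably collapse into a handful of vector instructions.

```typescript
const LANES = 4; // a 128-bit vector holds four 32-bit words

// 32-bit rotate right.
function rotr(x: number, n: number): number {
  return ((x >>> n) | (x << (32 - n))) >>> 0;
}

// SHA-256 small sigma functions used by the message schedule.
function smallSigma0(x: number): number {
  return (rotr(x, 7) ^ rotr(x, 18) ^ (x >>> 3)) >>> 0;
}
function smallSigma1(x: number): number {
  return (rotr(x, 17) ^ rotr(x, 19) ^ (x >>> 10)) >>> 0;
}

// Scalar reference: expand one 16-word block into the 64-word schedule.
function expandScalar(block: Uint32Array): Uint32Array {
  const w = new Uint32Array(64);
  w.set(block.subarray(0, 16));
  for (let t = 16; t < 64; t++) {
    // Storing into a Uint32Array wraps the sum modulo 2^32.
    w[t] = smallSigma1(w[t - 2]) + w[t - 7] + smallSigma0(w[t - 15]) + w[t - 16];
  }
  return w;
}

// Batched version (assumed layout): word t of input `lane` lives at
// index t * LANES + lane, so each group of four consecutive words
// corresponds to one emulated v128 value.
function expandBatch4(blocks: Uint32Array[]): Uint32Array {
  if (blocks.length !== LANES) throw new Error("expected exactly 4 blocks");
  const w = new Uint32Array(64 * LANES);
  for (let t = 0; t < 16; t++) {
    for (let lane = 0; lane < LANES; lane++) {
      w[t * LANES + lane] = blocks[lane][t];
    }
  }
  for (let t = 16; t < 64; t++) {
    // Emulated vector step: the same operation applied to all four lanes.
    for (let lane = 0; lane < LANES; lane++) {
      w[t * LANES + lane] =
        smallSigma1(w[(t - 2) * LANES + lane]) +
        w[(t - 7) * LANES + lane] +
        smallSigma0(w[(t - 15) * LANES + lane]) +
        w[(t - 16) * LANES + lane];
    }
  }
  return w;
}

// Usage: four different blocks, expanded together.
const blocks: Uint32Array[] = [0, 1, 2, 3].map((lane) =>
  Uint32Array.from({ length: 16 }, (_, i) => (lane * 0x01010101 + i * 0x9e3779b9) >>> 0)
);
const batched = expandBatch4(blocks);
```

Each lane of `batched` should match what `expandScalar` produces for the corresponding block, since the lanes never interact. That independence is what makes the four-way parallel hash possible.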

Description

Closes #356

github-actions[bot] commented 5 months ago

Performance Report

✔️ no performance regression detected

Full benchmark results | Benchmark suite | Current: 81334811864067a6af798af4f015dbcb1b99779d | Previous: cf8f04905e8d93fd44e59a319e1431f5b932d0e3 | Ratio | |-|-|-|-| | digestTwoHashObjects 50023 times | 47.923 ms/op | 47.926 ms/op | 1.00 | | digest64 50023 times | 50.469 ms/op | 50.930 ms/op | 0.99 | | digest 50023 times | 52.153 ms/op | 52.992 ms/op | 0.98 | | input length 32 | 1.2030 us/op | 1.1920 us/op | 1.01 | | input length 64 | 1.3590 us/op | 1.3970 us/op | 0.97 | | input length 128 | 2.2660 us/op | 2.3880 us/op | 0.95 | | input length 256 | 3.3830 us/op | 3.4430 us/op | 0.98 | | input length 512 | 5.5630 us/op | 5.6190 us/op | 0.99 | | input length 1024 | 10.707 us/op | 10.763 us/op | 0.99 | | digest 1000000 times | 824.19 ms/op | 837.14 ms/op | 0.98 | | hashObjectToByteArray 50023 times | 1.4283 ms/op | 1.4692 ms/op | 0.97 | | byteArrayToHashObject 50023 times | 2.4242 ms/op | 2.4603 ms/op | 0.99 | | digest64 200092 times | 206.57 ms/op | | hash 200092 times using batchHash4UintArray64s | 212.05 ms/op | | hash 200092 times using batchHash4HashObjectInputs | 212.59 ms/op | | getGindicesAtDepth | 4.6080 us/op | 4.6690 us/op | 0.99 | | iterateAtDepth | 7.2810 us/op | 7.4530 us/op | 0.98 | | getGindexBits | 428.00 ns/op | 430.00 ns/op | 1.00 | | gindexIterator | 1.0290 us/op | 972.00 ns/op | 1.06 | | hash 2 Uint8Array 2250026 times - as-sha256 | 2.3156 s/op | 2.3533 s/op | 0.98 | | hashTwoObjects 2250026 times - as-sha256 | 2.1663 s/op | 2.2222 s/op | 0.97 | | hash 2 Uint8Array 2250026 times - noble | 5.0159 s/op | 5.2452 s/op | 0.96 | | hashTwoObjects 2250026 times - noble | 6.8932 s/op | 6.8410 s/op | 1.01 | | getNodeH() x7812.5 avg hindex | 12.143 us/op | 12.969 us/op | 0.94 | | getNodeH() x7812.5 index 0 | 6.3680 us/op | 6.6040 us/op | 0.96 | | getNodeH() x7812.5 index 7 | 6.4100 us/op | 6.5780 us/op | 0.97 | | getNodeH() x7812.5 index 7 with key array | 6.3800 us/op | 6.4950 us/op | 0.98 | | new LeafNode() x7812.5 | 14.760 us/op | 15.032 us/op | 0.98 | | 
multiproof - depth 15, 1 requested leaves | 8.6070 us/op | 9.6410 us/op | 0.89 | | tree offset multiproof - depth 15, 1 requested leaves | 19.633 us/op | 20.563 us/op | 0.95 | | compact multiproof - depth 15, 1 requested leaves | 3.7230 us/op | 5.4290 us/op | 0.69 | | multiproof - depth 15, 2 requested leaves | 11.534 us/op | 12.903 us/op | 0.89 | | tree offset multiproof - depth 15, 2 requested leaves | 21.439 us/op | 23.655 us/op | 0.91 | | compact multiproof - depth 15, 2 requested leaves | 3.4330 us/op | 4.4640 us/op | 0.77 | | multiproof - depth 15, 3 requested leaves | 16.153 us/op | 18.176 us/op | 0.89 | | tree offset multiproof - depth 15, 3 requested leaves | 27.953 us/op | 29.919 us/op | 0.93 | | compact multiproof - depth 15, 3 requested leaves | 4.1860 us/op | 6.4790 us/op | 0.65 | | multiproof - depth 15, 4 requested leaves | 21.466 us/op | 23.370 us/op | 0.92 | | tree offset multiproof - depth 15, 4 requested leaves | 33.883 us/op | 36.995 us/op | 0.92 | | compact multiproof - depth 15, 4 requested leaves | 5.0580 us/op | 5.3080 us/op | 0.95 | | packedRootsBytesToLeafNodes bytes 4000 offset 0 | 1.9560 us/op | 1.9930 us/op | 0.98 | | packedRootsBytesToLeafNodes bytes 4000 offset 1 | 1.9810 us/op | 2.0020 us/op | 0.99 | | packedRootsBytesToLeafNodes bytes 4000 offset 2 | 1.9630 us/op | 2.0000 us/op | 0.98 | | packedRootsBytesToLeafNodes bytes 4000 offset 3 | 1.8760 us/op | 1.9940 us/op | 0.94 | | subtreeFillToContents depth 40 count 250000 | 46.530 ms/op | 45.958 ms/op | 1.01 | | setRoot - gindexBitstring | 8.1636 ms/op | 8.4206 ms/op | 0.97 | | setRoot - gindex | 8.5065 ms/op | 8.7619 ms/op | 0.97 | | getRoot - gindexBitstring | 2.4350 ms/op | 2.4504 ms/op | 0.99 | | getRoot - gindex | 3.3562 ms/op | 3.3620 ms/op | 1.00 | | getHashObject then setHashObject | 10.247 ms/op | 10.481 ms/op | 0.98 | | setNodeWithFn | 7.9182 ms/op | 8.0530 ms/op | 0.98 | | getNodeAtDepth depth 0 x100000 | 1.0832 ms/op | 1.0852 ms/op | 1.00 | | setNodeAtDepth depth 0 x100000 
| 2.3466 ms/op | 2.4234 ms/op | 0.97 | | getNodesAtDepth depth 0 x100000 | 1.0524 ms/op | 1.0538 ms/op | 1.00 | | setNodesAtDepth depth 0 x100000 | 1.4245 ms/op | 1.4528 ms/op | 0.98 | | getNodeAtDepth depth 1 x100000 | 1.1464 ms/op | 1.1686 ms/op | 0.98 | | setNodeAtDepth depth 1 x100000 | 5.1183 ms/op | 5.1398 ms/op | 1.00 | | getNodesAtDepth depth 1 x100000 | 1.1763 ms/op | 1.1909 ms/op | 0.99 | | setNodesAtDepth depth 1 x100000 | 4.3033 ms/op | 4.3132 ms/op | 1.00 | | getNodeAtDepth depth 2 x100000 | 1.4276 ms/op | 1.4221 ms/op | 1.00 | | setNodeAtDepth depth 2 x100000 | 8.7806 ms/op | 10.417 ms/op | 0.84 | | getNodesAtDepth depth 2 x100000 | 16.869 ms/op | 18.389 ms/op | 0.92 | | setNodesAtDepth depth 2 x100000 | 12.381 ms/op | 12.926 ms/op | 0.96 | | tree.getNodesAtDepth - gindexes | 7.7827 ms/op | 8.0320 ms/op | 0.97 | | tree.getNodesAtDepth - push all nodes | 1.9585 ms/op | 1.9345 ms/op | 1.01 | | tree.getNodesAtDepth - navigation | 233.92 us/op | 235.57 us/op | 0.99 | | tree.setNodesAtDepth - indexes | 349.98 us/op | 308.89 us/op | 1.13 | | set at depth 8 | 443.00 ns/op | 450.00 ns/op | 0.98 | | set at depth 16 | 588.00 ns/op | 596.00 ns/op | 0.99 | | set at depth 32 | 951.00 ns/op | 958.00 ns/op | 0.99 | | iterateNodesAtDepth 8 256 | 13.080 us/op | 13.212 us/op | 0.99 | | getNodesAtDepth 8 256 | 3.4390 us/op | 3.3790 us/op | 1.02 | | iterateNodesAtDepth 16 65536 | 4.2388 ms/op | 4.3308 ms/op | 0.98 | | getNodesAtDepth 16 65536 | 1.5835 ms/op | 1.6273 ms/op | 0.97 | | iterateNodesAtDepth 32 250000 | 15.410 ms/op | 15.634 ms/op | 0.99 | | getNodesAtDepth 32 250000 | 4.3000 ms/op | 4.3522 ms/op | 0.99 | | iterateNodesAtDepth 40 250000 | 15.540 ms/op | 15.708 ms/op | 0.99 | | getNodesAtDepth 40 250000 | 4.3836 ms/op | 4.4330 ms/op | 0.99 | | 250k validators | 7.1398 s/op | 7.1114 s/op | 1.00 | | bitlist bytes to struct (120,90) | 482.00 ns/op | 484.00 ns/op | 1.00 | | bitlist bytes to tree (120,90) | 2.1360 us/op | 2.1460 us/op | 1.00 | | bitlist bytes to 
struct (2048,2048) | 911.00 ns/op | 922.00 ns/op | 0.99 | | bitlist bytes to tree (2048,2048) | 3.3240 us/op | 3.3630 us/op | 0.99 | | ByteListType - deserialize | 7.8165 ms/op | 7.3046 ms/op | 1.07 | | BasicListType - deserialize | 11.857 ms/op | 11.915 ms/op | 1.00 | | ByteListType - serialize | 7.8777 ms/op | 7.9004 ms/op | 1.00 | | BasicListType - serialize | 9.6364 ms/op | 10.023 ms/op | 0.96 | | BasicListType - tree_convertToStruct | 22.355 ms/op | 22.655 ms/op | 0.99 | | List[uint8, 68719476736] len 300000 ViewDU.getAll() + iterate | 4.3003 ms/op | 4.4147 ms/op | 0.97 | | List[uint8, 68719476736] len 300000 ViewDU.get(i) | 4.1212 ms/op | 2.9512 ms/op | 1.40 | | Array.push len 300000 empty Array - number | 6.3746 ms/op | 6.2896 ms/op | 1.01 | | Array.set len 300000 from new Array - number | 1.6630 ms/op | 1.7071 ms/op | 0.97 | | Array.set len 300000 - number | 5.2218 ms/op | 5.2257 ms/op | 1.00 | | Uint8Array.set len 300000 | 373.14 us/op | 372.38 us/op | 1.00 | | Uint32Array.set len 300000 | 443.43 us/op | 445.15 us/op | 1.00 | | Container({a: uint8, b: uint8}) getViewDU x300000 | 52.403 ms/op | 49.804 ms/op | 1.05 | | ContainerNodeStruct({a: uint8, b: uint8}) getViewDU x300000 | 10.700 ms/op | 10.834 ms/op | 0.99 | | List(Container) len 300000 ViewDU.getAllReadonly() + iterate | 208.75 ms/op | 209.73 ms/op | 1.00 | | List(Container) len 300000 ViewDU.getAllReadonlyValues() + iterate | 316.36 ms/op | 273.31 ms/op | 1.16 | | List(Container) len 300000 ViewDU.get(i) | 8.7640 ms/op | 6.3717 ms/op | 1.38 | | List(Container) len 300000 ViewDU.getReadonly(i) | 8.1774 ms/op | 6.3376 ms/op | 1.29 | | List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonly() + iterate | 40.470 ms/op | 41.496 ms/op | 0.98 | | List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonlyValues() + iterate | 5.6273 ms/op | 5.1590 ms/op | 1.09 | | List(ContainerNodeStruct) len 300000 ViewDU.get(i) | 7.2073 ms/op | 5.9948 ms/op | 1.20 | | List(ContainerNodeStruct) len 300000 
ViewDU.getReadonly(i) | 7.1238 ms/op | 5.9572 ms/op | 1.20 | | Array.push len 300000 empty Array - object | 6.8128 ms/op | 5.9218 ms/op | 1.15 | | Array.set len 300000 from new Array - object | 2.2630 ms/op | 1.9831 ms/op | 1.14 | | Array.set len 300000 - object | 6.7586 ms/op | 5.7016 ms/op | 1.19 | | cachePermanentRootStruct no cache | 9.2840 us/op | 8.5850 us/op | 1.08 | | cachePermanentRootStruct with cache | 237.00 ns/op | 188.00 ns/op | 1.26 | | epochParticipation len 250000 rws 7813 | 2.3041 ms/op | 1.8994 ms/op | 1.21 | | deserialize Attestation - tree | 4.5990 us/op | 4.0490 us/op | 1.14 | | deserialize Attestation - struct | 2.0270 us/op | 1.7750 us/op | 1.14 | | deserialize SignedAggregateAndProof - tree | 3.7370 us/op | 3.6180 us/op | 1.03 | | deserialize SignedAggregateAndProof - struct | 3.1580 us/op | 2.9150 us/op | 1.08 | | deserialize SyncCommitteeMessage - tree | 1.0770 us/op | 1.0360 us/op | 1.04 | | deserialize SyncCommitteeMessage - struct | 1.1750 us/op | 980.00 ns/op | 1.20 | | deserialize SignedContributionAndProof - tree | 2.1180 us/op | 1.9690 us/op | 1.08 | | deserialize SignedContributionAndProof - struct | 2.5370 us/op | 2.3590 us/op | 1.08 | | deserialize SignedBeaconBlock - tree | 238.34 us/op | 208.32 us/op | 1.14 | | deserialize SignedBeaconBlock - struct | 126.23 us/op | 120.84 us/op | 1.04 | | BeaconState vc 300000 - deserialize tree | 598.10 ms/op | 593.02 ms/op | 1.01 | | BeaconState vc 300000 - serialize tree | 147.94 ms/op | 148.19 ms/op | 1.00 | | BeaconState.historicalRoots vc 300000 - deserialize tree | 876.00 ns/op | 821.00 ns/op | 1.07 | | BeaconState.historicalRoots vc 300000 - serialize tree | 800.00 ns/op | 765.00 ns/op | 1.05 | | BeaconState.validators vc 300000 - deserialize tree | 550.23 ms/op | 521.80 ms/op | 1.05 | | BeaconState.validators vc 300000 - serialize tree | 98.321 ms/op | 102.19 ms/op | 0.96 | | BeaconState.balances vc 300000 - deserialize tree | 20.496 ms/op | 20.686 ms/op | 0.99 | | 
BeaconState.balances vc 300000 - serialize tree | 4.0125 ms/op | 3.9926 ms/op | 1.00 | | BeaconState.previousEpochParticipation vc 300000 - deserialize tree | 548.56 us/op | 684.49 us/op | 0.80 | | BeaconState.previousEpochParticipation vc 300000 - serialize tree | 291.01 us/op | 288.96 us/op | 1.01 | | BeaconState.currentEpochParticipation vc 300000 - deserialize tree | 563.17 us/op | 450.13 us/op | 1.25 | | BeaconState.currentEpochParticipation vc 300000 - serialize tree | 283.88 us/op | 287.17 us/op | 0.99 | | BeaconState.inactivityScores vc 300000 - deserialize tree | 21.006 ms/op | 20.081 ms/op | 1.05 | | BeaconState.inactivityScores vc 300000 - serialize tree | 4.1597 ms/op | 3.6692 ms/op | 1.13 | | hashTreeRoot Attestation - struct | 33.643 us/op | 27.463 us/op | 1.23 | | hashTreeRoot Attestation - tree | 21.286 us/op | 18.111 us/op | 1.18 | | hashTreeRoot SignedAggregateAndProof - struct | 57.859 us/op | 37.426 us/op | 1.55 | | hashTreeRoot SignedAggregateAndProof - tree | 29.846 us/op | 27.126 us/op | 1.10 | | hashTreeRoot SyncCommitteeMessage - struct | 10.282 us/op | 8.9650 us/op | 1.15 | | hashTreeRoot SyncCommitteeMessage - tree | 6.6760 us/op | 6.3710 us/op | 1.05 | | hashTreeRoot SignedContributionAndProof - struct | 26.790 us/op | 24.215 us/op | 1.11 | | hashTreeRoot SignedContributionAndProof - tree | 20.062 us/op | 19.253 us/op | 1.04 | | hashTreeRoot SignedBeaconBlock - struct | 2.5356 ms/op | 2.1739 ms/op | 1.17 | | hashTreeRoot SignedBeaconBlock - tree | 1.7796 ms/op | 1.6946 ms/op | 1.05 | | hashTreeRoot Validator - struct | 12.951 us/op | 12.096 us/op | 1.07 | | hashTreeRoot Validator - tree | 11.074 us/op | 10.355 us/op | 1.07 | | BeaconState vc 300000 - hashTreeRoot tree | 3.6886 s/op | 3.6525 s/op | 1.01 | | BeaconState.historicalRoots vc 300000 - hashTreeRoot tree | 1.3500 us/op | 1.3400 us/op | 1.01 | | BeaconState.validators vc 300000 - hashTreeRoot tree | 3.4979 s/op | 3.4974 s/op | 1.00 | | BeaconState.balances vc 300000 - 
hashTreeRoot tree | 86.933 ms/op | 86.452 ms/op | 1.01 | | BeaconState.previousEpochParticipation vc 300000 - hashTreeRoot tree | 9.0174 ms/op | 9.0131 ms/op | 1.00 | | BeaconState.currentEpochParticipation vc 300000 - hashTreeRoot tree | 9.0452 ms/op | 9.0085 ms/op | 1.00 | | BeaconState.inactivityScores vc 300000 - hashTreeRoot tree | 88.884 ms/op | 86.569 ms/op | 1.03 | | hash64 x18 | 19.557 us/op | 19.358 us/op | 1.01 | | hashTwoObjects x18 | 18.413 us/op | 17.861 us/op | 1.03 | | hash64 x1740 | 1.8220 ms/op | 1.8124 ms/op | 1.01 | | hashTwoObjects x1740 | 1.7030 ms/op | 1.7224 ms/op | 0.99 | | hash64 x2700000 | 2.8527 s/op | 2.8213 s/op | 1.01 | | hashTwoObjects x2700000 | 2.6502 s/op | 2.6376 s/op | 1.00 | | get_exitEpoch - ContainerType | 226.00 ns/op | 190.00 ns/op | 1.19 | | get_exitEpoch - ContainerNodeStructType | 231.00 ns/op | 190.00 ns/op | 1.22 | | set_exitEpoch - ContainerType | 239.00 ns/op | 254.00 ns/op | 0.94 | | set_exitEpoch - ContainerNodeStructType | 237.00 ns/op | 204.00 ns/op | 1.16 | | get_pubkey - ContainerType | 894.00 ns/op | 854.00 ns/op | 1.05 | | get_pubkey - ContainerNodeStructType | 233.00 ns/op | 201.00 ns/op | 1.16 | | hashTreeRoot - ContainerType | 371.00 ns/op | 337.00 ns/op | 1.10 | | hashTreeRoot - ContainerNodeStructType | 446.00 ns/op | 378.00 ns/op | 1.18 | | createProof - ContainerType | 4.2990 us/op | 3.7110 us/op | 1.16 | | createProof - ContainerNodeStructType | 21.894 us/op | 19.853 us/op | 1.10 | | serialize - ContainerType | 1.8750 us/op | 1.7860 us/op | 1.05 | | serialize - ContainerNodeStructType | 1.5420 us/op | 1.5830 us/op | 0.97 | | set_exitEpoch_and_hashTreeRoot - ContainerType | 4.2740 us/op | 4.1860 us/op | 1.02 | | set_exitEpoch_and_hashTreeRoot - ContainerNodeStructType | 11.401 us/op | 11.102 us/op | 1.03 | | Array - for of | 5.5600 us/op | 5.6380 us/op | 0.99 | | Array - for(;;) | 5.5480 us/op | 5.4620 us/op | 1.02 | | basicListValue.readonlyValuesArray() | 4.3692 ms/op | 4.2076 ms/op | 1.04 | | 
basicListValue.readonlyValuesArray() + loop all | 5.2851 ms/op | 4.1542 ms/op | 1.27 | | compositeListValue.readonlyValuesArray() | 29.942 ms/op | 27.561 ms/op | 1.09 | | compositeListValue.readonlyValuesArray() + loop all | 29.698 ms/op | 29.214 ms/op | 1.02 | | Number64UintType - get balances list | 4.2828 ms/op | 4.3291 ms/op | 0.99 | | Number64UintType - set balances list | 9.5034 ms/op | 10.021 ms/op | 0.95 | | Number64UintType - get and increase 10 then set | 39.115 ms/op | 40.389 ms/op | 0.97 | | Number64UintType - increase 10 using applyDelta | 15.591 ms/op | 17.193 ms/op | 0.91 | | Number64UintType - increase 10 using applyDeltaInBatch | 15.269 ms/op | 17.224 ms/op | 0.89 | | tree_newTreeFromUint64Deltas | 16.533 ms/op | 13.377 ms/op | 1.24 | | unsafeUint8ArrayToTree | 29.468 ms/op | 26.745 ms/op | 1.10 | | bitLength(50) | 216.00 ns/op | 203.00 ns/op | 1.06 | | bitLengthStr(50) | 209.00 ns/op | 193.00 ns/op | 1.08 | | bitLength(8000) | 201.00 ns/op | 197.00 ns/op | 1.02 | | bitLengthStr(8000) | 255.00 ns/op | 245.00 ns/op | 1.04 | | bitLength(250000) | 223.00 ns/op | 208.00 ns/op | 1.07 | | bitLengthStr(250000) | 314.00 ns/op | 297.00 ns/op | 1.06 | | floor - Math.floor (53) | 1.2371 ns/op | 1.2564 ns/op | 0.98 | | floor - << 0 (53) | 1.2366 ns/op | 1.2374 ns/op | 1.00 | | floor - Math.floor (512) | 1.2370 ns/op | 1.2365 ns/op | 1.00 | | floor - << 0 (512) | 1.2553 ns/op | 1.2364 ns/op | 1.02 | | fnIf(0) | 1.5527 ns/op | 1.5548 ns/op | 1.00 | | fnSwitch(0) | 2.1715 ns/op | 2.1661 ns/op | 1.00 | | fnObj(0) | 1.5467 ns/op | 1.5695 ns/op | 0.99 | | fnArr(0) | 1.5472 ns/op | 1.5471 ns/op | 1.00 | | fnIf(4) | 2.1654 ns/op | 2.1932 ns/op | 0.99 | | fnSwitch(4) | 2.1660 ns/op | 2.1642 ns/op | 1.00 | | fnObj(4) | 1.5546 ns/op | 1.5485 ns/op | 1.00 | | fnArr(4) | 1.5475 ns/op | 1.5481 ns/op | 1.00 | | fnIf(9) | 3.1564 ns/op | 3.0949 ns/op | 1.02 | | fnSwitch(9) | 2.1665 ns/op | 2.1954 ns/op | 0.99 | | fnObj(9) | 1.5461 ns/op | 1.5493 ns/op | 1.00 | | fnArr(9) | 
1.5531 ns/op | 1.5497 ns/op | 1.00 | | Container {a,b,vec} - as struct x100000 | 124.07 us/op | 123.91 us/op | 1.00 | | Container {a,b,vec} - as tree x100000 | 340.37 us/op | 340.30 us/op | 1.00 | | Container {a,vec,b} - as struct x100000 | 157.79 us/op | 154.77 us/op | 1.02 | | Container {a,vec,b} - as tree x100000 | 371.42 us/op | 372.12 us/op | 1.00 | | get 2 props x1000000 - rawObject | 309.44 us/op | 310.81 us/op | 1.00 | | get 2 props x1000000 - proxy | 73.948 ms/op | 72.741 ms/op | 1.02 | | get 2 props x1000000 - customObj | 309.77 us/op | 309.33 us/op | 1.00 | | Simple object binary -> struct | 861.00 ns/op | 795.00 ns/op | 1.08 | | Simple object binary -> tree_backed | 1.6640 us/op | 1.5580 us/op | 1.07 | | Simple object struct -> tree_backed | 2.3310 us/op | 2.1900 us/op | 1.06 | | Simple object tree_backed -> struct | 2.2450 us/op | 2.1540 us/op | 1.04 | | Simple object struct -> binary | 1.0160 us/op | 1.0830 us/op | 0.94 | | Simple object tree_backed -> binary | 1.5700 us/op | 1.5820 us/op | 0.99 | | aggregationBits binary -> struct | 627.00 ns/op | 589.00 ns/op | 1.06 | | aggregationBits binary -> tree_backed | 2.4090 us/op | 2.3670 us/op | 1.02 | | aggregationBits struct -> tree_backed | 2.8380 us/op | 2.8010 us/op | 1.01 | | aggregationBits tree_backed -> struct | 1.2140 us/op | 1.1880 us/op | 1.02 | | aggregationBits struct -> binary | 797.00 ns/op | 774.00 ns/op | 1.03 | | aggregationBits tree_backed -> binary | 1.0750 us/op | 1.0300 us/op | 1.04 | | List(uint8) 100000 binary -> struct | 1.3397 ms/op | 1.4490 ms/op | 0.92 | | List(uint8) 100000 binary -> tree_backed | 93.770 us/op | 88.515 us/op | 1.06 | | List(uint8) 100000 struct -> tree_backed | 1.1678 ms/op | 1.1905 ms/op | 0.98 | | List(uint8) 100000 tree_backed -> struct | 1.0327 ms/op | 1.0591 ms/op | 0.98 | | List(uint8) 100000 struct -> binary | 988.12 us/op | 1.0094 ms/op | 0.98 | | List(uint8) 100000 tree_backed -> binary | 88.551 us/op | 87.930 us/op | 1.01 | | List(uint64Number) 
100000 binary -> struct | 1.2350 ms/op | 1.2081 ms/op | 1.02 | | List(uint64Number) 100000 binary -> tree_backed | 2.8315 ms/op | 3.2269 ms/op | 0.88 | | List(uint64Number) 100000 struct -> tree_backed | 3.9792 ms/op | 4.8569 ms/op | 0.82 | | List(uint64Number) 100000 tree_backed -> struct | 2.0545 ms/op | 2.3570 ms/op | 0.87 | | List(uint64Number) 100000 struct -> binary | 1.3642 ms/op | 1.5680 ms/op | 0.87 | | List(uint64Number) 100000 tree_backed -> binary | 810.64 us/op | 905.40 us/op | 0.90 | | List(Uint64Bigint) 100000 binary -> struct | 3.5439 ms/op | 3.6912 ms/op | 0.96 | | List(Uint64Bigint) 100000 binary -> tree_backed | 3.2928 ms/op | 3.3661 ms/op | 0.98 | | List(Uint64Bigint) 100000 struct -> tree_backed | 5.2914 ms/op | 5.5335 ms/op | 0.96 | | List(Uint64Bigint) 100000 tree_backed -> struct | 4.5456 ms/op | 4.6956 ms/op | 0.97 | | List(Uint64Bigint) 100000 struct -> binary | 2.0308 ms/op | 2.0423 ms/op | 0.99 | | List(Uint64Bigint) 100000 tree_backed -> binary | 982.22 us/op | 1.1645 ms/op | 0.84 | | Vector(Root) 100000 binary -> struct | 28.981 ms/op | 31.484 ms/op | 0.92 | | Vector(Root) 100000 binary -> tree_backed | 32.772 ms/op | 33.719 ms/op | 0.97 | | Vector(Root) 100000 struct -> tree_backed | 37.789 ms/op | 37.528 ms/op | 1.01 | | Vector(Root) 100000 tree_backed -> struct | 44.906 ms/op | 45.449 ms/op | 0.99 | | Vector(Root) 100000 struct -> binary | 2.6262 ms/op | 2.5929 ms/op | 1.01 | | Vector(Root) 100000 tree_backed -> binary | 9.5413 ms/op | 10.302 ms/op | 0.93 | | List(Validator) 100000 binary -> struct | 105.60 ms/op | 108.18 ms/op | 0.98 | | List(Validator) 100000 binary -> tree_backed | 288.03 ms/op | 290.31 ms/op | 0.99 | | List(Validator) 100000 struct -> tree_backed | 295.83 ms/op | 302.03 ms/op | 0.98 | | List(Validator) 100000 tree_backed -> struct | 190.95 ms/op | 192.89 ms/op | 0.99 | | List(Validator) 100000 struct -> binary | 26.600 ms/op | 27.086 ms/op | 0.98 | | List(Validator) 100000 tree_backed -> binary | 101.26 ms/op | 
101.01 ms/op | 1.00 | | List(Validator-NS) 100000 binary -> struct | 98.635 ms/op | 105.24 ms/op | 0.94 | | List(Validator-NS) 100000 binary -> tree_backed | 146.63 ms/op | 144.50 ms/op | 1.01 | | List(Validator-NS) 100000 struct -> tree_backed | 173.36 ms/op | 173.97 ms/op | 1.00 | | List(Validator-NS) 100000 tree_backed -> struct | 144.68 ms/op | 146.22 ms/op | 0.99 | | List(Validator-NS) 100000 struct -> binary | 26.798 ms/op | 27.026 ms/op | 0.99 | | List(Validator-NS) 100000 tree_backed -> binary | 33.001 ms/op | 32.982 ms/op | 1.00 | | get epochStatuses - MutableVector | 90.933 us/op | 104.84 us/op | 0.87 | | get epochStatuses - ViewDU | 208.96 us/op | 208.53 us/op | 1.00 | | set epochStatuses - ListTreeView | 1.4093 ms/op | 1.6046 ms/op | 0.88 | | set epochStatuses - ListTreeView - set() | 440.21 us/op | 457.65 us/op | 0.96 | | set epochStatuses - ListTreeView - commit() | 446.39 us/op | 438.80 us/op | 1.02 | | bitstring | 641.44 ns/op | 645.17 ns/op | 0.99 | | bit mask | 13.464 ns/op | 14.232 ns/op | 0.95 | | struct - increase slot to 1000000 | 928.47 us/op | 927.45 us/op | 1.00 | | UintNumberType - increase slot to 1000000 | 21.668 ms/op | 23.901 ms/op | 0.91 | | UintBigintType - increase slot to 1000000 | 166.59 ms/op | 200.68 ms/op | 0.83 | | UintBigint8 x 100000 tree_deserialize | 4.5355 ms/op | 5.2920 ms/op | 0.86 | | UintBigint8 x 100000 tree_serialize | 1.0914 ms/op | 1.0923 ms/op | 1.00 | | UintBigint16 x 100000 tree_deserialize | 4.5547 ms/op | 6.1811 ms/op | 0.74 | | UintBigint16 x 100000 tree_serialize | 1.1746 ms/op | 1.5894 ms/op | 0.74 | | UintBigint32 x 100000 tree_deserialize | 4.7314 ms/op | 5.8123 ms/op | 0.81 | | UintBigint32 x 100000 tree_serialize | 1.1852 ms/op | 1.4116 ms/op | 0.84 | | UintBigint64 x 100000 tree_deserialize | 4.9360 ms/op | 6.5494 ms/op | 0.75 | | UintBigint64 x 100000 tree_serialize | 1.5536 ms/op | 1.9879 ms/op | 0.78 | | UintBigint8 x 100000 value_deserialize | 432.91 us/op | 432.99 us/op | 1.00 | | UintBigint8 x 
100000 value_serialize | 623.87 us/op | 708.83 us/op | 0.88 | | UintBigint16 x 100000 value_deserialize | 466.47 us/op | 464.54 us/op | 1.00 | | UintBigint16 x 100000 value_serialize | 709.62 us/op | 788.61 us/op | 0.90 | | UintBigint32 x 100000 value_deserialize | 433.18 us/op | 433.86 us/op | 1.00 | | UintBigint32 x 100000 value_serialize | 660.54 us/op | 786.64 us/op | 0.84 | | UintBigint64 x 100000 value_deserialize | 495.88 us/op | 510.50 us/op | 0.97 | | UintBigint64 x 100000 value_serialize | 850.03 us/op | 1.0409 ms/op | 0.82 | | UintBigint8 x 100000 deserialize | 2.8597 ms/op | 3.6057 ms/op | 0.79 | | UintBigint8 x 100000 serialize | 1.4574 ms/op | 1.6029 ms/op | 0.91 | | UintBigint16 x 100000 deserialize | 2.8137 ms/op | 3.1933 ms/op | 0.88 | | UintBigint16 x 100000 serialize | 1.4876 ms/op | 1.5637 ms/op | 0.95 | | UintBigint32 x 100000 deserialize | 2.7950 ms/op | 3.2083 ms/op | 0.87 | | UintBigint32 x 100000 serialize | 2.7531 ms/op | 2.9506 ms/op | 0.93 | | UintBigint64 x 100000 deserialize | 3.7903 ms/op | 3.8717 ms/op | 0.98 | | UintBigint64 x 100000 serialize | 1.5308 ms/op | 1.5096 ms/op | 1.01 | | UintBigint128 x 100000 deserialize | 5.4717 ms/op | 5.0612 ms/op | 1.08 | | UintBigint128 x 100000 serialize | 14.511 ms/op | 14.205 ms/op | 1.02 | | UintBigint256 x 100000 deserialize | 7.7624 ms/op | 8.0662 ms/op | 0.96 | | UintBigint256 x 100000 serialize | 42.970 ms/op | 42.049 ms/op | 1.02 | | Slice from Uint8Array x25000 | 1.1213 ms/op | 1.1554 ms/op | 0.97 | | Slice from ArrayBuffer x25000 | 16.798 ms/op | 16.639 ms/op | 1.01 | | Slice from ArrayBuffer x25000 + new Uint8Array | 18.801 ms/op | 18.124 ms/op | 1.04 | | Copy Uint8Array 100000 iterate | 1.6477 ms/op | 1.6601 ms/op | 0.99 | | Copy Uint8Array 100000 slice | 104.80 us/op | 130.82 us/op | 0.80 | | Copy Uint8Array 100000 Uint8Array.prototype.slice.call | 110.86 us/op | 137.70 us/op | 0.81 | | Copy Buffer 100000 Uint8Array.prototype.slice.call | 110.70 us/op | 130.41 us/op | 0.85 | | Copy 
Uint8Array 100000 slice + set | 176.37 us/op | 238.49 us/op | 0.74 | | Copy Uint8Array 100000 subarray + set | 112.81 us/op | 127.50 us/op | 0.88 | | Copy Uint8Array 100000 slice arrayBuffer | 116.61 us/op | 130.35 us/op | 0.89 | | Uint64 deserialize 100000 - iterate Uint8Array | 1.7804 ms/op | 1.8916 ms/op | 0.94 | | Uint64 deserialize 100000 - by Uint32A | 1.8257 ms/op | 1.9184 ms/op | 0.95 | | Uint64 deserialize 100000 - by DataView.getUint32 x2 | 1.8503 ms/op | 1.9187 ms/op | 0.96 | | Uint64 deserialize 100000 - by DataView.getBigUint64 | 5.0285 ms/op | 5.0542 ms/op | 0.99 | | Uint64 deserialize 100000 - by byte | 40.106 ms/op | 40.585 ms/op | 0.99 |

by benchmarkbot/action

twoeths commented 5 months ago

The performance of the SIMD implementation really depends on the CPU. Below is a comparison of the SIMD variants against digest64:

  digest64 vs hash4Input64s vs hash8HashObjects
    ✓ digest64 200092 times                                               6.206878 ops/s    161.1116 ms/op        -         60 runs   10.3 s
    ✓ hash 200092 times using hash4Input64s                               7.460423 ops/s    134.0406 ms/op        -         72 runs   10.2 s
    ✓ hash 200092 times using hash8HashObjects                            7.834839 ops/s    127.6350 ms/op        -         76 runs   10.2 s
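As a worked comparison, the sketch below computes the speedup of each SIMD variant over `digest64` from the ms/op figures quoted in this thread: the run in the comment above, and the three new rows in the bot's performance report. The `BenchResult` type and `speedupOver` helper are hypothetical names introduced only for this illustration.

```typescript
interface BenchResult {
  name: string;
  msPerOp: number;
}

// Speedup > 1 means the candidate is faster than the baseline.
function speedupOver(baseline: BenchResult, candidate: BenchResult): number {
  return baseline.msPerOp / candidate.msPerOp;
}

// Figures from the comment above.
const commentRun: BenchResult[] = [
  { name: "digest64", msPerOp: 161.1116 },
  { name: "hash4Input64s", msPerOp: 134.0406 },
  { name: "hash8HashObjects", msPerOp: 127.635 },
];

// Figures from the bot's performance report earlier in the thread.
const ciReportRun: BenchResult[] = [
  { name: "digest64", msPerOp: 206.57 },
  { name: "batchHash4UintArray64s", msPerOp: 212.05 },
  { name: "batchHash4HashObjectInputs", msPerOp: 212.59 },
];

for (const run of [commentRun, ciReportRun]) {
  const [baseline, ...candidates] = run;
  for (const c of candidates) {
    console.log(`${c.name}: ${speedupOver(baseline, c).toFixed(3)}x vs ${baseline.name}`);
  }
}
```

With these inputs the SIMD variants come out roughly 20% to 26% faster than `digest64` in the comment's run, but about 3% slower in the bot's report, which is consistent with the remark that the gain depends on the CPU.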