perf: use allocUnsafe in node

marcus-pousette commented 1 year ago

From #42

comparison benchmark
✔ yamux send and receive 1 0.0625KB chunks 13734.94 ops/s 72.80700 us/op x1.044 10073 runs 1.37 s
✔ yamux send and receive 1 1KB chunks 14130.48 ops/s 70.76900 us/op x1.057 14568 runs 1.83 s
✔ yamux send and receive 1 64KB chunks 12301.48 ops/s 81.29100 us/op x1.030 4802 runs 0.735 s
✔ yamux send and receive 1 1024KB chunks 4136.197 ops/s 241.7680 us/op x0.641 3414 runs 1.32 s
✔ yamux send and receive 1000 0.0625KB chunks 2996.443 ops/s 333.7290 us/op x0.041 3748 runs 1.86 s
✔ yamux send and receive 1000 1KB chunks 668.5900 ops/s 1.495685 ms/op x0.172 1562 runs 2.96 s
✔ yamux send and receive 1000 64KB chunks 99.26800 ops/s 10.07374 ms/op x0.380 32 runs 0.828 s
✔ yamux send and receive 1000 1024KB chunks 6.154025 ops/s 162.4953 ms/op x0.513 9 runs 2.12 s
✔ mplex send and receive 1 0.0625KB chunks 13810.25 ops/s 72.41000 us/op x1.013 11346 runs 1.50 s
✔ mplex send and receive 1 1KB chunks 13811.96 ops/s 72.40100 us/op x0.931 12351 runs 1.54 s
✔ mplex send and receive 1 64KB chunks 12431.78 ops/s 80.43900 us/op x0.932 6555 runs 0.955 s
✔ mplex send and receive 1 1024KB chunks 3391.981 ops/s 294.8130 us/op x1.035 1768 runs 0.973 s
✔ mplex send and receive 1000 0.0625KB chunks 301.1685 ops/s 3.320400 ms/op x1.031 121 runs 0.923 s
✔ mplex send and receive 1000 1KB chunks 260.5144 ops/s 3.838559 ms/op x0.996 497 runs 2.45 s
✔ mplex send and receive 1000 64KB chunks 40.87440 ops/s 24.46519 ms/op x1.068 29 runs 1.22 s
✔ mplex send and receive 1000 1024KB chunks 2.839016 ops/s 352.2347 ms/op x1.066 8 runs 3.46 s

Most significant gain

Currently 
yamux send and receive 1000 0.0625KB chunks                         122.7452 ops/s

With this PR
yamux send and receive 1000 0.0625KB chunks                         2996.443 ops/s

wemeetagain commented 1 year ago

Can you fix the linter issues?

marcus-pousette commented 1 year ago

I realized that the benchmark above is invalid. There is actually nothing to gain from allocUnsafe here, since the header is only 12 bytes. In fact, protobufjs has a special case for small arrays where it uses Uint8Array instead of Buffer, because that is faster when the size is below roughly 40. (From my own testing, this seems to hold for "old" Node versions.)
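For anyone who wants to check the small-allocation claim on their own Node version, a rough micro-benchmark might look like the sketch below. The helper name `timeIt`, the iteration count, and the constant names are illustrative assumptions, not code from this repository, and a loop this simple is easily distorted by JIT and GC effects, so treat the numbers as indicative only.

```typescript
import { Buffer } from 'node:buffer'
import { performance } from 'node:perf_hooks'

// Assumed values for illustration: 12 matches the yamux header size
// discussed in this thread; the iteration count is arbitrary.
const HEADER_LENGTH = 12
const ITERATIONS = 1_000_000

// Hypothetical helper: times ITERATIONS calls of `alloc` and returns
// the elapsed milliseconds.
function timeIt (label: string, alloc: () => Uint8Array): number {
  let sink = 0 // read back one byte so the allocation is not optimized away
  const start = performance.now()
  for (let i = 0; i < ITERATIONS; i++) {
    const buf = alloc()
    buf[0] = i & 0xff
    sink += buf[0]
  }
  const elapsed = performance.now() - start
  console.log(`${label}: ${elapsed.toFixed(1)} ms (sink ${sink})`)
  return elapsed
}

const zeroedMs = timeIt('new Uint8Array', () => new Uint8Array(HEADER_LENGTH))
const unsafeMs = timeIt('Buffer.allocUnsafe', () => Buffer.allocUnsafe(HEADER_LENGTH))
```

Which variant comes out ahead will depend on the Node version and hardware, so no expected output is given.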

The issue with the benchmark above is that it produced results even though the output was wrong, hence the egregious difference. (frame[0] = 0 was never set, because the code assumed the zero-initialized memory you get when using only Uint8Array.) Done correctly, the change yields no improvement whatsoever.
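To illustrate the pitfall: Buffer.allocUnsafe returns uninitialized memory, so a header encoder built on it has to write every byte, including the version byte at index 0. Below is a minimal sketch of what a correct encoder could look like. The `FrameHeader` interface, the `encodeHeader` function, and the constant names are assumptions for illustration and are not copied from this repository; the 12-byte big-endian layout (version, type, flags, stream ID, length) follows the yamux spec.

```typescript
import { Buffer } from 'node:buffer'

// Hypothetical header shape; field names are assumptions for this sketch.
interface FrameHeader {
  type: number // 8 bits
  flags: number // 16 bits
  streamID: number // 32 bits
  length: number // 32 bits
}

const HEADER_LENGTH = 12
const YAMUX_VERSION = 0

function encodeHeader (header: FrameHeader): Uint8Array {
  // allocUnsafe does not zero the memory, so no byte may be left unwritten.
  const frame = Buffer.allocUnsafe(HEADER_LENGTH)
  // With `new Uint8Array(12)` this byte is already 0; here it must be set explicitly.
  frame[0] = YAMUX_VERSION
  frame[1] = header.type
  frame.writeUInt16BE(header.flags, 2)
  frame.writeUInt32BE(header.streamID, 4)
  frame.writeUInt32BE(header.length, 8)
  return frame
}

const encoded = encodeHeader({ type: 1, flags: 1, streamID: 7, length: 256 })
console.log(Array.from(encoded))
```

If the `frame[0]` assignment is dropped, the version byte contains whatever happened to be in the pooled memory, which is the kind of invalid output that skewed the numbers above.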

I am going to close this PR and look for other improvements.

ChainSafe / js-libp2p-yamux

perf: use allocUnsafe in node #43