101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License

improve adler32 perf #56

Closed evanwashere closed 3 years ago

evanwashere commented 3 years ago

This casts length to an i32, on the assumption that a chunk will never be longer than an i32 can hold.

| | 1 GB | 500 MB |
|---|---|---|
| before | 1134 ms | 553 ms |
| after | 845 ms | 412 ms |

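
For readers who have not seen the diff, the idea can be sketched as an Adler-32 loop whose length and counters are coerced to i32 with `| 0`. This is not the PR's actual code: the name `adler32Sketch` and the 2048-byte chunk size are assumptions made for illustration, the latter chosen so that both running sums stay below 2^31 between reductions.

```javascript
// Illustrative sketch only, not fflate's implementation.
// Assumes `data` is a Uint8Array whose length fits in a signed 32-bit integer.
function adler32Sketch(data) {
  const MOD = 65521; // largest prime below 2^16, as in the Adler-32 definition
  const CHUNK = 2048; // assumed chunk size; keeps both sums below 2^31
  const len = data.length | 0; // ToInt32 cast, so the loop bound is a plain i32
  let a = 1;
  let b = 0;
  let i = 0;
  while (i < len) {
    const end = Math.min(i + CHUNK, len) | 0;
    for (; i < end; i = (i + 1) | 0) {
      a = (a + data[i]) | 0;
      b = (b + a) | 0;
    }
    // Reduce once per chunk instead of once per byte.
    a %= MOD;
    b %= MOD;
  }
  // `>>> 0` reinterprets the packed result as an unsigned 32-bit value.
  return ((b << 16) | a) >>> 0;
}
```

Calling `adler32Sketch` on the ASCII bytes of "Wikipedia" should give `0x11E60398`, the commonly cited reference value for that string.
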
101arrowz commented 3 years ago

Looks good. I'll probably test this a bit further before merging though - I think this optimization can be applied elsewhere too.

By the way - I noticed you're using this in ImageScript and applied some optimizations with >> 3 over / 8, ===/!== vs ==/!=, etc. From my prior empirical testing, bit shifting is actually slower than division + OR (I think due to the ToUint32 requirement), and the strict operators aren't any faster but lead to a larger bundle size. However, if you have a POC for the perf uplift, I'm happy to implement those as well.

evanwashere commented 3 years ago

ToUint32 is slower than ToInt32 so i mostly avoid it when possible and use i32
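
The two coercions named here differ in how they treat values at or above 2^31. The snippet below is a small illustration with made-up variable names; it only shows the results and says nothing about which coercion is faster on a given engine.

```javascript
// `x | 0` applies ToInt32; `x >>> 0` applies ToUint32.
const big = 2 ** 31;

const asInt32 = big | 0; // wraps to the most negative i32
const asUint32 = big >>> 0; // stays 2147483648
const minusOneAsUint32 = -1 >>> 0; // becomes 4294967295

// A length below 2^31 (here 1e9) is unchanged by the i32 cast.
const lengthAsInt32 = 1e9 | 0;
```
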

div vs bit shifting

yeah bit shifting doesn't improve anything in v8 as it's smart enough to already do this optimization, but my use is not limited to v8, so I'll keep it

bytecode diff

```asm
[generated bytecode for function: bytecode (0x1ea9b4bb00c1
0x1ea9b4bb0b6e @ 0 : 1b 02 LdaImmutab | 24 S> 0x2ca2679f0b6e @ 0 : 1b 02 LdaImmutab
0x1ea9b4bb0b70 @ 2 : ac 00 ThrowRefer | 0x2ca2679f0b70 @ 2 : ac 00 ThrowRefer
0x1ea9b4bb0b72 @ 4 : c6 Star0 | 0x2ca2679f0b72 @ 4 : c6 Star0
0x1ea9b4bb0b73 @ 5 : 0c 01 LdaSmi [1] | 0x2ca2679f0b73 @ 5 : 0c 01 LdaSmi [1]
0x1ea9b4bb0b75 @ 7 : c5 Star1 | 0x2ca2679f0b75 @ 7 : c5 Star1
0x1ea9b4bb0b76 @ 8 : 0c 01 LdaSmi [1] | 0x2ca2679f0b76 @ 8 : 0c 01 LdaSmi [1]
0x1ea9b4bb0b78 @ 10 : c4 Star2 | 0x2ca2679f0b78 @ 10 : c4 Star2
0x1ea9b4bb0b79 @ 11 : 0c 01 LdaSmi [1] | 0x2ca2679f0b79 @ 11 : 0c 01 LdaSmi [1]
0x1ea9b4bb0b7b @ 13 : c3 Star3 | 0x2ca2679f0b7b @ 13 : c3 Star3
0x1ea9b4bb0b7c @ 14 : 0c 01 LdaSmi [1] | 0x2ca2679f0b7c @ 14 : 0c 01 LdaSmi [1]
0x1ea9b4bb0b7e @ 16 : c2 Star4 | 0x2ca2679f0b7e @ 16 : c2 Star4
0x1ea9b4bb0b7f @ 17 : 0c 01 LdaSmi [1] | 0x2ca2679f0b7f @ 17 : 0c 01 LdaSmi [1]
0x1ea9b4bb0b81 @ 19 : c1 Star5 | 0x2ca2679f0b81 @ 19 : c1 Star5
70 E> 0x1ea9b4bb0b82 @ 20 : 5c fa f9 05 00 CallUndefi | 24 E> 0x2ca2679f0b82 @ 20 : 5c fa f9 05 00 CallUndefi
0x1ea9b4bb0b87 @ 25 : 0d LdaUndefin | 0x2ca2679f0b87 @ 25 : 0d LdaUndefin
116 S> 0x1ea9b4bb0b88 @ 26 : ab Return | 116 S> 0x2ca2679f0b88 @ 26 : ab Return
Constant pool (size = 1)    Constant pool (size = 1)
0x1ea9b4bb0b21: [FixedArray] in OldSpace | 0x2ca2679f0b21: [FixedArray] in OldSpace
- map: 0x3521e56c12c1 | - map: 0x33bc170412c1
- length: 1    - length: 1
0: 0x3ff4315fee99 | 0: 0x2862bd9fee99
Handler Table (size = 0)    Handler Table (size = 0)
Source Position Table (size = 10) | Source Position Table (size = 9)
0x1ea9b4bb0b91 | 0x2ca2679f0b91
--- Raw source ---    --- Raw source ---
() {    () {
// noop(8 / 8, 8 / 8, 8 / 8, 8 / 8, 8 / 8); | noop(8 / 8, 8 / 8, 8 / 8, 8 / 8, 8 / 8);
noop(8 >> 3, 8 >> 3, 8 >> 3, 8 >> 3, 8 >> 3); | // noop(8 >> 3, 8 >> 3, 8 >> 3, 8 >> 3, 8 >> 3);
}    }
--- Optimized code ---    --- Optimized code ---
optimization_id = 0    optimization_id = 0
source_position = 17    source_position = 17
kind = TURBOFAN    kind = TURBOFAN
name = bytecode    name = bytecode
stack_slots = 6    stack_slots = 6
compiler = turbofan    compiler = turbofan
address = 0x1bbb7c082c1 | address = 0x35ced5008001
Instructions (size = 188)    Instructions (size = 188)
0x1bbb7c08320 0 f85c0050 ldur x16, [x2, #-64] | 0x35ced5008060 0 f85c0050 ldur x16, [x2, #-64]
0x1bbb7c08324 4 b840f210 ldur w16, [x16, #15] | 0x35ced5008064 4 b840f210 ldur w16, [x16, #15]
0x1bbb7c08328 8 36000070 tbz w16, #0, #+0xc (addr | 0x35ced5008068 8 36000070 tbz w16, #0, #+0xc (addr
0x1bbb7c0832c c 580004b1 ldr x17, pc+148 (addr 0x0 | 0x35ced500806c c 580004b1 ldr x17, pc+148 (addr 0x
0x1bbb7c08330 10 d61f0220 br x17 | 0x35ced5008070 10 d61f0220 br x17
0x1bbb7c08334 14 a9bf7bfd stp fp, lr, [sp, #-16]! | 0x35ced5008074 14 a9bf7bfd stp fp, lr, [sp, #-16]!
0x1bbb7c08338 18 910003fd mov fp, sp | 0x35ced5008078 18 910003fd mov fp, sp
0x1bbb7c0833c 1c a9be03ff stp xzr, x0, [sp, #-32]! | 0x35ced500807c 1c a9be03ff stp xzr, x0, [sp, #-32]!
0x1bbb7c08340 20 a9016fe1 stp x1, cp, [sp, #16] | 0x35ced5008080 20 a9016fe1 stp x1, cp, [sp, #16]
0x1bbb7c08344 24 f8550342 ldur x2, [x26, #-176] | 0x35ced5008084 24 f8550342 ldur x2, [x26, #-176]
0x1bbb7c08348 28 eb2263ff cmp sp, x2 | 0x35ced5008088 28 eb2263ff cmp sp, x2
0x1bbb7c0834c 2c 54000149 b.ls #+0x28 (addr 0x1bbb7 | 0x35ced500808c 2c 54000149 b.ls #+0x28 (addr 0x35ce
0x1bbb7c08350 30 f8590340 ldur x0, [x26, #-112] | 0x35ced5008090 30 f8590340 ldur x0, [x26, #-112]
0x1bbb7c08354 34 f85e83a3 ldur x3, [fp, #-24] | 0x35ced5008094 34 f85e83a3 ldur x3, [fp, #-24]
0x1bbb7c08358 38 910003bf mov sp, fp | 0x35ced5008098 38 910003bf mov sp, fp
0x1bbb7c0835c 3c a8c17bfd ldp fp, lr, [sp], #16 | 0x35ced500809c 3c a8c17bfd ldp fp, lr, [sp], #16
0x1bbb7c08360 40 91000463 add x3, x3, #0x1 (1) | 0x35ced50080a0 40 91000463 add x3, x3, #0x1 (1)
0x1bbb7c08364 44 91000470 add x16, x3, #0x1 (1) | 0x35ced50080a4 44 91000470 add x16, x3, #0x1 (1)
0x1bbb7c08368 48 927ffa10 and x16, x16, #0xffffffff | 0x35ced50080a8 48 927ffa10 and x16, x16, #0xfffffff
0x1bbb7c0836c 4c 8b306fff add sp, sp, x16, lsl #3 | 0x35ced50080ac 4c 8b306fff add sp, sp, x16, lsl #3
0x1bbb7c08370 50 d65f03c0 ret | 0x35ced50080b0 50 d65f03c0 ret
0x1bbb7c08374 54 d2c00c02 movz x2, #0x6000000000 | 0x35ced50080b4 54 d2c00c02 movz x2, #0x6000000000
0x1bbb7c08378 58 d10043ff sub sp, sp, #0x10 (16) | 0x35ced50080b8 58 d10043ff sub sp, sp, #0x10 (16)
0x1bbb7c0837c 5c f90007ff str xzr, [sp, #8] | 0x35ced50080bc 5c f90007ff str xzr, [sp, #8]
0x1bbb7c08380 60 f90003e2 str x2, [sp] | 0x35ced50080c0 60 f90003e2 str x2, [sp]
0x1bbb7c08384 64 f9000bfb str cp, [sp, #16] | 0x35ced50080c4 64 f9000bfb str cp, [sp, #16]
0x1bbb7c08388 68 d2914281 movz x1, #0x8a14 | 0x35ced50080c8 68 d2914281 movz x1, #0x8a14
0x1bbb7c0838c 6c f2a01cc1 movk x1, #0xe6, lsl #16 | 0x35ced50080cc 6c f2a026e1 movk x1, #0x137, lsl #16
0x1bbb7c08390 70 f2c00021 movk x1, #0x1, lsl #32 | 0x35ced50080d0 70 f2c00021 movk x1, #0x1, lsl #32
0x1bbb7c08394 74 d2800020 movz x0, #0x1 | 0x35ced50080d4 74 d2800020 movz x0, #0x1
0x1bbb7c08398 78 aa1b03e2 mov x2, cp | 0x35ced50080d8 78 aa1b03e2 mov x2, cp
0x1bbb7c0839c 7c 580000fb ldr cp, pc+28 (addr 0x000 | 0x35ced50080dc 7c 580000fb ldr cp, pc+28 (addr 0x00
0x1bbb7c083a0 80 58000150 ldr x16, pc+40 (addr 0x00 | 0x35ced50080e0 80 58000150 ldr x16, pc+40 (addr 0x0
0x1bbb7c083a4 84 d63f0200 blr x16 | 0x35ced50080e4 84 d63f0200 blr x16
0x1bbb7c083a8 88 17ffffea b #-0x58 (addr 0x1bbb7c08 | 0x35ced50080e8 88 17ffffea b #-0x58 (addr 0x35ced50
0x1bbb7c083ac 8c d503201f nop | 0x35ced50080ec 8c d503201f nop
0x1bbb7c083b0 90 580000ff constant pool begin (num_ | 0x35ced50080f0 90 580000ff constant pool begin (num
0x1bbb7c083b4 94 d63f03e0 constant | 0x35ced50080f4 94 d63f03e0 constant
0x1bbb7c083b8 98 a07c1139 constant | 0x35ced50080f8 98 3de41139 constant
0x1bbb7c083bc 9c 000018f7 constant | 0x35ced50080fc 9c 00002d10 constant
0x1bbb7c083c0 a0 0108b6c0 constant | 0x35ced5008100 a0 0159b6c0 constant
0x1bbb7c083c4 a4 00000001 constant | 0x35ced5008104 a4 00000001 constant
0x1bbb7c083c8 a8 010eeaa0 constant | 0x35ced5008108 a8 015feaa0 constant
0x1bbb7c083cc ac 00000001 constant | 0x35ced500810c ac 00000001 constant
0x1bbb7c083d0 b0 f95b5f50 ldr x16, [x26, #14008] | 0x35ced5008110 b0 f95b5f50 ldr x16, [x26, #14008]
0x1bbb7c083d4 b4 d61f0200 br x16 | 0x35ced5008114 b4 d61f0200 br x16
0x1bbb7c083d8 b8 97fffffe bl #-0x8 (addr 0x1bbb7c08 | 0x35ced5008118 b8 97fffffe bl #-0x8 (addr 0x35ced50
```

eqeq vs eqeqeq

For == vs ===, v8's jit saves the day again, but like I said, I expect my code to run on various engines, some of which might not even have a jit, so there strict equality is guaranteed to be cheaper/faster.

bytecode diff

```asm
[generated bytecode for function: bytecode (0x35e0770b00c1
0x35e0770b0b6e @ 0 : 1b 02 LdaImmutab | 76 S> 0x1100257b0b6e @ 0 : 1b 02 LdaImmutab
0x35e0770b0b70 @ 2 : ac 00 ThrowRefer | 0x1100257b0b70 @ 2 : ac 00 ThrowRefer
0x35e0770b0b72 @ 4 : c6 Star0 | 0x1100257b0b72 @ 4 : c6 Star0
0x35e0770b0b73 @ 5 : 0c 01 LdaSmi [1] | 0x1100257b0b73 @ 5 : 0c 01 LdaSmi [1]
32 E> 0x35e0770b0b75 @ 7 : 68 03 00 TestEqual | 83 E> 0x1100257b0b75 @ 7 : 69 03 00 TestEqualS
0x35e0770b0b78 @ 10 : 52 LogicalNot | 0x1100257b0b78 @ 10 : 52 LogicalNot
0x35e0770b0b79 @ 11 : c5 Star1 | 0x1100257b0b79 @ 11 : c5 Star1
0x35e0770b0b7a @ 12 : 0c 01 LdaSmi [1] | 0x1100257b0b7a @ 12 : 0c 01 LdaSmi [1]
0x35e0770b0b7c @ 14 : c4 Star2 | 0x1100257b0b7c @ 14 : c4 Star2
0x35e0770b0b7d @ 15 : 0c 01 LdaSmi [1] | 0x1100257b0b7d @ 15 : 0c 01 LdaSmi [1]
40 E> 0x35e0770b0b7f @ 17 : 68 f8 01 TestEqual | 92 E> 0x1100257b0b7f @ 17 : 69 f8 01 TestEqualS
0x35e0770b0b82 @ 20 : 52 LogicalNot | 0x1100257b0b82 @ 20 : 52 LogicalNot
0x35e0770b0b83 @ 21 : c4 Star2 | 0x1100257b0b83 @ 21 : c4 Star2
0x35e0770b0b84 @ 22 : 0c 01 LdaSmi [1] | 0x1100257b0b84 @ 22 : 0c 01 LdaSmi [1]
0x35e0770b0b86 @ 24 : c3 Star3 | 0x1100257b0b86 @ 24 : c3 Star3
0x35e0770b0b87 @ 25 : 0c 01 LdaSmi [1] | 0x1100257b0b87 @ 25 : 0c 01 LdaSmi [1]
48 E> 0x35e0770b0b89 @ 27 : 68 f7 02 TestEqual | 101 E> 0x1100257b0b89 @ 27 : 69 f7 02 TestEqualS
0x35e0770b0b8c @ 30 : 52 LogicalNot | 0x1100257b0b8c @ 30 : 52 LogicalNot
0x35e0770b0b8d @ 31 : c3 Star3 | 0x1100257b0b8d @ 31 : c3 Star3
0x35e0770b0b8e @ 32 : 0c 01 LdaSmi [1] | 0x1100257b0b8e @ 32 : 0c 01 LdaSmi [1]
0x35e0770b0b90 @ 34 : c2 Star4 | 0x1100257b0b90 @ 34 : c2 Star4
0x35e0770b0b91 @ 35 : 0c 01 LdaSmi [1] | 0x1100257b0b91 @ 35 : 0c 01 LdaSmi [1]
56 E> 0x35e0770b0b93 @ 37 : 68 f6 03 TestEqual | 110 E> 0x1100257b0b93 @ 37 : 69 f6 03 TestEqualS
0x35e0770b0b96 @ 40 : 52 LogicalNot | 0x1100257b0b96 @ 40 : 52 LogicalNot
0x35e0770b0b97 @ 41 : c2 Star4 | 0x1100257b0b97 @ 41 : c2 Star4
0x35e0770b0b98 @ 42 : 0c 01 LdaSmi [1] | 0x1100257b0b98 @ 42 : 0c 01 LdaSmi [1]
0x35e0770b0b9a @ 44 : c1 Star5 | 0x1100257b0b9a @ 44 : c1 Star5
0x35e0770b0b9b @ 45 : 0c 01 LdaSmi [1] | 0x1100257b0b9b @ 45 : 0c 01 LdaSmi [1]
64 E> 0x35e0770b0b9d @ 47 : 68 f5 04 TestEqual | 119 E> 0x1100257b0b9d @ 47 : 69 f5 04 TestEqualS
0x35e0770b0ba0 @ 50 : 52 LogicalNot | 0x1100257b0ba0 @ 50 : 52 LogicalNot
0x35e0770b0ba1 @ 51 : c1 Star5 | 0x1100257b0ba1 @ 51 : c1 Star5
25 E> 0x35e0770b0ba2 @ 52 : 5c fa f9 05 05 CallUndefi | 76 E> 0x1100257b0ba2 @ 52 : 5c fa f9 05 05 CallUndefi
0x35e0770b0ba7 @ 57 : 0d LdaUndefin | 0x1100257b0ba7 @ 57 : 0d LdaUndefin
127 S> 0x35e0770b0ba8 @ 58 : ab Return | 127 S> 0x1100257b0ba8 @ 58 : ab Return
Constant pool (size = 1)    Constant pool (size = 1)
0x35e0770b0b21: [FixedArray] in OldSpace | 0x1100257b0b21: [FixedArray] in OldSpace
- map: 0x04d9c51412c1 | - map: 0x3a310da012c1
- length: 1    - length: 1
0: 0x353e93f7ee99 | 0: 0x13127f97ee99
Handler Table (size = 0)    Handler Table (size = 0)
Source Position Table (size = 20) | Source Position Table (size = 21)
0x35e0770b0bb1 | 0x1100257b0bb1
--- Raw source ---    --- Raw source ---
(v) {    (v) {
noop(v != 1, 1 != 1, 1 != 1, 1 != 1, 1 != 1); | // noop(v != 1, 1 != 1, 1 != 1, 1 != 1, 1 != 1);
// noop(v !== 1, 1 !== 1, 1 !== 1, 1 !== 1, 1 !== 1); | noop(v !== 1, 1 !== 1, 1 !== 1, 1 !== 1, 1 !== 1);
}    }
--- Optimized code ---    --- Optimized code ---
optimization_id = 0    optimization_id = 0
source_position = 17    source_position = 17
kind = TURBOFAN    kind = TURBOFAN
name = bytecode    name = bytecode
stack_slots = 6    stack_slots = 6
compiler = turbofan    compiler = turbofan
address = 0x203343c88001 | address = 0xe45f8488001
Instructions (size = 224)    Instructions (size = 224)
0x203343c88060 0 f85c0050 ldur x16, [x2, #-64] | 0xe45f8488060 0 f85c0050 ldur x16, [x2, #-64]
0x203343c88064 4 b840f210 ldur w16, [x16, #15] | 0xe45f8488064 4 b840f210 ldur w16, [x16, #15]
0x203343c88068 8 36000070 tbz w16, #0, #+0xc (addr | 0xe45f8488068 8 36000070 tbz w16, #0, #+0xc (addr
0x203343c8806c c 58000571 ldr x17, pc+172 (addr 0x | 0xe45f848806c c 58000571 ldr x17, pc+172 (addr 0x0
0x203343c88070 10 d61f0220 br x17 | 0xe45f8488070 10 d61f0220 br x17
0x203343c88074 14 a9bf7bfd stp fp, lr, [sp, #-16]! | 0xe45f8488074 14 a9bf7bfd stp fp, lr, [sp, #-16]!
0x203343c88078 18 910003fd mov fp, sp | 0xe45f8488078 18 910003fd mov fp, sp
0x203343c8807c 1c a9be03ff stp xzr, x0, [sp, #-32]! | 0xe45f848807c 1c a9be03ff stp xzr, x0, [sp, #-32]!
0x203343c88080 20 a9016fe1 stp x1, cp, [sp, #16] | 0xe45f8488080 20 a9016fe1 stp x1, cp, [sp, #16]
0x203343c88084 24 f8550342 ldur x2, [x26, #-176] | 0xe45f8488084 24 f8550342 ldur x2, [x26, #-176]
0x203343c88088 28 f90003fb str cp, [sp] | 0xe45f8488088 28 f90003fb str cp, [sp]
0x203343c8808c 2c eb2263ff cmp sp, x2 | 0xe45f848808c 2c eb2263ff cmp sp, x2
0x203343c88090 30 54000209 b.ls #+0x40 (addr 0x2033 | 0xe45f8488090 30 54000209 b.ls #+0x40 (addr 0xe45f8
0x203343c88094 34 f9401fe2 ldr x2, [sp, #56] | 0xe45f8488094 34 f9401fe2 ldr x2, [sp, #56]
0x203343c88098 38 7200005f tst w2, #0x1 | 0xe45f8488098 38 7200005f tst w2, #0x1
0x203343c8809c 3c 540004e1 b.ne #+0x9c (addr 0x2033 | 0xe45f848809c 3c 540004e1 b.ne #+0x9c (addr 0xe45f8
0x203343c880a0 40 f8590340 ldur x0, [x26, #-112] | 0xe45f84880a0 40 f8590340 ldur x0, [x26, #-112]
0x203343c880a4 44 f85e83a3 ldur x3, [fp, #-24] | 0xe45f84880a4 44 f85e83a3 ldur x3, [fp, #-24]
0x203343c880a8 48 910003bf mov sp, fp | 0xe45f84880a8 48 910003bf mov sp, fp
0x203343c880ac 4c a8c17bfd ldp fp, lr, [sp], #16 | 0xe45f84880ac 4c a8c17bfd ldp fp, lr, [sp], #16
0x203343c880b0 50 91000463 add x3, x3, #0x1 (1) | 0xe45f84880b0 50 91000463 add x3, x3, #0x1 (1)
0x203343c880b4 54 f100087f cmp x3, #0x2 (2) | 0xe45f84880b4 54 f100087f cmp x3, #0x2 (2)
0x203343c880b8 58 5400004a b.ge #+0x8 (addr 0x20334 | 0xe45f84880b8 58 5400004a b.ge #+0x8 (addr 0xe45f84
0x203343c880bc 5c d2800043 movz x3, #0x2 | 0xe45f84880bc 5c d2800043 movz x3, #0x2
0x203343c880c0 60 91000470 add x16, x3, #0x1 (1) | 0xe45f84880c0 60 91000470 add x16, x3, #0x1 (1)
0x203343c880c4 64 927ffa10 and x16, x16, #0xfffffff | 0xe45f84880c4 64 927ffa10 and x16, x16, #0xffffffff
0x203343c880c8 68 8b306fff add sp, sp, x16, lsl #3 | 0xe45f84880c8 68 8b306fff add sp, sp, x16, lsl #3
0x203343c880cc 6c d65f03c0 ret | 0xe45f84880cc 6c d65f03c0 ret
0x203343c880d0 70 d2c00c02 movz x2, #0x6000000000 | 0xe45f84880d0 70 d2c00c02 movz x2, #0x6000000000
0x203343c880d4 74 d10043ff sub sp, sp, #0x10 (16) | 0xe45f84880d4 74 d10043ff sub sp, sp, #0x10 (16)
0x203343c880d8 78 f90007ff str xzr, [sp, #8] | 0xe45f84880d8 78 f90007ff str xzr, [sp, #8]
0x203343c880dc 7c f90003e2 str x2, [sp] | 0xe45f84880dc 7c f90003e2 str x2, [sp]
0x203343c880e0 80 d2814281 movz x1, #0xa14 | 0xe45f84880e0 80 d2914281 movz x1, #0x8a14
0x203343c880e4 84 f2a06321 movk x1, #0x319, lsl #16 | 0xe45f84880e4 84 f2a0aac1 movk x1, #0x556, lsl #16
0x203343c880e8 88 f2c00021 movk x1, #0x1, lsl #32 | 0xe45f84880e8 88 f2c00021 movk x1, #0x1, lsl #32
0x203343c880ec 8c d2800020 movz x0, #0x1 | 0xe45f84880ec 8c d2800020 movz x0, #0x1
0x203343c880f0 90 5800011b ldr cp, pc+32 (addr 0x00 | 0xe45f84880f0 90 5800011b ldr cp, pc+32 (addr 0x000
0x203343c880f4 94 58000170 ldr x16, pc+44 (addr 0x0 | 0xe45f84880f4 94 58000170 ldr x16, pc+44 (addr 0x00
0x203343c880f8 98 d63f0200 blr x16 | 0xe45f84880f8 98 d63f0200 blr x16
0x203343c880fc 9c 17ffffe6 b #-0x68 (addr 0x203343c | 0xe45f84880fc 9c 17ffffe6 b #-0x68 (addr 0xe45f8488
0x203343c88100 a0 d503201f nop | 0xe45f8488100 a0 d503201f nop
0x203343c88104 a4 5800011f constant pool begin (num | 0xe45f8488104 a4 5800011f constant pool begin (num_
0x203343c88108 a8 d63f03e0 constant | 0xe45f8488108 a8 d63f03e0 constant
0x203343c8810c ac d503201f constant | 0xe45f848810c ac d503201f constant
0x203343c88110 b0 fd5c1139 constant | 0xe45f8488110 b0 549c1139 constant
0x203343c88114 b4 000022c9 constant | 0xe45f8488114 b4 00001adc constant
0x203343c88118 b8 033b36c0 constant | 0xe45f8488118 b8 0578b6c0 constant
0x203343c8811c bc 00000001 constant | 0xe45f848811c bc 00000001 constant
0x203343c88120 c0 03416aa0 constant | 0xe45f8488120 c0 057eeaa0 constant
0x203343c88124 c4 00000001 constant | 0xe45f8488124 c4 00000001 constant
0x203343c88128 c8 f95b5350 ldr x16, [x26, #13984] | 0xe45f8488128 c8 f95b5350 ldr x16, [x26, #13984]
0x203343c8812c cc d61f0200 br x16 | 0xe45f848812c cc d61f0200 br x16
0x203343c88130 d0 f95b5f50 ldr x16, [x26, #14008] | 0xe45f8488130 d0 f95b5f50 ldr x16, [x26, #14008]
0x203343c88134 d4 d61f0200 br x16 | 0xe45f8488134 d4 d61f0200 br x16
0x203343c88138 d8 97fffffc bl #-0x10 (addr 0x203343 | 0xe45f8488138 d8 97fffffc bl #-0x10 (addr 0xe45f848
0x203343c8813c dc 97fffffd bl #-0xc (addr 0x203343c | 0xe45f848813c dc 97fffffd bl #-0xc (addr 0xe45f8488
```

If the jit can't guess the types, or for any reason decides not to bother optimizing, then != will cost more cycles than !==.
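
One way to see where the extra work comes from when the operand types are not known to be equal: loose inequality may have to coerce an operand before comparing, while strict inequality can answer from the type mismatch alone. The object below is a hypothetical operand invented for this illustration; it counts how often it gets coerced.

```javascript
// Illustrative only: `operand` is a made-up object that records coercions.
let coercions = 0;
const operand = {
  valueOf() {
    coercions++;
    return 1;
  },
};

// Loose inequality converts the object to a primitive first (calls valueOf),
// then compares 1 with 1.
const looseResult = operand != 1;
const coercionsAfterLoose = coercions;

// Strict inequality sees different types and answers without any conversion.
const strictResult = operand !== 1;
const coercionsAfterStrict = coercions;
```

Here `looseResult` is `false` after one `valueOf` call, and `strictResult` is `true` with the counter unchanged.
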

bytecode diff

```asm
[generated bytecode for function: bytecode (0x21d99bf700c1
0x21d99bf70b6e @ 0 : 1b 02 LdaImmutab | 44 S> 0x101315c30b6e @ 0 : 1b 02 LdaImmutab
0x21d99bf70b70 @ 2 : ac 00 ThrowRefer | 0x101315c30b70 @ 2 : ac 00 ThrowRefer
0x21d99bf70b72 @ 4 : c6 Star0 | 0x101315c30b72 @ 4 : c6 Star0
0x21d99bf70b73 @ 5 : 0c 01 LdaSmi [1] | 0x101315c30b73 @ 5 : 0c 01 LdaSmi [1]
32 E> 0x21d99bf70b75 @ 7 : 68 03 00 TestEqual | 51 E> 0x101315c30b75 @ 7 : 69 03 00 TestEqualS
0x21d99bf70b78 @ 10 : 52 LogicalNot | 0x101315c30b78 @ 10 : 52 LogicalNot
0x21d99bf70b79 @ 11 : c5 Star1 | 0x101315c30b79 @ 11 : c5 Star1
25 E> 0x21d99bf70b7a @ 12 : 5e fa f9 01 CallUndefi | 44 E> 0x101315c30b7a @ 12 : 5e fa f9 01 CallUndefi
0x21d99bf70b7e @ 16 : 0d LdaUndefin | 0x101315c30b7e @ 16 : 0d LdaUndefin
59 S> 0x21d99bf70b7f @ 17 : ab Return | 59 S> 0x101315c30b7f @ 17 : ab Return
Constant pool (size = 1)    Constant pool (size = 1)
0x21d99bf70b21: [FixedArray] in OldSpace | 0x101315c30b21: [FixedArray] in OldSpace
- map: 0x06e45cf412c1 | - map: 0x2f299dc812c1
- length: 1    - length: 1
0: 0x3cd13837ee99 | 0: 0x0c897a13ee99
Handler Table (size = 0)    Handler Table (size = 0)
Source Position Table (size = 11) | Source Position Table (size = 10)
0x21d99bf70b81 | 0x101315c30b81
--- Raw source ---    --- Raw source ---
(v) {    (v) {
noop(v != 1); | // noop(v != 1);
// noop(v !== 1); | noop(v !== 1);
}    }
--- Optimized code ---    --- Optimized code ---
optimization_id = 0    optimization_id = 0
source_position = 17    source_position = 17
kind = TURBOFAN    kind = TURBOFAN
name = bytecode    name = bytecode
stack_slots = 6    stack_slots = 6
compiler = turbofan    compiler = turbofan
address = 0xff594c882c1 | address = 0x3e6f2fe482c1
Instructions (size = 248) | Instructions (size = 204)
0xff594c88320 0 f85c0050 ldur x16, [x2, #-64] | 0x3e6f2fe48320 0 f85c0050 ldur x16, [x2, #-64]
0xff594c88324 4 b840f210 ldur w16, [x16, #15] | 0x3e6f2fe48324 4 b840f210 ldur w16, [x16, #15]
0xff594c88328 8 36000070 tbz w16, #0, #+0xc (addr | 0x3e6f2fe48328 8 36000070 tbz w16, #0, #+0xc (addr
0xff594c8832c c 58000631 ldr x17, pc+196 (addr 0x0 | 0x3e6f2fe4832c c 58000531 ldr x17, pc+164 (addr 0x
0xff594c88330 10 d61f0220 br x17 | 0x3e6f2fe48330 10 d61f0220 br x17
0xff594c88334 14 a9bf7bfd stp fp, lr, [sp, #-16]! | 0x3e6f2fe48334 14 a9bf7bfd stp fp, lr, [sp, #-16]!
0xff594c88338 18 910003fd mov fp, sp | 0x3e6f2fe48338 18 910003fd mov fp, sp
0xff594c8833c 1c a9be03ff stp xzr, x0, [sp, #-32]! | 0x3e6f2fe4833c 1c a9be03ff stp xzr, x0, [sp, #-32]!
0xff594c88340 20 a9016fe1 stp x1, cp, [sp, #16] | 0x3e6f2fe48340 20 a9016fe1 stp x1, cp, [sp, #16]
0xff594c88344 24 f8550344 ldur x4, [x26, #-176] | 0x3e6f2fe48344 24 f8550342 ldur x2, [x26, #-176]
0xff594c88348 28 f90003fb str cp, [sp] | 0x3e6f2fe48348 28 eb2263ff cmp sp, x2
0xff594c8834c 2c eb2463ff cmp sp, x4 | 0x3e6f2fe4834c 2c 540001a9 b.ls #+0x34 (addr 0x3e6f
0xff594c88350 30 54000289 b.ls #+0x50 (addr 0xff594 | 0x3e6f2fe48350 30 f8590340 ldur x0, [x26, #-112]
0xff594c88354 34 d2c00021 movz x1, #0x100000000 | 0x3e6f2fe48354 34 f85e83a3 ldur x3, [fp, #-24]
0xff594c88358 38 d2800002 movz x2, #0x0 | 0x3e6f2fe48358 38 910003bf mov sp, fp
0xff594c8835c 3c 58000423 ldr x3, pc+132 (addr 0x00 | 0x3e6f2fe4835c 3c a8c17bfd ldp fp, lr, [sp], #16
0xff594c88360 40 f9401fe0 ldr x0, [sp, #56] | 0x3e6f2fe48360 40 91000463 add x3, x3, #0x1 (1)
0xff594c88364 44 5800043b ldr cp, pc+132 (addr 0x00 | 0x3e6f2fe48364 44 f100087f cmp x3, #0x2 (2)
0xff594c88368 48 58000490 ldr x16, pc+144 (addr 0x0 | 0x3e6f2fe48368 48 5400004a b.ge #+0x8 (addr 0x3e6f2
0xff594c8836c 4c d63f0200 blr x16 | 0x3e6f2fe4836c 4c d2800043 movz x3, #0x2
0xff594c88370 50 f8590340 ldur x0, [x26, #-112] | 0x3e6f2fe48370 50 91000470 add x16, x3, #0x1 (1)
0xff594c88374 54 f85e83a3 ldur x3, [fp, #-24] | 0x3e6f2fe48374 54 927ffa10 and x16, x16, #0xfffffff
0xff594c88378 58 910003bf mov sp, fp | 0x3e6f2fe48378 58 8b306fff add sp, sp, x16, lsl #3
0xff594c8837c 5c a8c17bfd ldp fp, lr, [sp], #16 | 0x3e6f2fe4837c 5c d65f03c0 ret
0xff594c88380 60 91000463 add x3, x3, #0x1 (1) | 0x3e6f2fe48380 60 d2c00802 movz x2, #0x4000000000
0xff594c88384 64 f100087f cmp x3, #0x2 (2) | 0x3e6f2fe48384 64 d10043ff sub sp, sp, #0x10 (16)
0xff594c88388 68 5400004a b.ge #+0x8 (addr 0xff594c | 0x3e6f2fe48388 68 f90007ff str xzr, [sp, #8]
0xff594c8838c 6c d2800043 movz x3, #0x2 | 0x3e6f2fe4838c 6c f90003e2 str x2, [sp]
0xff594c88390 70 91000470 add x16, x3, #0x1 (1) | 0x3e6f2fe48390 70 f9000bfb str cp, [sp, #16]
0xff594c88394 74 927ffa10 and x16, x16, #0xffffffff | 0x3e6f2fe48394 74 d2994281 movz x1, #0xca14
0xff594c88398 78 8b306fff add sp, sp, x16, lsl #3 | 0x3e6f2fe48398 78 f2a06181 movk x1, #0x30c, lsl #16
0xff594c8839c 7c d65f03c0 ret | 0x3e6f2fe4839c 7c f2c00021 movk x1, #0x1, lsl #32
0xff594c883a0 80 d2c00804 movz x4, #0x4000000000 | 0x3e6f2fe483a0 80 d2800020 movz x0, #0x1
0xff594c883a4 84 d10043ff sub sp, sp, #0x10 (16) | 0x3e6f2fe483a4 84 aa1b03e2 mov x2, cp
0xff594c883a8 88 f90007ff str xzr, [sp, #8] | 0x3e6f2fe483a8 88 5800011b ldr cp, pc+32 (addr 0x00
0xff594c883ac 8c f90003e4 str x4, [sp] | 0x3e6f2fe483ac 8c 58000170 ldr x16, pc+44 (addr 0x0
0xff594c883b0 90 580001c4 ldr x4, pc+56 (addr 0x000 | 0x3e6f2fe483b0 90 d63f0200 blr x16
0xff594c883b4 94 d2994281 movz x1, #0xca14 | 0x3e6f2fe483b4 94 17ffffe7 b #-0x64 (addr 0x3e6f2fe
0xff594c883b8 98 f2a05c41 movk x1, #0x2e2, lsl #16 | 0x3e6f2fe483b8 98 d503201f nop
0xff594c883bc 9c f2c00021 movk x1, #0x1, lsl #32 | 0x3e6f2fe483bc 9c 5800011f constant pool begin (num
0xff594c883c0 a0 d2800020 movz x0, #0x1 | 0x3e6f2fe483c0 a0 d63f03e0 constant
0xff594c883c4 a4 aa0403fb mov cp, x4 | 0x3e6f2fe483c4 a4 d503201f constant
0xff594c883c8 a8 580001d0 ldr x16, pc+56 (addr 0x00 | 0x3e6f2fe483c8 a8 52d81139 constant
0xff594c883cc ac d63f0200 blr x16 | 0x3e6f2fe483cc ac 0000161e constant
0xff594c883d0 b0 17ffffe1 b #-0x7c (addr 0xff594c88 | 0x3e6f2fe483d0 b0 032ef6c0 constant
0xff594c883d4 b4 d503201f nop | 0x3e6f2fe483d4 b4 00000001 constant
0xff594c883d8 b8 5800017f constant pool begin (num_ | 0x3e6f2fe483d8 b8 03352aa0 constant
0xff594c883dc bc d63f03e0 constant | 0x3e6f2fe483dc bc 00000001 constant
0xff594c883e0 c0 9bf70c79 constant | 0x3e6f2fe483e0 c0 f95b5f50 ldr x16, [x26, #14008]
0xff594c883e4 c4 000021d9 constant | 0x3e6f2fe483e4 c4 d61f0200 br x16
0xff594c883e8 c8 6d541139 constant | 0x3e6f2fe483e8 c8 97fffffe bl #-0x8 (addr 0x3e6f2fe
0xff594c883ec cc 00000c64 constant <
0xff594c883f0 d0 0304f6c0 constant <
0xff594c883f4 d4 00000001 constant <
0xff594c883f8 d8 03097700 constant <
0xff594c883fc dc 00000001 constant <
0xff594c88400 e0 030b2aa0 constant <
0xff594c88404 e4 00000001 constant <
0xff594c88408 e8 f95b5f50 ldr x16, [x26, #14008] <
0xff594c8840c ec d61f0200 br x16 <
0xff594c88410 f0 97fffffe bl #-0x8 (addr 0xff594c88 <
0xff594c88414 f4 97fffffd bl #-0xc (addr 0xff594c88 <
<
<
<
Source positions:    Source positions:
pc offset position    pc offset position
0 17    0 17
34 32 | 30 59
50 59 | 60 17
80 17 <
Inlined functions (count = 1)    Inlined functions (count = 1)
0x21d99bf70111 | 0x101315c30111
Deoptimization Input Data (deopt points = 2) | Deoptimization Input Data (deopt points = 1)
index bytecode-offset pc    index bytecode-offset pc
0 7 50 | 0 -1 94
1 -1 b0 |
> Safepoints (size = 21)
> 0x3e6f2fe483b4 94 c8 100000 (sp -> fp) 0
Safepoints (size = 34) | RelocInfo (size = 30)
0xff594c88370 50 f0 100000 (sp -> fp) 0 | 0x3e6f2fe483a8 full embedded object (0x161e52d81139
fp) 1 | 0x3e6f2fe483bc constant pool (size 36)
| 0x3e6f2fe483e8 deopt script offset (17)
RelocInfo (size = 52) | 0x3e6f2fe483e8 deopt inlining id (-1)
0xff594c8835c full embedded object (0x21d99bf70c79
```

btw cool project you got here

101arrowz commented 3 years ago

fflate is pretty heavily biased to V8 but it's true that other environments will have different/worse optimizations, so maybe those changes are valid. I previously used bit shifts for the bit-to-byte conversion, but stopped because it fails for >512MB files (the bit offset no longer fits in 32 bits). I think I'll investigate manually in other browser engines (mainly JSC and SpiderMonkey) where triple eq is better.

Thanks for the info (and effort).
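
The >512MB remark above can be reproduced with a few lines. The sketch below uses a hypothetical bit offset just past 2^32 bits (512 MB) and compares the shift forms with the "division + OR" form mentioned earlier in the thread; the variable names are made up for illustration.

```javascript
// Hypothetical bit offset just past the 512 MB mark (2^32 bits).
const bitOffset = 2 ** 32 + 8;

// Both shifts first truncate the operand to 32 bits, so the high bit is lost.
const viaSignedShift = bitOffset >> 3; // ToInt32(bitOffset) is 8, result 1
const viaUnsignedShift = bitOffset >>> 3; // ToUint32(bitOffset) is 8, result 1

// Dividing first keeps full double precision; the quotient then fits in an i32.
const viaDivision = (bitOffset / 8) | 0; // 2^29 + 1, the intended byte index
```

The division form only needs the byte index, not the bit offset, to fit in 32 bits, which is why it keeps working for larger inputs.
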