WebAssembly / simd

Branch of the spec repo scoped to discussion of SIMD in WebAssembly

Consider a more efficient encoding for v128.const 0 #521

Closed ngzhian closed 2 years ago

ngzhian commented 2 years ago

v128.const 0 is the most common v128.const operation I see in some benchmarks (>10x more frequent than other v128.const constants). It is currently 18 bytes (16 of which are 0); perhaps a more efficient way of encoding it (a special instruction) should be considered (for a subsequent proposal, of course)?

steven-johnson commented 2 years ago

Would it be more efficient to just use (say) i8x16.splat 0?

EDIT: oops, I guess there is no immediate form of the splat instructions.

ngzhian commented 2 years ago

Yeah, it would be more efficient (in code size) to write:

i32.const 0 (2 bytes)
i8x16.splat (2 bytes)

However, the toolchain will mostly emit a v128.const 0 for that (cc @tlively). I think a special case for a splat (of any shape) of constant 0, emitting the splat instead of v128.const, would be nice. That said, binary size isn't a huge problem for now (I haven't gotten reports about it yet!), so I'm just filing this to track it :)
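
A minimal sketch of the byte counts above, assuming the opcode values from the finalized SIMD proposal (0xfd 0x0c for v128.const, 0xfd 0x0f for i8x16.splat) and 0x41 for i32.const. The variable names are illustrative only:

```python
# Opcode bytes are assumptions taken from the finalized WebAssembly SIMD proposal.
V128_CONST = b"\xfd\x0c"   # SIMD prefix 0xfd, opcode 12
I8X16_SPLAT = b"\xfd\x0f"  # SIMD prefix 0xfd, opcode 15
I32_CONST = b"\x41"        # followed by a signed LEB128 immediate

# v128.const 0: two opcode bytes plus a 16-byte literal.
v128_const_zero = V128_CONST + b"\x00" * 16

# i32.const 0 followed by i8x16.splat: the immediate 0 is one LEB128 byte.
splat_zero = I32_CONST + b"\x00" + I8X16_SPLAT

print(len(v128_const_zero))  # 18
print(len(splat_zero))       # 4
```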

penzn commented 2 years ago

I think that is the easiest fix :) We could possibly consider a variable-length integer encoding of up to 128 bits, though that could be slightly awkward.

Maratyszcza commented 2 years ago

Related: #255

tlively commented 2 years ago

It would be good to figure out what a typical percentage of code size this would save. If there are situations in which splats (or other patterns) would be faster than v128.const, that would be especially good to know about, because that would be a clear win.

Maratyszcza commented 2 years ago

Splats would never be faster than v128.const, because v128.const guarantees the literal to be static, while splat does not.

penzn commented 2 years ago

We can use the trusty LEB encoding which we already use in other places.
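
To make the trade-off concrete, here is a rough sketch of how an unsigned LEB128 encoding would size a 128-bit literal. The helper name encode_uleb128 is hypothetical, and the choice of unsigned (rather than signed) LEB128 is an assumption, since the thread does not specify one:

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte, continuation bit clear
            return bytes(out)

zero_len = len(encode_uleb128(0))             # literal 0 needs a single byte
max_len = len(encode_uleb128(2**128 - 1))     # worst case for a 128-bit value

print(zero_len)  # 1
print(max_len)   # 19
```

Under these assumptions a zero literal shrinks from 16 bytes to 1, but a literal with its high bits set grows from 16 bytes to 19, so the encoding only pays off if small values dominate.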

ngzhian commented 2 years ago

Say we have a 2-byte encoding. In a release-built Wasm file of 4082107 bytes (with 582 v128.const instructions, 297 of which are const 0), we would save (18 - 2) * 297 = 4752 bytes (0.1%).

The gzip version of the file is 2077721 bytes. I locally replaced each v128.const 0 with 0x7b7b; the resulting gzip version is 2077463 bytes (a diff of 258 bytes, 0.01%).

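# Replace every v128.const 0 (0xfd 0x0c plus 16 zero bytes) with a 2-byte placeholder.
# This is a size estimate only; the rest of the module is not adjusted.
s=open('release.wasm','rb').read()
t=s.replace(b'\xfd\x0c' + b'\x00'*16, b'\x7b\x7b')
open('release-new.wasm','wb').write(t)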
$ gzip -9 -c release-new.wasm > release-new.wasm.gz
$ du -b release*
4077355 release-new.wasm
2077463 release-new.wasm.gz
4082107 release.wasm
2077721 release.wasm.gz

So it turns out there aren't a lot of savings! Streams of zeros compress well, and v128.const doesn't show up that often. I'm sad about this result, but it's good to have some rough numbers here.
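
The percentages can be rechecked from the sizes reported above. This is only a sketch of the arithmetic; the variable names are illustrative, and the figures are copied from the du output:

```python
# Sizes in bytes, copied from the du output above.
raw_before, raw_after = 4082107, 4077355
gz_before, gz_after = 2077721, 2077463

zero_consts = 297            # occurrences of v128.const 0
old_size, new_size = 18, 2   # bytes per instruction: current vs. hypothetical

raw_saved = (old_size - new_size) * zero_consts
gz_saved = gz_before - gz_after

# The predicted raw saving should equal the measured difference in file size.
assert raw_saved == raw_before - raw_after

raw_pct = 100 * raw_saved / raw_before
gz_pct = 100 * gz_saved / gz_before

print(f"raw:  {raw_saved} bytes ({raw_pct:.2f}%)")
print(f"gzip: {gz_saved} bytes ({gz_pct:.3f}%)")
```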

ngzhian commented 2 years ago

I'm going to close this out since it doesn't seem that useful. We can follow up in #255 anyway. Thanks all for the comments!

tlively commented 2 years ago

Thanks for putting in the leg work to investigate that, @ngzhian!