ewasm / design

Ewasm Design Overview and Specification
Apache License 2.0
1.02k stars 125 forks source link

Equivalence, invertable transformation, and size optimization of Wasm binaries #184

Open poemm opened 5 years ago

poemm commented 5 years ago

Implementations may wish to transform Wasm binaries to a form which is convenient (metered form, instantiated form, etc). If such transformations were invertible, then the original binary could be recovered and not stored.

There is an equivalence of Wasm binaries up to the LEB 128 encoding of integers. Thankfully, each integer seems to have a unique shortest LEB 128 encoding which is trivial to encode/decode. So this is a proposal to require integers to be in their shortest LEB 128 form. Shortest form is a major size optimization for Wasm binaries.

Another equivalence is encoding of local variable types. If consecutive local variables are of the same type, then a multiplicity notation can be used. So this is a proposal to require using this multiplicity notation whenever possible. Using multiplicity notation is a size optimization for Wasm binaries.

The above proposed requirements are already useful as size optimizations, but also allow more invertible transformations of code sections, which are often the longest sections.

A curious mind may notice that if one proves that they know all equivalences, then perhaps they can define a canonical form for Wasm binaries. Such a canonical form may be very useful. Please let me know if anyone is interested in this.

cdetrio commented 5 years ago

see also https://github.com/ewasm/design/issues/97

poemm commented 5 years ago

Another equivalence is removal of opcodes after br, br_table, unreachable, or return. Anything after these opcodes will never execute, so it can be removed without changing behaviour of the Wasm module.

lrettig commented 5 years ago

Could these transformations/compressions be run by chisel?

poemm commented 5 years ago

Could these transformations/compressions be run by chisel?

Yes.