Open pmatyja opened 1 year ago
How well is saturating arithmetic supported by various ISAs?
Unfortunately, I can't reliably provide such detailed data across multiple ISAs. I think there is some support in MMX, in ARMv6+, and in the E variants of ARMv5, but my knowledge in this area is limited. That's why I was a bit reluctant to make such a proposal.
On the other hand, it is difficult to produce WASM code that runs fast on every platform. Engines, however, know the underlying hardware and can exploit its behavior and hardware-specific instructions to generate more performant code.
So in the end, dedicated instructions for such elementary (and probably "underused") operations, whose cost adds up very quickly, could yield a significant performance boost even where the WASM engine has to emulate them.
Abstract
The fundamental idea is to add dedicated instructions for saturating arithmetic on all basic types (single input / single output), so that modules don't each have to invent their own saturation algorithms and WebAssembly engines can use the corresponding hardware instructions where they exist.
Relation to other extensions
Although the SIMD extension already provides saturating operations, it may not be usable in every general use case, especially in single input / single output scenarios.
Problem
Currently, WebAssembly has no dedicated instructions for saturating arithmetic, which forces every user to implement their own algorithms inside the module. When used heavily, these can greatly bloat the module size and/or reduce performance, especially on hardware where such instructions do exist.
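To make the cost concrete, here is a minimal sketch of the kind of hand-rolled code a module has to carry today. It is written in Rust only for illustration; the function names are hypothetical and not part of any proposal, and a compiler targeting WebAssembly may emit a different sequence than this.

```rust
/// Signed 32-bit saturating add built only from wrapping operations.
/// Hypothetical helper name, for illustration.
pub fn sat_add_i32_manual(a: i32, b: i32) -> i32 {
    let sum = a.wrapping_add(b);
    // Signed overflow happened iff both operands have the same sign
    // and the wrapped sum has the opposite sign.
    if ((a ^ sum) & (b ^ sum)) < 0 {
        if a < 0 { i32::MIN } else { i32::MAX }
    } else {
        sum
    }
}

/// Unsigned 32-bit saturating add.
pub fn sat_add_u32_manual(a: u32, b: u32) -> u32 {
    let sum = a.wrapping_add(b);
    // Unsigned overflow happened iff the wrapped sum is smaller than an operand.
    if sum < a { u32::MAX } else { sum }
}

/// Unsigned 32-bit saturating subtract: clamps at zero instead of wrapping.
pub fn sat_sub_u32_manual(a: u32, b: u32) -> u32 {
    if a < b { 0 } else { a - b }
}
```

Each helper costs an extra compare and select (or branch) on top of the plain add or subtract, and that overhead is repeated at every call site or paid as a function call.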
Proposal
Add dedicated saturating arithmetic instructions for all basic types (signed and unsigned) and, if possible, for all type range limits (1, 2, 4, and 8 bytes).
Rationale
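The intended semantics for each width and signedness can be sketched as a small reference model: compute the exact result in a wider type, then clamp it to the target range. The names `range`, `sat_add`, and `sat_sub` are assumptions made for this sketch, and how the 1- and 2-byte variants would map onto WebAssembly's `i32`/`i64` value types is left open here.

```rust
/// Inclusive value range of an integer type with the given bit width
/// (8, 16, 32, or 64) and signedness. Hypothetical helper for this sketch.
pub fn range(bits: u32, signed: bool) -> (i128, i128) {
    if signed {
        (-(1i128 << (bits - 1)), (1i128 << (bits - 1)) - 1)
    } else {
        (0, (1i128 << bits) - 1)
    }
}

/// Saturating add: exact sum in i128, clamped to the target range.
pub fn sat_add(a: i128, b: i128, bits: u32, signed: bool) -> i128 {
    let (lo, hi) = range(bits, signed);
    (a + b).clamp(lo, hi)
}

/// Saturating subtract: exact difference in i128, clamped to the target range.
pub fn sat_sub(a: i128, b: i128, bits: u32, signed: bool) -> i128 {
    let (lo, hi) = range(bits, signed);
    (a - b).clamp(lo, hi)
}
```

For example, `sat_add(200, 100, 8, false)` gives 255 and `sat_sub(-100, 100, 8, true)` gives -128, matching what Rust's built-in `saturating_add` and `saturating_sub` return for `u8` and `i8`.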
"Overflow" and "underflow" cause many software problems that could be avoided, and a dedicated instruction would help ensure safety in such cases. Even though this behavior can be emulated in code, dedicated instructions would help engine implementors optimize the generated code and benefit from existing hardware instructions.
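A small illustration of the failure mode, assuming a scenario that is not from the proposal itself: mixing two 16-bit audio samples. The function names are hypothetical; `wrapping_add` and `saturating_add` are Rust's standard integer methods.

```rust
/// Mixes two samples with wrapping arithmetic, as a plain integer add behaves.
pub fn mix_wrapping(a: i16, b: i16) -> i16 {
    a.wrapping_add(b)
}

/// Mixes two samples with saturating arithmetic: the result clips at the limit.
pub fn mix_saturating(a: i16, b: i16) -> i16 {
    a.saturating_add(b)
}
```

With inputs 30000 and 10000, the wrapping version returns -25536 (a loud positive signal flips sign), while the saturating version returns 32767.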
Considerations
Floating-point operations that result in "infinity" or "negative infinity" can themselves be considered a form of saturating logic. I do not know whether there is dedicated hardware for saturating arithmetic that would produce the "float min/max" values instead, whether there is a need for such use cases, or how useful they would be. I think it should at least be considered whether this is possible, because once a value is "+/- infinity" further operations on it generally stay at "infinity" (or produce NaN), whereas "float min/max" values would continue to behave like normal numbers. But here people with more knowledge and experience would have to step in.
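The difference between the two behaviors can be sketched in software. This is only an emulation under the assumption that "float min/max" means the largest finite magnitudes (`f32::MIN` and `f32::MAX` in Rust); the function name is hypothetical and nothing here implies hardware support.

```rust
/// Adds two f32 values, then clamps the result to the finite range.
/// NaN results pass through unchanged, because `clamp` returns NaN for NaN input.
pub fn sat_add_f32(a: f32, b: f32) -> f32 {
    (a + b).clamp(f32::MIN, f32::MAX)
}
```

With ordinary IEEE 754 addition, `f32::MAX + f32::MAX` is infinity, and subtracting infinity from it later gives NaN. With the clamped version the sum stays at `f32::MAX`, so subtracting `f32::MAX` from it gives 0.0, an ordinary number.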