Perform NaN randomization as part of the interpretation of float bits

sunfishcode commented 7 months ago

This removes the THE_HOST_WANTS_TO option, which appeared to give hosts permission to interpret NaN bits how they "want to". The change here is to say that all core-wasm NaN bitpatterns are interpreted as the same component-model NaN value.

It's my understanding that the random_nan_bits function isn't meant to be the precise algorithm that nondeterministic-profile implementations must use, so this doesn't require hosts to do any new randomization work.

This change also fixes what appears to be a bug: lift_flat_variant/lower_flat_variant were calling reinterpret_i32_as_float/reinterpret_float_as_i32 without performing NaN scrambling. By making NaN scrambling be part of interpretation, we ensure that it's performed anywhere interpretation is performed.

lukewagner commented 7 months ago

Good catch on the missing maybe_scramble_nan* in the flat variant cases, we should fix that. However, I don't think this PR causes "all core-wasm NaN bitpatterns are interpreted as the same component-model NaN value" since it still produces semantically distinct NaN values upon lifting and lowering. Moreover, although your change is technically equivalent (saying you can non-deterministically choose to non-deterministically scramble NaN payloads is the same as saying that you can non-deterministically scramble NaN values), I think it's useful to make it clear to the reader that the host is allowed by the spec to simply propagate the NaN bitpattern; this just avoids confusion, so I'd rather keep THE_HOST_WANTS_TO.

sunfishcode commented 6 months ago

However, I don't think this PR causes "all core-wasm NaN bitpatterns are interpreted as the same component-model NaN value" since it still produces semantically distinct NaN values upon lifting and lowering.

I've now changed the PR to canonicalize on lifting, and scramble on lowering. This better illustrates how "all core-wasm NaN bitpatterns are interpreted as the same component-model NaN value".

I think it's useful to make it clear to the reader that the host is allowed by the spec to simply propagate the NaN bitpattern; this just avoids confusion, so I'd rather keep THE_HOST_WANTS_TO.

It's my understanding that your goal here is to avoid confusion among hosts about "do I need to randomize?"

My goal here is to avoid the confusion among hosts about "is it ok for me to interpret the bits, if I «want to»?" If they do that, they break virtualization in a subtle way.

Also, if we can say there's only one component-model NaN value, that would make it clear why wave doesn't need nan payload syntax.

I've also now added comments to this PR to call out how hosts can avoid skip canonicalization and randomization overhead entirely in practice.

lukewagner commented 6 months ago

Thanks again for thinking through this!

WebAssembly / component-model

Perform NaN randomization as part of the interpretation of float bits #279