Open abrown opened 1 year ago
cc: @cfallin, @jameysharp, @elliottt
Here's another dimension to this issue: even when the sinkable loads are used, they are used very infrequently. Why is that?
E.g., here:
line count
12895 7.98k let v38 = C::unpack_value_array_2(ctx, v37);
12896 7.98k let v58 = &C::sinkable_load(ctx, v38.0);
12897 7.98k if let Some(v59) = v58 {
12898 2 let v1045 = constructor_put_in_xmm(ctx, v38.1);
12899 2 let v1056 = &constructor_sink_load_to_xmm_mem(ctx, v59);
12900 2 let v1081 = constructor_x64_mulss(ctx, v1045, v1056);
12901 2 let v1082 = constructor_output_xmm(ctx, v1081);
12902 2 // Rule at src/isa/x64/lower.isle line 2189.
12903 2 return Some(v1082);
12904 7.97k }
The sinkable load is only used a very low percentage of times.
As for the frequency issue, I think it may be because it's just so rare to generate the precise shape of load-then-op. That's a pretty wishy-washy excuse though and I wouldn't be confident in it.
As for lack of sinkable loads in some blocks this is one example which is f32x4.add
doesn't do anything with sinkable loads. The conditions for this are:
fadd
plus a loadEven with all that in place it looks like f32x4.add
alone was only translated 101 times meaning that if we go from the previous 2/7980 hit rate then we basically need to drastically increase the f32x4.add
frequence to get it to hit the sinkable load path.
One aspect here I think is that SIMD is conditionally enabled, whereas f32.add
is always enabled, which may end up causing simd to be a bit rarer.
@alexcrichton sent me a coverage report for the ISLE-generated lowerings for x64. I noticed by searching for "sinkable_load" that various operations (FP?) are not using the case where the load is sunk. In the Wasmtime meeting, @alexcrichton pointed out that the fact that we don't know whether the address is aligned could be preventing load-sinking here, but recall (here) that VEX- and EVEX-encoded instructions do not have the alignment restriction. Since @alexcrichton has been adding VEX-encoded versions of many instructions, I would think that we should be able to load-sink in that case (i.e., "can load sink MINUS alignment restriction" AND "have AVX").
I don't know what to do with this issue exactly, but @fitzgen thought I should record it as one. As mentioned in the Wasmtime meeting, it would be great to have better ways to find coverage gaps. This is at least one x64 lowering gap that I noticed (there are likely others) so here's an issue!