Closed: sDIMMaX closed this 2 months ago
Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

| Author: | sDIMMaX |
|---|---|
| Assignees: | - |
| Labels: | `area-System.Numerics`, `untriaged` |
| Milestone: | - |
Looks somehow related to #96939.
`Normalize` is lowered to `/Sqrt(Dot)`. The result looks like the element at index 1 incorrectly remains 0.
Celeron E3300 is the Wolfdale microarchitecture, which supports just SSE4.1. This could be another issue at a different instruction set level.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

| Author: | sDIMMaX |
|---|---|
| Assignees: | - |
| Labels: | `area-CodeGen-coreclr`, `untriaged` |
| Milestone: | - |
The issue does reproduce with `DOTNET_EnableSSE41=0`. It does not reproduce with `DOTNET_EnableSSE42=0`.
So there should still be an issue around pre-SSE4.1 codegen.
Is the JIT emitting unsupported SSE4.2 instructions?
When the JIT avoids SSE4.1, it generates wrong code. The wrong SSE4.1-less code also produces a wrong result on a modern CPU.
Using an unsupported instruction would probably cause a hard crash of the process.
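To check which instruction set levels the JIT will actually target in a given process, a minimal sketch along these lines may help. It only uses the `IsSupported` properties from `System.Runtime.Intrinsics.X86`; the assumption (not confirmed in this thread) is that launching with `DOTNET_EnableSSE41=0` makes `Sse41.IsSupported` and the levels that depend on it report `False`.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

// Report which x86 SIMD levels the JIT considers usable in this process.
// On non-x86 hardware every line prints False.
Console.WriteLine($"SSE3   : {Sse3.IsSupported}");
Console.WriteLine($"SSSE3  : {Ssse3.IsSupported}");
Console.WriteLine($"SSE4.1 : {Sse41.IsSupported}");
Console.WriteLine($"SSE4.2 : {Sse42.IsSupported}");
Console.WriteLine($"AVX    : {Avx.IsSupported}");

// Print the value under discussion next to the ISA report.
Console.WriteLine(Vector2.Normalize(new Vector2(1, 1)));
```

Running it once normally and once with `DOTNET_EnableSSE41=0` set in the environment should show whether the wrong result tracks the SSE4.1 switch on a particular machine.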
`v1 / (float)Math.Sqrt(Vector2.Dot(v1,v1));` and `Vector2.Dot(v1,v1);` work fine.
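Based on the observation above, a possible stopgap is to normalize by hand instead of calling `Vector2.Normalize`. This is only a sketch: the helper name `NormalizeManual` is hypothetical, and it assumes that `Vector2.Dot` and vector-by-scalar division behave correctly on the affected machines, as reported in this thread.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: same math as Normalize (v / sqrt(dot(v, v))),
// written out so the Normalize intrinsic itself is not used.
Vector2 NormalizeManual(Vector2 v)
{
    float lengthSquared = Vector2.Dot(v, v);
    return v / MathF.Sqrt(lengthSquared);
}

var v1 = new Vector2(3, 4);
Console.WriteLine(NormalizeManual(v1)); // components should be close to 0.6 and 0.8
```

Like `Vector2.Normalize`, this does not special-case a zero-length input.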
Diff between working (left) codegen from manual impl and broken (right) from the intrinsic.
FYI @dotnet/jit-contrib @tannergooding
I just had the same issue. In case it helps, here are the results of my investigation:
I encountered the issue on a virtualized machine (kvm64 CPU profile on a physical Xeon(R) Gold 6226R) that only reports support up to SSE 3. I get `<x' ∞>` as a result here; x seems to be correct, depending on input.
Code:
using System.Numerics;
var vector = new Vector2(1, 1);
vector = Vector2.Normalize(vector);
Console.WriteLine(vector.ToString());
The generated ASM is the following:
movsd xmm0, qword ptr [reloc @RWD00]
movsd qword ptr [rbp-0x08], xmm0
movsd xmm0, qword ptr [rbp-0x08]
movsd xmm1, qword ptr [rbp-0x08]
movsd xmm2, qword ptr [rbp-0x08]
mulps xmm1, xmm2
movsd qword ptr [rbp-0x10], xmm1
movsd xmm1, qword ptr [rbp-0x10]
movsd xmm2, qword ptr [rbp-0x10]
haddps xmm1, xmm2
sqrtps xmm1, xmm1
divps xmm0, xmm1
movsd qword ptr [rbp-0x08], xmm0
lea rcx, [rbp-0x08]
call [System.Numerics.Vector2:ToString():System.String:this]
mov rcx, rax
call [System.Console:WriteLine(System.String)]
nop
On my machine (i7-10710U, with SSE4.1 and SSE4.2) with `DOTNET_EnableSSE41=0`, the same code is generated but results in `<x' 8>`. I have no clue whatsoever why different results are produced.
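Tracing the SSE3 listing by hand suggests where the infinity comes from. The sketch below models each packed instruction as a plain four-element array operation; all helper names are hypothetical, and it is a reading of the listing rather than anything the JIT team has confirmed. After the single `haddps`, lane 0 holds x²+y² but lane 1 holds 0, so `divps` computes y / 0 in lane 1. That agrees with the earlier remark that the element at index 1 incorrectly remains 0.

```csharp
using System;

// Scalar models of the packed-single instructions in the listing (hypothetical helpers).
float[] MulPs(float[] a, float[] b) =>
    new[] { a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3] };
float[] HaddPs(float[] a, float[] b) =>
    new[] { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
float[] SqrtPs(float[] a) =>
    new[] { MathF.Sqrt(a[0]), MathF.Sqrt(a[1]), MathF.Sqrt(a[2]), MathF.Sqrt(a[3]) };
float[] DivPs(float[] a, float[] b) =>
    new[] { a[0] / b[0], a[1] / b[1], a[2] / b[2], a[3] / b[3] };

// A movsd store followed by a movsd load keeps the low two lanes and zeroes the upper two.
float[] SpillReload64(float[] a) => new[] { a[0], a[1], 0f, 0f };

float[] TraceNoSse41(float x, float y)
{
    var v = new[] { x, y, 0f, 0f };          // movsd xmm0/xmm1/xmm2, [rbp-0x08]
    var sq = SpillReload64(MulPs(v, v));     // mulps, then spill/reload via [rbp-0x10]
    var sum = HaddPs(sq, sq);                // haddps -> [x²+y², 0, x²+y², 0]
    return DivPs(v, SqrtPs(sum));            // sqrtps, divps -> lane 1 is y / 0
}

var lanes = TraceNoSse41(1f, 1f);
Console.WriteLine($"lane0={lanes[0]} lane1={lanes[1]}");
Console.WriteLine(float.IsPositiveInfinity(lanes[1])); // True
```

Under this model the x component comes out correct and any nonzero positive y becomes positive infinity, which matches the reported `<x' ∞>`.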
With `DOTNET_EnableSSE41=1`, the following code is generated using `dpps` (SSE 4.1) and the result is correct:
movsd xmm0, qword ptr [reloc @RWD00]
movsd qword ptr [rbp-0x08], xmm0
movsd xmm0, qword ptr [rbp-0x08]
movsd xmm1, qword ptr [rbp-0x08]
movsd xmm2, qword ptr [rbp-0x08]
+ dpps xmm1, xmm2, 63
- mulps xmm1, xmm2
- movsd qword ptr [rbp-0x10], xmm1
- movsd xmm1, qword ptr [rbp-0x10]
- movsd xmm2, qword ptr [rbp-0x10]
- haddps xmm1, xmm2
sqrtps xmm1, xmm1
divps xmm0, xmm1
movsd qword ptr [rbp-0x08], xmm0
lea rcx, [rbp-0x08]
call [System.Numerics.Vector2:ToString():System.String:this]
mov rcx, rax
call [System.Console:WriteLine(System.String)]
nop
Both machines have SDK 8.0.204 and Runtime 8.0.4 installed.
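For comparison, the `dpps xmm1, xmm2, 63` path can be modeled the same way. In the immediate, the high four bits select which lane products are summed and the low four bits select which result lanes receive that sum; 63 is `0b0011_1111`, so lanes 0 and 1 are multiplied and the sum is written to all four lanes. The helper below is a hypothetical scalar model, not the runtime's code.

```csharp
using System;

// Hypothetical scalar model of dpps: imm8 bits 4-7 pick the products to sum,
// imm8 bits 0-3 pick the result lanes that receive the sum (others become 0).
float[] DpPs(float[] a, float[] b, int imm8)
{
    float sum = 0f;
    for (int i = 0; i < 4; i++)
    {
        if ((imm8 & (1 << (4 + i))) != 0) sum += a[i] * b[i];
    }

    var result = new float[4];
    for (int i = 0; i < 4; i++)
    {
        result[i] = (imm8 & (1 << i)) != 0 ? sum : 0f;
    }
    return result;
}

var v = new[] { 1f, 1f, 0f, 0f };
var dot = DpPs(v, v, 63);                  // 63 = 0b0011_1111
Console.WriteLine(string.Join(" ", dot));  // every lane holds x*x + y*y
```

Because every lane of the divisor holds the dot product, the following `sqrtps` and `divps` divide both x and y by the same nonzero length, which is what the single-`haddps` sequence fails to do for lane 1.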
Description
`Vector2.Normalize` returns a wrong result, and I think this is a CPU problem (is .NET using some unsupported intrinsics?).
Reproduction Steps
Expected behavior
<0 1>
Actual behavior
<0 ∞>
Regression?
Works in 7 and 8.0.preview.1; the issue appears in 8.0.preview.2, rc.1, 8.0.201 ...
Known Workarounds
No response
Configuration
I reproduced this on an Intel Celeron E3300 and an Intel P6100 (Win10 and Win7), x64.
Other information
Edit: copy from https://github.com/microsoft/referencesource/blob/master/System.Numerics/System/Numerics/Vector2.cs#L191
code
shows