Net8 - Vector2.Normalize and old(?) CPU - wrong result

sDIMMaX commented 7 months ago

Description

Vector2.Normalize - wrong result and i think this is Cpu problem (.Net use some unsuported intrinsics?)

Reproduction Steps

Console.WriteLine(Vector2.Normalize(new(0,2)));

Expected behavior

<0 1>

Actual behavior

<0 ∞>

Regression?

In 7 and 8.0.preview.1 - works 8.0.preview.2, rc.1, 8.0.201 ... - issue

Known Workarounds

No response

Configuration

I reproduced this on intel celeron e3300 and intel p6100 (Win10 and Win7) x64

Other information

Edit: copy from https://github.com/microsoft/referencesource/blob/master/System.Numerics/System/Numerics/Vector2.cs#L191

        public static Vector2 Normalize2(Vector2 value)
        {
            if (Vector.IsHardwareAccelerated)
            {
//-----------------------------------------------------------------------------
        Console.WriteLine("HardwareAccelerated");
//-----------------------------------------------------------------------------
                float length = value.Length();
                return value / length;
            }
            else
            {
                float ls = value.X * value.X + value.Y * value.Y;
                float invNorm = 1.0f / (float)Math.Sqrt((double)ls);

                return new Vector2(
                    value.X * invNorm,
                    value.Y * invNorm);
            }
        }

code

var v1 = new Vector2(0,2);
Console.WriteLine(Vector2.Normalize(v1));
Console.WriteLine(Normalize2(v1));

shows

<0  ∞>
HardwareAccelerated
<0  1>

ghost commented 7 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details

### Description Vector2.Normalize - wrong result and i think this is Cpu problem (.Net use some unsuported intrinsics?) ### Reproduction Steps ``` var v1 = new Vector2(0,2); var v2 = new Vector2(0,0); var previousEdgeDirection = Vector2.Normalize(v1 - v2); Console.WriteLine(previousEdgeDirection); ``` ### Expected behavior `<0 1>` ### Actual behavior `<0 ∞>` ### Regression? In 7 and 8.0.preview.1 - works 8.0.preview.2, rc.1, 8.0.201 ... - issue ### Known Workarounds _No response_ ### Configuration I reproduced this on intel celeron e3300 and intel p6100 (Win10 and Win7) x64 ### Other information _No response_

Author:	sDIMMaX
Assignees:	-
Labels:	`area-System.Numerics`, `untriaged`
Milestone:	-

huoyaoyuan commented 7 months ago

Looks somehow related to #96939. Normalize is lowered to /Sqrt(Dot). The result looks like element at index 1 incorrectly remains 0.

Celeron E3300 is Wolfdale microarch which supports just SSE4.1. This could be another issue for different instruction set level.

ghost commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

### Description Vector2.Normalize - wrong result and i think this is Cpu problem (.Net use some unsuported intrinsics?) ### Reproduction Steps ``` Console.WriteLine(Vector2.Normalize(new(0,2))); ``` ### Expected behavior `<0 1>` ### Actual behavior `<0 ∞>` ### Regression? In 7 and 8.0.preview.1 - works 8.0.preview.2, rc.1, 8.0.201 ... - issue ### Known Workarounds _No response_ ### Configuration I reproduced this on intel celeron e3300 and intel p6100 (Win10 and Win7) x64 ### Other information Edit: copy from https://github.com/microsoft/referencesource/blob/master/System.Numerics/System/Numerics/Vector2.cs#L191 ``` public static Vector2 Normalize2(Vector2 value) { if (Vector.IsHardwareAccelerated) { //----------------------------------------------------------------------------- Console.WriteLine("HardwareAccelerated"); //----------------------------------------------------------------------------- float length = value.Length(); return value / length; } else { float ls = value.X * value.X + value.Y * value.Y; float invNorm = 1.0f / (float)Math.Sqrt((double)ls); return new Vector2( value.X * invNorm, value.Y * invNorm); } } ``` code ``` var v1 = new Vector2(0,2); Console.WriteLine(Vector2.Normalize(v1)); Console.WriteLine(Normalize2(v1)); ``` shows ``` <0 ∞> HardwareAccelerated <0 1> ```

Author:	sDIMMaX
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

huoyaoyuan commented 7 months ago

The issue does reproduce with DOTNET_EnableSSE41=0. Does not reproduce with DOTNET_EnableSSE42=0.

So there should still be issue around pre-SSE4.1 codegen.

sDIMMaX commented 7 months ago

Jit make unsupported sse4.2 instructions?

huoyaoyuan commented 7 months ago

When JIT is avoiding SSE4.1, it generates wrong code. The wrong SSE4.1-less code also produces wrong result on modern CPU.

Usage of unsupported instruction will probably cause hard crash of the process.

sDIMMaX commented 7 months ago

v1 / (float)Math.Sqrt(Vector2.Dot(v1,v1)); and Vector2.Dot(v1,v1); works fine

MichalPetryka commented 7 months ago

Diff between working (left) codegen from manual impl and broken (right) from the intrinsic.

AndyAyersMS commented 7 months ago

FYI @dotnet/jit-contrib @tannergooding

akade commented 6 months ago

I just had the same issue. In case it helps, here are the results of my investigation:

I encountered the issue on a virtualized machine (kvm64 cpu profile on a physical Xeon(R) Gold 6226R) that only reports support up to SSE 3. I get <x' ∞> as a result here, x seems to be correct depending on input.

Code:

using System.Numerics;
var vector = new Vector2(1, 1);
vector = Vector2.Normalize(vector);
Console.WriteLine(vector.ToString());

The generated ASM is the following:

movsd    xmm0, qword ptr [reloc @RWD00]
movsd    qword ptr [rbp-0x08], xmm0
movsd    xmm0, qword ptr [rbp-0x08]
movsd    xmm1, qword ptr [rbp-0x08]
movsd    xmm2, qword ptr [rbp-0x08]
mulps    xmm1, xmm2
movsd    qword ptr [rbp-0x10], xmm1
movsd    xmm1, qword ptr [rbp-0x10]
movsd    xmm2, qword ptr [rbp-0x10]
haddps   xmm1, xmm2
sqrtps   xmm1, xmm1
divps    xmm0, xmm1
movsd    qword ptr [rbp-0x08], xmm0
lea      rcx, [rbp-0x08]
call     [System.Numerics.Vector2:ToString():System.String:this]
mov      rcx, rax
call     [System.Console:WriteLine(System.String)]
nop

On my machine (i7-10710U, with SSE4.1 and SSE4.2) with DOTNET_EnableSSE41=0 the same code is generated but results in <x' 8>. I have no clue whatsoever why different results are produced.

With DOTNET_EnableSSE41=1, the following code is generated using dpps (SSE 4.1) and the result is correct:

  movsd    xmm0, qword ptr [reloc @RWD00]
  movsd    qword ptr [rbp-0x08], xmm0
  movsd    xmm0, qword ptr [rbp-0x08]
  movsd    xmm1, qword ptr [rbp-0x08]
  movsd    xmm2, qword ptr [rbp-0x08]
+ dpps     xmm1, xmm2, 63
- mulps    xmm1, xmm2
- movsd    qword ptr [rbp-0x10], xmm1
- movsd    xmm1, qword ptr [rbp-0x10]
- movsd    xmm2, qword ptr [rbp-0x10]
- haddps   xmm1, xmm2
  sqrtps   xmm1, xmm1
  divps    xmm0, xmm1
  movsd    qword ptr [rbp-0x08], xmm0
  lea      rcx, [rbp-0x08]
  call     [System.Numerics.Vector2:ToString():System.String:this]
  mov      rcx, rax
  call     [System.Console:WriteLine(System.String)]
  nop

Both machines have SDK 8.2.204 and Runtime 8.0.4 installed.

dotnet / runtime