dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.82k stars 4.61k forks source link

Numerical Issue in Math.Log(x,b) when upgrading to Update 1 #4949

Closed redknightlois closed 4 years ago

redknightlois commented 8 years ago

After updating to Update 1 I have a test that started to fail.

The core is this: Assert.Equal((uint)Math.Log(BitVector.BitsPerWord, 2), BitVector.Log2BitsPerWord); essentially what I am doing there is: Assert.Equal((uint)Math.Log(64,2), 6);

My surprise was when I looked into the result of Math.Log(64,2) which should be 6, however Update1 shows 5.9999999999999991

Needless to say that the correct answer is 6 http://www.wolframalpha.com/input/?i=log%2864%29%2Flog%282%29

EDIT: For the look of it, it could be a JIT optimization going wrong as I doubt someone would have been tampering with Math.Log in a release candidate.

redknightlois commented 8 years ago

Another important detail. If you run Math.Log(64) / Math.Log(2) then the value is correct.

EDIT: It looks it only happen with x86 runtime.

mikedn commented 8 years ago

I can't seem to reproduce this with CoreCLR, the following program returns 6:

static int Main() {
    return (int)Math.Log(64, 2);
}
redknightlois commented 8 years ago

Are you running in x86? I can make it fail reliably (always) with: "sdkVersion": "dnx-coreclr-win-x86.1.0.0-rc1-update1"

mikedn commented 8 years ago

Are you running in x86?

Well, no because CoreCLR doesn't quite do x86 at the moment.

But now I tried with .NET 4.6.1 x86 and x64 and I still get 6.

redknightlois commented 8 years ago

I dont know. It's being released as a package so I assume it was already out :)

Worse, it is actually VS 2015 default.

mikedn commented 8 years ago

I dont know. It's being released as a package so I assume it was already out :)

The x86 version of CoreCLR is built by MS from internal sources since the x86 JIT compiler isn't exactly open sourced.

Worse, it is actually VS 2015 default.

So maybe the issue was fixed in Update 1 and that's why I cannot reproduce it?

RussKeldorph commented 8 years ago

Not sure if this will help, but I looked at a similar issue for Math.Exp a while ago, and that turned out to be a change in the native C Runtime, which (oh so helpfully) is only willing to sign up for "usually correct within +/-1ulp" in order to maximize perf. If you're willing to step through the native instructions, you might find the source of the problem there. It could be something like a CRT runtime check taking advantage of FMA3 instructions or something like that (on the off chance you have modern-ish hardware).

I'm obliged to mention that x86-specific Visual-studio-specific issues should be reported at http://connect.microsoft.com/VisualStudio CoreCLR doesn't support x86 at this time. Take that for what it's worth.

Edit: I have been educated below.

jkotas commented 8 years ago

CoreCLR doesn't support x86 at this time

@RussKeldorph CoreCLR (binaries available via NuGet packages, used by ASP.NET 5) does support x86. One just cannot built these bits from this repo yet.

mikedn commented 8 years ago

I found a dnx-coreclr-win-x86.1.0.0-rc2-16128 and tested, I still get 6. It might be interesting to know the CPU you use to test in case the CRT uses a different implementation. Though it's probably a new enough CPU to not matter...

RussKeldorph commented 8 years ago

@jkotas Apologies for spreading bad info, and thanks for the correction.

RussKeldorph commented 8 years ago

@redknightlois I, too, am having trouble reproducing this. Could you provide detailed repro steps and include the DNX and runtime versions, OS version, and CPU you're running under?

redknightlois commented 8 years ago

@RussKeldorph

CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20Ghz 3.20GHz OS: Windows 10 x64 (fully updated)

This is the code:

    [Fact]
    public void Constants()
    {
        Assert.Equal((uint)(Math.Log(64, 2)), (uint)6);
    }

This is how it gets executed:

------ Run test started ------
------ Test started: Project: Sparrow.Tests ------
Starting  Microsoft.Dnx.TestHost [C:\Users\federico.lois.CORVALIUS\.dnx\runtimes\dnx-coreclr-win-x86.1.0.0-rc1-update1\bin\dnx.exe --appbase "D:\Repositories\ravendb-40.git\test\Sparrow.Tests" Microsoft.Dnx.ApplicationHost --port 55739 Microsoft.Dnx.TestHost --port 55875 --parentProcessId 16160]
Connected to Microsoft.Dnx.TestHost
Running tests in 'D:\Repositories\ravendb-40.git\test\Sparrow.Tests\project.json'
========== Run test finished: 1 run (0:00:04,3251145) ==========

For what it's worth. I can reproduce it reliably on my machine. If you still cannot repro it by next week, we can have a talk on Skype and you can probe my system through screenshare to gather extra information.

RussKeldorph commented 8 years ago

Sorry for delay. I was sick last week. Ok, have repro. On my machine at least, when the issue doesn't happen, I'm linking to msvcr120_clr0400.dll, which says

_CIlog(64.0) = 4.158883083359671495320e+0000 (0:4001:851591f9dd5b9800)
_CIlog( 2.0) = 6.931471805599452862270e-0001 (0:3ffe:b17217f7d1cf7800)

In the repro case, coreclr is linking to msvcrt.dll, which says:

_CIlog(64.0) = 4.158883083359671856570e+0000 (0:4001:851591f9dd5b9b41)
_CIlog( 2.0) = 6.931471805599453094290e-0001 (0:3ffe:b17217f7d1cf79ac)

So it looks like the latter is returning a "double extended"-precision result, whereas the former is using double-precision. I suspect there's an easy way to configure msvcrt.dll to behave like msvcr120_clr0400.dll, but I'll have to look into it.

redknightlois commented 8 years ago

Good to know there is a repro.

RussKeldorph commented 8 years ago

Coreclr depends on the system CRT (msvcrt.dll), which doesn't behave the same as VS CRTs. I imagine this could be addressed in a number of ways--everything from ignoring and accepting as is to a full audit and tightening of FP semantics. Assuming this change was an unintentional consequence of targeting msvcrt.dll, my recommendation would be for src\classlibnative\float\floatnative.cpp to do its best to protect itself from CRT behavior. In this case, I believe forcing the results of transcendental function calls like log through memory would restore compatibility to earlier releases by forcing a round to double (or float if appropriate). I think storing to a volatile local (on x86 only) should work.

Note that the fact that x86 coreclr is not open-sourced may affect how and where this issue is addressed.

jkotas commented 8 years ago

We consider these mirror changes in precision to be acceptable.

The math operations in CLR / CoreCLR have the same compatibility guarantees as the CRT. The compatibility guarantees of CRT are to be within the +/-1ulp margin of error.

There were similar minor changes in precision for each CLR / CoreCLR release that picked up a new major CRT version.

BTW: You can actually see similar differences even when running the exact same CLR / CoreCLR on different CPU models.

redknightlois commented 8 years ago

@jkotas The case of Math.Log(64,2) doesn't sounds like a +1/-1 ulp margin of error. And this code executes with the same accuracy from .Net 2.0 to .Net 4.6.1, all CoreCLR on x64 and x86 architectures. The only one I could repro is the dnvm package for CLR built for x86.

This kind of errors add up on numerical code, doesn't look like of the acceptable ones when the workaround Math.Log(64)/Math.Log(2) returns the correct value.

RussKeldorph commented 8 years ago

FWIW, the hex encoded representation of the incorrect result is 0x4017FFFFFFFFFFFF, which is exactly one ulp less than the correct answer of 0x4018000000000000 (6). I defer to @jkotas on the tolerability CoreCLR has for such variances.

redknightlois commented 8 years ago

Sorry, I said CoreCLR x64 and x86. Should have read CLR first, CoreCLR in the latter.

jkotas commented 8 years ago

This is by design as I have described above.

redknightlois commented 8 years ago

It eludes me how something that happens on a single CoreCLR build configuration (x86) can be considered by design. And more if the correct numerical behavior is used from CLR 2.0 up to CLR 4.6.1 and all the other configurations of CoreCLR. But I am noone to judge about it.

I am not saying it is a high-priority issue, but for numerical software the difference is high, because the guarantee in the others is using double extended operations but not on this configuration (when hitting the CRT --- which who knows how many other Math routines are doing it).

It is really not a problem for me, I already changed all instances to use Math.Log(x,y) --- paying an extra call and an extra division --- but others may find out about it on production on not such great terms.