B-Lang-org / bsc

Bluespec Compiler (BSC)
Other
902 stars 141 forks source link

`sqrtFP` is incorrect #710

Open axelfeldmann opened 3 weeks ago

axelfeldmann commented 3 weeks ago

Hi,

I've been trying to use sqrtFP from the FloatingPoint library, and I was getting incorrect results. I then ran make check in testsuite/bsc.lib/FloatingPoint (with testArith = True) and it seems like sqrtFP fails all tests with lines like:

sqrtFP(<Float +43e.e1fffffffffff>) = <Float +41f.9dcea82749078> ('b00000), expected <Float +41e.f0c60a033a7b2> ('b00001)

Is this a known problem? Are there any correct versions of the function available on other branches?

Thanks!

quark17 commented 3 weeks ago

I am not aware of any issues, but because we are not testing that function, its behavior could have changed without us knowing. It's a combinational function for square root, which isn't what you'd use in hardware, so possibly no one has been using it and so no one noticed.

It took a while to run on my computer with testArith turned on, but I do confirm that both Bluesim and Verilog (with iverilog and verilator) fail for sqrtFP (while the other testArith functions pass).

The make output will say FAIL because there are new lines in the output that are unexpected, but that doesn't necessarily mean they are failures, just additional tests:

Running ./FloatTest.exp ...
FAIL: `sysFloatTest.v.out' differs from `sysFloatTest.out.expected'
FAIL: `sysFloatTest.c.out' differs from `sysFloatTest.out.expected'

The example is testing hardware modules for performing the operations and has output starting with divide, sqrt, add, etc showing the inputs and outputs (and optionally the expected result if the output doesn't match). When we turn on testArith, that adds additional tests for combinational logic to do the computation, and add additional output starting with divideFP, sqrtFP, addFP, etc, interleaved with the other tests and performed on the same inputs. So where the expected output has this:

divide <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)

we now see two lines:

divide <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)
divFP <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)

where the second line shows the combinational logic computing the same result (that matches the expected).

For the combinational square root computation, though, we see slight discrepancies:

sqrter(<Float +000.0000000000001>) = <Float +1e6.0000000000000> ('b00000)
sqrtFP(<Float +000.0000000000001>) = <Float +1e5.67def5c000000> ('b00000), expected <Float +1e6.0000000000000> ('b00000)

sqrter(<Float +43e.e1fffffffffff>) = <Float +41e.f0c60a033a7b2> ('b00001)
sqrtFP(<Float +43e.e1fffffffffff>) = <Float +41f.9dcea82749078> ('b00000), expected <Float +41e.f0c60a033a7b2> ('b00001)

sqrter(<Float +000.fffffffffffff>) = <Float +1ff.fffffffffffff> ('b00001)
sqrtFP(<Float +000.fffffffffffff>) = <Float +1ff.37671037fcd58> ('b00001), expected <Float +1ff.fffffffffffff> ('b00001)

and this is true for all 30 square root tests, and at the end of the simulation is reported the number of errors:

        30 error(s) found.

The results are close, though. I don't know enough about the function to know if this difference is expected or not. It could be that the function is computing in a different way and gets a slightly different approximation; or it could be that the function, while looping through the bits, has an off-by-one bug and if failing to include the contribution due to the last bit, say.

I was not involved in writing the library or the tests, so I don't know. Maybe @darius-bluespec knows and can comment? He added the testArith code in 2012, but left it off by default because they take a long time. I don't know if the output matched at the time -- I would assume it did. But unfortunately by not having the testing turned on, the behavior could have changed without us knowing.

It's a short function, so maybe someone can read what it's doing and figure it out.

Is there a reason that you need the sqrtFP function, instead of one of the square root modules? As a function, it computes the square root in combinational logic and is therefore probably not useful in real hardware. There are pipelined and non-pipelined modules for doing square root (in the SquareRoot.bsv package), and the test you ran includes a test for the pipelined one (mkSquareRooter), while the Bluespec Inc CPUs (like Flute and Toooba) use the non-pipelined one in their FPU.

axelfeldmann commented 3 weeks ago

I actually see some much bigger numerical discrepancies. For example, on a small-ish (<100) perfect squares, the sqrtFP seems to return 0?

I'd like to use this combinational circuit as part of my FPGA development loop. On the actual FPGA, I'll be using the Xilinx floating point IP and putting it into my design using BVI. However, I can't simulate the binary blob Xilinx IP with Bluesim or Verilator. So, when I am working in simulation, I use simulation-only floating point units that are combinational functions (like sqrtFP) followed by an n-cycle register pipeline that matches the n-cycle latency of my chosen Xilinx IP. This approach has worked great for the other FP units, but led me to find this correctness bug when I tried to use sqrtFP.

Thanks for looking into this! I really appreciate it :)

mieszko commented 3 weeks ago

Yeah I remember the FP module having some issues. I checked it with tests generated from IBM's fpgen two years ago, and documented the failures (incl. things like sqrt 1), but this was an email discussion rather than a GitHub one so I think it dropped off the radar at some point.

IIRC sqrt was the most flagrant issue, but there were more edge cases, like issues with subnormals, infinities and rounding issues in FMA, quiet NaNs, reported FP traps mismatches, bluesim failing with FPEs on some tests, etc. For some of those I couldn't determine whether it was a bug or not b/c I didn't have the IEEE FP standard document.

For my application I ended up wrapping DesignWare modules instead, and I used their sim-only Verilog models to simulate stuff, which worked fine. If you can't get Xilinx IP stuff to work (I'd think it would work with the Modelsim Xilinx includes at least?) maybe you can just find something to wrap, or maybe wrap something simple that uses SystemVerilog native real or shortreal types? (I think they even support $sqrt()).

But yeah it would be nice if someone with expertise / patience / access to the standard doc could get this fixed. I remember I made some thing that parsed the fpgen tests and ran them on the BS lib, so if you're interested in trying to fix things I can try to find that and give it to you.

quark17 commented 3 weeks ago

For simulation, you can also import C functions (via import-BDPI) to do the math. For sqrt, an imported function would probably compile and run faster than a large unrolled loop of logic.

darius-bluespec commented 3 weeks ago

I don't know how well sqrtFP was ever tested or used; the pipelined module was definitely used and tested much more extensively. It looks like sqrtFP uses the same algorithm as the pipelined module, so it probably wouldn't be too difficult to debug and resolve the differences between them.

quark17 commented 3 weeks ago

Thanks! I agree, that if someone has the time to look, it shouldn't be too difficult to compare with mkSquareRooter. And FYI, there were fixes to the mkFixedPointSquareRooter module (see PR #148) that was giving wrong answers due to how it was computing the shift -- which could be another point of reference.