Open axelfeldmann opened 3 weeks ago
I am not aware of any issues, but because we are not testing that function, its behavior could have changed without us knowing. It's a combinational function for square root, which isn't what you'd use in hardware, so possibly no one has been using it and so no one noticed.
It took a while to run on my computer with testArith
turned on, but I do confirm that both Bluesim and Verilog (with iverilog and verilator) fail for sqrtFP
(while the other testArith
functions pass).
The make
output will say FAIL
because there are new lines in the output that are unexpected, but that doesn't necessarily mean they are failures, just additional tests:
Running ./FloatTest.exp ...
FAIL: `sysFloatTest.v.out' differs from `sysFloatTest.out.expected'
FAIL: `sysFloatTest.c.out' differs from `sysFloatTest.out.expected'
The example is testing hardware modules for performing the operations and has output starting with divide
, sqrt
, add
, etc showing the inputs and outputs (and optionally the expected result if the output doesn't match). When we turn on testArith
, that adds additional tests for combinational logic to do the computation, and add additional output starting with divideFP
, sqrtFP
, addFP
, etc, interleaved with the other tests and performed on the same inputs. So where the expected output has this:
divide <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)
we now see two lines:
divide <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)
divFP <Float +40d.03a98fb4d2934> / <Float -764.ffffffffffff2> = <Float -0a7.03a98fb4d293b> ('b00001)
where the second line shows the combinational logic computing the same result (that matches the expected).
For the combinational square root computation, though, we see slight discrepancies:
sqrter(<Float +000.0000000000001>) = <Float +1e6.0000000000000> ('b00000)
sqrtFP(<Float +000.0000000000001>) = <Float +1e5.67def5c000000> ('b00000), expected <Float +1e6.0000000000000> ('b00000)
sqrter(<Float +43e.e1fffffffffff>) = <Float +41e.f0c60a033a7b2> ('b00001)
sqrtFP(<Float +43e.e1fffffffffff>) = <Float +41f.9dcea82749078> ('b00000), expected <Float +41e.f0c60a033a7b2> ('b00001)
sqrter(<Float +000.fffffffffffff>) = <Float +1ff.fffffffffffff> ('b00001)
sqrtFP(<Float +000.fffffffffffff>) = <Float +1ff.37671037fcd58> ('b00001), expected <Float +1ff.fffffffffffff> ('b00001)
and this is true for all 30 square root tests, and at the end of the simulation is reported the number of errors:
30 error(s) found.
The results are close, though. I don't know enough about the function to know if this difference is expected or not. It could be that the function is computing in a different way and gets a slightly different approximation; or it could be that the function, while looping through the bits, has an off-by-one bug and if failing to include the contribution due to the last bit, say.
I was not involved in writing the library or the tests, so I don't know. Maybe @darius-bluespec knows and can comment? He added the testArith
code in 2012, but left it off by default because they take a long time. I don't know if the output matched at the time -- I would assume it did. But unfortunately by not having the testing turned on, the behavior could have changed without us knowing.
It's a short function, so maybe someone can read what it's doing and figure it out.
Is there a reason that you need the sqrtFP
function, instead of one of the square root modules? As a function, it computes the square root in combinational logic and is therefore probably not useful in real hardware. There are pipelined and non-pipelined modules for doing square root (in the SquareRoot.bsv
package), and the test you ran includes a test for the pipelined one (mkSquareRooter
), while the Bluespec Inc CPUs (like Flute and Toooba) use the non-pipelined one in their FPU.
I actually see some much bigger numerical discrepancies. For example, on a small-ish (<100) perfect squares, the sqrtFP
seems to return 0?
I'd like to use this combinational circuit as part of my FPGA development loop. On the actual FPGA, I'll be using the Xilinx floating point IP and putting it into my design using BVI. However, I can't simulate the binary blob Xilinx IP with Bluesim or Verilator. So, when I am working in simulation, I use simulation-only floating point units that are combinational functions (like sqrtFP
) followed by an n
-cycle register pipeline that matches the n
-cycle latency of my chosen Xilinx IP. This approach has worked great for the other FP units, but led me to find this correctness bug when I tried to use sqrtFP
.
Thanks for looking into this! I really appreciate it :)
Yeah I remember the FP module having some issues. I checked it with tests generated from IBM's fpgen two years ago, and documented the failures (incl. things like sqrt 1
), but this was an email discussion rather than a GitHub one so I think it dropped off the radar at some point.
IIRC sqrt
was the most flagrant issue, but there were more edge cases, like issues with subnormals, infinities and rounding issues in FMA, quiet NaNs, reported FP traps mismatches, bluesim failing with FPEs on some tests, etc. For some of those I couldn't determine whether it was a bug or not b/c I didn't have the IEEE FP standard document.
For my application I ended up wrapping DesignWare modules instead, and I used their sim-only Verilog models to simulate stuff, which worked fine. If you can't get Xilinx IP stuff to work (I'd think it would work with the Modelsim Xilinx includes at least?) maybe you can just find something to wrap, or maybe wrap something simple that uses SystemVerilog native real
or shortreal
types? (I think they even support $sqrt()
).
But yeah it would be nice if someone with expertise / patience / access to the standard doc could get this fixed. I remember I made some thing that parsed the fpgen
tests and ran them on the BS lib, so if you're interested in trying to fix things I can try to find that and give it to you.
For simulation, you can also import C functions (via import-BDPI) to do the math. For sqrt, an imported function would probably compile and run faster than a large unrolled loop of logic.
I don't know how well sqrtFP was ever tested or used; the pipelined module was definitely used and tested much more extensively. It looks like sqrtFP uses the same algorithm as the pipelined module, so it probably wouldn't be too difficult to debug and resolve the differences between them.
Thanks! I agree, that if someone has the time to look, it shouldn't be too difficult to compare with mkSquareRooter
. And FYI, there were fixes to the mkFixedPointSquareRooter
module (see PR #148) that was giving wrong answers due to how it was computing the shift -- which could be another point of reference.
Hi,
I've been trying to use
sqrtFP
from theFloatingPoint
library, and I was getting incorrect results. I then ranmake check
intestsuite/bsc.lib/FloatingPoint
(withtestArith = True
) and it seems likesqrtFP
fails all tests with lines like:Is this a known problem? Are there any correct versions of the function available on other branches?
Thanks!