Open kingiler opened 1 year ago
Can you provide a minimum reproducible example?
Test code:
main:
li t0, 0
fcvt.s.wu ft0, t0 // +0.0
fnmadd.s ft1, ft0, ft0, ft0 // -0.0
fmin.s ft2, ft0, ft1
print.float ft0
print.float ft1
print.float ft2
// exit gracefully
addi a0, zero, 0
addi a7, zero, 93
ecall
Expected output:
register ft0 contains value 0.0
register ft1 contains value -0.0
register ft2 contains value -0.0
Actual output:
register ft0 contains value 0.0
register ft1 contains value -0.0
register ft2 contains value 0.0
Similarly for fmax.
hmm...
>>> min(-0.0, +0.0)
-0.0
>>> min(+0.0, -0.0)
0.0
>>>
This is what Python says
I'm not sure what the official ieee spec says, might be interesting to find out. Or maybe the RISC-V spec is more relevant here?
One can see the RISCV spec page 68. IEEE 754 here specify the different between minNum
and minimumNumber
whilst the RISCV spec uses minimumNumber
(?). Relevant discussion in other programming languages like Rust here and C++ here section Note
. I guess Python's float is using minNum
?
For future reference, here is the relevant part of the spec:
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value −0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN.
Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs.
Thanks for bringing this up, I can see this being a potentially very hard to debug difference in behaviour, would be good to match the spec exactly.
I notice this when reading the spec for RVF, just curious whether we want to strictly follow the spec. I have already come up with a solution to address this issue, using PyO3.
I think we can have a best-effort python implementation that follows the description I copy/pasted above without too much overhead or trouble.
Some instructions do not align with RV32F spec.
fmin.s
and some others