Open tommydcjung opened 4 years ago
Tried with inifinite memory instead of vcache. The test still fails.
Hi @tommydcjung! I reproduced this failure in the smallest testcase here: https://github.com/bespoke-silicon-group/bsg_manycore/blob/double_div_bug/software/spmd/double_mdiv/main.c
There is a mismatch with x86's result of the operation a * b / c
where a, b and c are doubles.
observed=bebe09c588e07a4c
whereas x86's result expected=bebe09c588e07a4d
. Could you take on from here? Thanks!
Hi @vb000 thanks for narrowing down the problem. But first let's first track down which change caused this regression to fail, instead of starting to debug this test you generated. The general process is that people run the regression before checking in the code, but that did not happen. And now we have this mess.
I suspect this failure has to do with the riscv toolchain update for floating point. There's only so much debugging effort I can put in, given I had other failures to work on. I paid attention to this failure as soon as I could. It's not anyone's full time job to keep installing and testing up to date toolchains, so little debugging help would be appreciated.
I think it has to do with the rounding mode. I tried on x86 with this toy program
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
#define dbl(X) (*(double*)&X)
#define lng(X) (*(uint64_t*)&X)
void foo()
{
double af,bf,cf,zf;
uint64_t ai = 0xbfa1cb88587665f6;
uint64_t bi = 0x3f60a8f3531799ac;
uint64_t ci = 0x4043bd3cc9be45de;
af = dbl(ai);
bf = dbl(bi);
cf = dbl(ci);
zf = af * bf / cf;
uint64_t zi = lng(zf);
printf("%" PRIx64 "\n", zi);
}
int main()
{
fesetround(FE_TONEAREST);
foo();
fesetround(FE_TOWARDZERO);
foo();
fesetround(FE_UPWARD);
foo();
fesetround(FE_DOWNWARD);
foo();
}
result:
bebe09c588e07a4d // FE_TONEAREST
bebe09c588e07a4c // FE_TOWARDZERO
bebe09c588e07a4c // FE_UPWARD
bebe09c588e07a4f // FE_DOWNWARD
for some reason, it might be thinking that it's in TOWARDZERO or UPWARD mode? any idea?
I think I know what the problem is. working on the solution.
Rounding mode's changing? Interesting, given we don't have a csr controlling mode. What rounding mode does manycore defaults to?
I think software fix is to clear out the register after accessing csr.
Not sure if I correctly understood the comment... why would software read fcsr in the first place if it clears the destination reg immediately?
Suggesting this as a temporary fix, if really necessary. Since the only rounding mode currently supported is zero, assuming that the software does not need to use other rounding modes, it can guarantee consistent result.
I agree the immediate solution is to zero out the rounding mode result. But I think hw has to handle that zeroing because, the software that reads rounding mode could be open source library implementation which we'd like to support as is.
My observation is, almost all libraries seem to assume these sets of floating point support: {basic fp
, fcsr
}, {fma
}, {fdiv
, fsqrt
}. Supporting basic fp
and not handling fcsr
puts us in a tricky position where we have to keep tweaking external software libraries.
"Does anyone have hardware with FP but not FP div?" -- Veteran toolchain developer's opinion :)
Which makes me think, in the long run, we'd save lot of time if we conform a little bit to standard assumptions. Like, "if fpu is implemented fcsr is also most probably implemeted", "fdiv and fsqrt are implemeted together" etc.
Just to reiterate, the Zcsr extension (containing FCSR) is mandatory for RISC-V processors which support F. The safest, most compliant behavior is to always return zero on fcsr read and throw an exception (or the mc equivalent) on a nonzero fcsr write (or actual FP exception if we detect them)
I'm not against fully supporting fcsr.
Great! I’m just proposing the zero-exception scheme for the short term
+1 to Tommy and Dan
On Fri, Oct 25, 2019 at 10:21 AM Dan Petrisko notifications@github.com wrote:
Great! I’m just proposing the zero-exception scheme for the short term
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bespoke-silicon-group/bsg_manycore/issues/164?email_source=notifications&email_token=AEFG5ACJ24B6TUCQBH2G4HLQQMTLNA5CNFSM4IY43V5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECJABSI#issuecomment-546439369, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEFG5ABP6O4ANWMEDFDG3G3QQMTLNANCNFSM4IY43V5A .