bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality
Other
223 stars 58 forks source link

nbody fails #164

Open tommydcjung opened 4 years ago

tommydcjung commented 4 years ago
BSG_FAIL------------------------------------------------------------------------------------------------------>
beebs   nbody   st
tommydcjung commented 4 years ago

Tried with inifinite memory instead of vcache. The test still fails.

vb000 commented 4 years ago

Hi @tommydcjung! I reproduced this failure in the smallest testcase here: https://github.com/bespoke-silicon-group/bsg_manycore/blob/double_div_bug/software/spmd/double_mdiv/main.c

There is a mismatch with x86's result of the operation a * b / c where a, b and c are doubles. observed=bebe09c588e07a4c whereas x86's result expected=bebe09c588e07a4d. Could you take on from here? Thanks!

tommydcjung commented 4 years ago

Hi @vb000 thanks for narrowing down the problem. But first let's first track down which change caused this regression to fail, instead of starting to debug this test you generated. The general process is that people run the regression before checking in the code, but that did not happen. And now we have this mess.

vb000 commented 4 years ago

I suspect this failure has to do with the riscv toolchain update for floating point. There's only so much debugging effort I can put in, given I had other failures to work on. I paid attention to this failure as soon as I could. It's not anyone's full time job to keep installing and testing up to date toolchains, so little debugging help would be appreciated.

tommydcjung commented 4 years ago

I think it has to do with the rounding mode. I tried on x86 with this toy program

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

#define dbl(X) (*(double*)&X)
#define lng(X) (*(uint64_t*)&X)

void foo()
{ 
  double af,bf,cf,zf;
  uint64_t ai = 0xbfa1cb88587665f6;
  uint64_t bi = 0x3f60a8f3531799ac;
  uint64_t ci = 0x4043bd3cc9be45de;
  af = dbl(ai);
  bf = dbl(bi);
  cf = dbl(ci);

  zf = af * bf / cf;

  uint64_t zi = lng(zf);
  printf("%" PRIx64 "\n", zi);
}

int main()
{
  fesetround(FE_TONEAREST);
  foo();
  fesetround(FE_TOWARDZERO);
  foo();
  fesetround(FE_UPWARD);
  foo();
  fesetround(FE_DOWNWARD);
  foo();
}
result:
bebe09c588e07a4d  // FE_TONEAREST
bebe09c588e07a4c  // FE_TOWARDZERO
bebe09c588e07a4c  // FE_UPWARD
bebe09c588e07a4f  // FE_DOWNWARD

for some reason, it might be thinking that it's in TOWARDZERO or UPWARD mode? any idea?

tommydcjung commented 4 years ago

I think I know what the problem is. working on the solution.

vb000 commented 4 years ago

Rounding mode's changing? Interesting, given we don't have a csr controlling mode. What rounding mode does manycore defaults to?

tommydcjung commented 4 years ago

https://github.com/bespoke-silicon-group/bsg_manycore/pull/183

tommydcjung commented 4 years ago

I think software fix is to clear out the register after accessing csr.

https://github.com/bminor/newlib/blob/6497fdfaf41d47e835fdefc78ecb0a934875d7cf/newlib/libm/machine/riscv/fegetround.c

vb000 commented 4 years ago

Not sure if I correctly understood the comment... why would software read fcsr in the first place if it clears the destination reg immediately?

tommydcjung commented 4 years ago

Suggesting this as a temporary fix, if really necessary. Since the only rounding mode currently supported is zero, assuming that the software does not need to use other rounding modes, it can guarantee consistent result.

vb000 commented 4 years ago

I agree the immediate solution is to zero out the rounding mode result. But I think hw has to handle that zeroing because, the software that reads rounding mode could be open source library implementation which we'd like to support as is.

My observation is, almost all libraries seem to assume these sets of floating point support: {basic fp, fcsr}, {fma}, {fdiv, fsqrt}. Supporting basic fp and not handling fcsr puts us in a tricky position where we have to keep tweaking external software libraries.

vb000 commented 4 years ago

"Does anyone have hardware with FP but not FP div?" -- Veteran toolchain developer's opinion :)

Which makes me think, in the long run, we'd save lot of time if we conform a little bit to standard assumptions. Like, "if fpu is implemented fcsr is also most probably implemeted", "fdiv and fsqrt are implemeted together" etc.

dpetrisko commented 4 years ago

Just to reiterate, the Zcsr extension (containing FCSR) is mandatory for RISC-V processors which support F. The safest, most compliant behavior is to always return zero on fcsr read and throw an exception (or the mc equivalent) on a nonzero fcsr write (or actual FP exception if we detect them)

tommydcjung commented 4 years ago

I'm not against fully supporting fcsr.

dpetrisko commented 4 years ago

Great! I’m just proposing the zero-exception scheme for the short term

taylor-bsg commented 4 years ago

+1 to Tommy and Dan

On Fri, Oct 25, 2019 at 10:21 AM Dan Petrisko notifications@github.com wrote:

Great! I’m just proposing the zero-exception scheme for the short term

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bespoke-silicon-group/bsg_manycore/issues/164?email_source=notifications&email_token=AEFG5ACJ24B6TUCQBH2G4HLQQMTLNA5CNFSM4IY43V5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECJABSI#issuecomment-546439369, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEFG5ABP6O4ANWMEDFDG3G3QQMTLNANCNFSM4IY43V5A .