Closed monniaux closed 3 years ago
ok, that sounds really bad. Thanks for the detailed analysis. I'm very busy ATM but I'll try to find some time to look at it. First thing I'll do will be to run it in a verilator simulation to see if it has the same behaviour there
The problem still occurs if the netlist is generated with Vivado 2020.2.
The problem does NOT occur if using Verilator simulation.
Booting Zephyr OS build v2.4.0-rc2-106-g25c0810ae2e6 sorted=true time cycles:0
Hi,
Thanks for the info. It has been high on my priority list to get this sorted out but I haven't been able to make time for it until now. I'll start with some testing to make sure I can reproduce it here. Which release of the SweRVolf rtl are you using?
FYI, I can reproduce the issue here, which is a good start
fusesoc/fusesoc_libraries/swervolf$ git rev-parse --short HEAD 2e1802e
Some observations so far:
mdw 0 16
I can see that parts of the memory has been overwritten by something. If I load the same program again, it usually works. This makes me suspect some cache issue.while(1) {
int i = 0;
do {
data_vec_random(vec, len);
quicksort(vec, len);
i++;
} while(data_vec_is_sorted(vec, len));
printf("Error after %d rounds\n", i);
}
return 0;
and I'm varying len to find clues. It's clear that shorter lengths makes the problem less common, but I haven't found if there is some specific value where it disappears completely. At len 500, it runs perhaps on average 4-80 rounds before crashing. At len 460 I see 400-7000 rounds before a crash and with len 450 I have been running ten minutes without crashes.
So not really any new info but wanted to leave a trail for myself and anyone else looking into this
I'm not getting crashes, just incorrect sorting. Maybe it depends on the compilation chain in use.
I see now that I wasn't very clear. After uploading the first time after recompilation I (almost) always get a crash where part of the memory gets overwritten
On subsequent runs with the same elf I don't get crashes, just incorrect sorts. And the longer len, the more likely that the sort is incorrect
iladata.tar.gz Another update. I have rewritten the test application further to make it easier to find and trigger the value that gets sorted in the wrong bin. I have also added probes inside the FPGA. Yesterday I managed to catch a case where a value was written to memory, but the old value is read back a while later. At this point is very likely an issue with the AXI subsystem or the memory controller and I will add more probes along the way.
Attaching a VCD dump of the Vivado capture for reference. We can see three reads from 0x5020 (initiated at times 180, 325 and 395. Each time we read back 0x5d.
At time 651 we can see that the value 0x5f gets written to address 0x5020. Later however, at time 921 we initiate a read to address 0x5020 and now we get back the old value 0x5d instead.
@monniaux I tracked the issue down to the litedram DDR controller but didn't pursue it further. Instead I generated a new litedram instance since there seems to have beeen a lot of bug fixes since the last one. With the updated version I'm getting consistent results with the sorting and can't reproduce the problem. Would you mind trying it as well to make sure it works on at least two different system. A preliminary version is pushed to the memfix branch for now
Using the 'memfix' branch:
My 'quicksort' and 'number theoretic transform' examples now run. Quicksort tested with both gcc and CompCert .
Fantastic! Sorry for taking so long. In hindsight I see it would had saved me a lot of work if I had prioritized this bug report
@olofk: That's great to see LiteDRAM used here. Sorry for the time you spent looking at this if this issue was related to the core. Improvements are still made to the core and that's good to see you've been able to update the core easily and that it fixed your issue. Happy to help on similar issues in the future if you suspect this are related the core.
Hi @enjoy-digital . LiteDRAM is great since it saves me from having to deal with FPGA vendor memory IP which usually is a pain. I was planning to contact you when I realized the bug was in LiteDRAM but thought I'd update first since many things had changed. That seems to have worked. It's good to see also that you have started with tagged releases. That makes it much easier to maintain than relying on a random combination of git SHAs
ntt.tar.gz The ntt sample benchmark. This is a kind of Fast Fourier Transform in a finite field, using integer arithmetic (modulus, multiplication, addition, subtraction). FFT then reverse-FFT must yield exactly the original array, otherwise it means there was some bug.
@monniaux , should this ntt program work with reduced buffer? instead of 64KB buffer with 4KB buffer ?
I'm getting buffer miscompare :
compare = -101
NTT needs 65536 elements in the buffer, but it should be possible to modify it for 256 elements.
De: "agrobman" notifications@github.com À: "chipsalliance/Cores-SweRVolf" Cores-SweRVolf@noreply.github.com Cc: "DAVID MONNIAUX" David.Monniaux@univ-grenoble-alpes.fr, "Mention" mention@noreply.github.com Envoyé: Vendredi 12 Février 2021 16:12:56 Objet: Re: [chipsalliance/Cores-SweRVolf] nondeterministic and incorrect behavior of code ran over SweRVolf on Nexys-A7 (#22)
[ https://github.com/monniaux | @monniaux ] , should this ntt program work with reduced buffer? instead of 64KB buffer with 4KB buffer ? I'm getting buffer miscompare : compare = -101
— You are receiving this because you were mentioned. Reply to this email directly, [ https://github.com/chipsalliance/Cores-SweRVolf/issues/22#issuecomment-778253005 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/AAZZPZWDSRIT7HVMGYU3TZLS6VAPRANCNFSM4RZZO6RQ | unsubscribe ] .
thanks, but to get maximum performance from SweRV CPU the data/stack should be placed to DCCM. I'm not sure what SweRV CPU config is used in SweRVolf, but to get good numbers with this program, you probably need to increase DCCM size. ( SweRV CPU has only 64KB DCCM with default configuration.)
This program has no particular interest except for testing scheduling inside our compiler.
For getting it to run with 256 words instead of 65536: replace the appropriate lines in ntt.c
SweRVolf release:
904f69e
The following implementation of QuickSort is deterministic and contains no undefined behavior (checked with tis-interpreter and with CompCert interpreter; compiles with no warning and runs perfectly otherwise). It should output
sorted = true
.I compiled it and ran it on SweRVolf on Nexys-A7 (tried both with netlist generation with Vivado 2018.2 and Vivado 2020.1). Executions are incorrect and nondeterministic :
The problem applies whether this program is compiled by Zephyr's default gcc, or compiled with CompCert then assembled by Zephyr's gcc.
I also had weird behavior with other programs.
quicksort.tar.gz