Open alanvgreen opened 2 years ago
I have verified that the above Cfu works as expected when the default VexRiscV CPU is used.
Hi @alanvgreen , did it meet timing?
The CFU w/ slimopt+cfu worked correctly for me on Arty. Build command:
make EXTRA_LITEX_ARGS="--cpu-variant=slimopt+cfu" clean prog
I'll try proto2 now.
On proto2, I met timing by a ways (100MHz+), and I do see a symptom similar to yours: result is 0x12345778
.
And on proto2, with slim+cfu
and your CFU, I get the correct 0x12345678
.
I believe it did meet timing, yes.
On Sun, Dec 12, 2021 at 3:26 PM TCal @.***> wrote:
Hi @alanvgreen https://github.com/alanvgreen , did it meet timing?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/CFU-Playground/issues/374#issuecomment-991833053, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ4WATO2UVPXKJARWCY3WLUQQP6DANCNFSM5J3PFE3A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
When I build slimopt+cfu
using Radiant, I see the correct answer 0x12345678
.
Ran again - confirmed it meets timing (102.87 MHz). Still producing incorrect output - see uart output below.
Hello, World!
CFU Playground
==============
1: TfLM Models menu
2: Functional CFU Tests
3: Project menu
4: Performance Counter Tests
5: TFLite Unit Tests
6: Benchmarks
7: Util Tests
8: SPI Flash Debugging
main> 3
Running Project menu
Project Menu
============
p: Ping CFU
v: Verify register tests
L: test layer 20
x: eXit to previous menu
project> v
Running Verify register tests
input = c57b29a7, output = 13355779 - FAIL
input = 2b3d0954, output = 13355779 - FAIL
input = 02354ca2, output = 13355779 - FAIL
input = 3727d2b9, output = 13355779 - FAIL
input = 7963b9c5, output = 13355779 - FAIL
input = e3c5de35, output = 13355779 - FAIL
input = f2886d5b, output = 13355779 - FAIL
input = 46f99ee6, output = 13355779 - FAIL
input = b1257a48, output = 13355779 - FAIL
input = d8184f02, output = 13355779 - FAIL
FAIL: 10 failures out of 10 tests
---
I ran with slimopt+cfu, default toolchain, on proto2, with the only change that I specified the 7- grade part LIFCL-17-7UWG72C
.
With this run, I matched @alanvgreen 's incorrect result exactly, 0x13355779
.
Some other experiments I ran --
Results -- always a similar error, an LSB of a hex digit being 1'b1 when it should be 1'b0.
Another experiment -- I tried to reproduce on Fomu with the same CPU/CFU, and a similar tool chain -- I did not see the error with Fomu.
I did another experiment with Yosys synthesis and Radiant PnR. The results are even stranger.
make PLATFORM=hps EXTRA_LITEX_ARGS="--cpu-variant=slimopt+cfu --toolchain=radiant --synth-mode=yosys" clean bitstream
Output for interactive test:
Running Run interactive tests
CFU Interactive Test:
First operand value > 0
Second operand value > 0
cfu_op0(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op1(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op2(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op3(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op4(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op5(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op6(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op7(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
So with the CFU outputting constant 0x12345678, the output as read by the CPU alternates between two bad values -- and now we sometime see bits incorrectly forced to 1'b0 (before we only saw bits being corrupted to 1'b1). The alternation is not just random -- in all the tests, every successive result flips to the other one.
I ran Radiant again full flow (2.2.1), and saw correct behavior.
Another clue: with the full Oxide flow (yosys/nextpnr-nexus) on this design, I get the correct behavior if I add -noflatten
to the arguments to synth_nexus
in the Yosys script hps_proto2_platform.ys
.
So I suspect that constant propagation / folding from the CFU constant value into the VexRiscv receiving FIFO, particularly across flop boundaries, triggers the issue. This receiving FIFO only exists in the slimopt
variant.
I have a minimal Cfu and with SlimOptCfu on HPS Proto2, the result appears to be always wrong:
Every result received by the CPU is 0x13355779
This was the nmigen used to generate the Cfu.
The 'f' signal exists to ensure that a clk signal is generated.
I haven't had time to isolate the bug better yet, but the above failing code is available at https://github.com/alanvgreen/CFU-Playground/blob/slimopt-bug/proj/hps_accel/gateware/gen2/hps_cfu.py