bu-icsg / dana

Dynamically Allocated Neural Network Accelerator for the RISC-V Rocket Microprocessor in Chisel

Errors for dana/smoke test #49

Open hoangt opened 6 years ago

hoangt commented 6 years ago

Hello seldridge,

1) I built the Dana+Rocket flow successfully for both the emulator and the FPGA. When I run the dana/smoke tests, all of them fail for reasons unknown to me. Below are the logs I got after running the smoke tests:

=== Run: emulator-rocketchip-DanaEmulatorConfig smoke/xfiles-dana-smoke-p-csr

FAILED (tohost = 2) FAILED (code = 2, seed 1519650056) after 98224 cycles

=== Run: emulator-rocketchip-DanaEmulatorConfig smoke/xfiles-dana-smoke-p-debug

FAILED (tohost = 4) FAILED (code = 4, seed 1519650095) after 48021 cycles

=== Run: emulator-rocketchip-DanaEmulatorConfig smoke/xfiles-dana-smoke-p-id

FAILED (tohost = 1) FAILED (code = 1, seed 1519650096) after 93184 cycles

Could you give me a hint to debug this issue? Many thanks!

2) When I run the dana/nets tests, I do not see any output (not even the pass/fail that you mentioned). For instance, when I use the command below to run an individual nets test, nothing is shown in the terminal:

emulator-rocketchip-DanaEmulatorConfig nets/xfiles-dana-nets-p-xor-sigmoid-16i

seldridge commented 6 years ago

I went ahead and merged a fix for your first question above.

The failures of csr.S and id.S were related to the same thing. Dana defines an "xfid" that provides some information back to the user or supervisor about what Dana looks like (how many transactions it supports, how many multiply accumulate processing elements it has, etc.). A test in both csr.S and id.S looks for an exact match on a hard-coded "xfid". This was out of sync with the default DanaEmulatorConfig. That is now fixed.

The debug.S failure was related to something else (an actual bug). When accessing the debug unit, a valid request was being interpreted as an invalid request and an error code was being incorrectly returned. We haven't used the debug unit much in a while and I guess this slipped through. Thanks for reporting this.

Related to your second question...

This is the expected behavior. The bare metal tests (in tests/smoke and tests/nets) will only provide visible output if they have a failing test (though it is far more likely that a failing test will show up as a hang). The easiest way to get output from those tests (when running on the emulator) is to enable the Verilog printfs. E.g., the usual "hello world" network test that I use is xorSigmoidSymmetric. Properly grepped, you can reduce this to just the values of the output queues (which are the outputs of the network):

> ./emulator-rocketchip-DanaEmulatorConfig +verbose ../dana/tests/build/nets/xfiles-dana-nets-p-xorSigmoidSymmetric 2>&1 | spike-dasm | grep queueOut | grep deq
[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000000013ea], #:0d 1
[INFO] xfiles.TTable: queueOut[0] deq [data:0x0000000000001549], #:0d 1
[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000fffffc31], #:0d 1
[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000fffffe8c], #:0d 1
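To make the dequeued values above easier to read, here is a minimal Python sketch that pulls the `data` field out of each `queueOut ... deq` line and prints it as a signed integer. It assumes the outputs are 32-bit two's complement values, which is only a guess based on the `0x00000000ffff....` patterns in the log. The names `DEQ_RE`, `to_signed32`, and `parse_deq` are made up for this example and are not part of Dana. Converting the integers to real-valued network outputs would also need the network's fixed-point format, which is not stated in this thread, so the sketch stops at integers.

```python
import re

# Matches lines such as:
# [INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000000013ea], #:0d 1
DEQ_RE = re.compile(r"queueOut\[(\d+)\] deq \[data:0x([0-9a-fA-F]+)\]")


def to_signed32(value):
    """Interpret the low 32 bits as two's complement (assumed width)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def parse_deq(lines):
    """Return (queue index, signed value) for every dequeue line found."""
    found = []
    for line in lines:
        match = DEQ_RE.search(line)
        if match:
            queue = int(match.group(1))
            raw = int(match.group(2), 16)
            found.append((queue, to_signed32(raw)))
    return found


# The four log lines quoted in the comment above.
SAMPLE = [
    "[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000000013ea], #:0d 1",
    "[INFO] xfiles.TTable: queueOut[0] deq [data:0x0000000000001549], #:0d 1",
    "[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000fffffc31], #:0d 1",
    "[INFO] xfiles.TTable: queueOut[0] deq [data:0x00000000fffffe8c], #:0d 1",
]

results = parse_deq(SAMPLE)
for queue, value in results:
    print(f"queueOut[{queue}] = {value}")
```

In practice the emulator output could be piped into a script like this by reading `sys.stdin` instead of the hard-coded `SAMPLE` list.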

We have a hacked up MNIST demo that uses tests/nets/inference.S to return the neuron with the maximum output via the test error code. A more full-fledged demo should, however, use an actual kernel (Proxy Kernel or Linux). I haven't used the proxy kernel for testing in some time, though tests/pk has some outdated tests that I expect no longer work.

hoangt commented 6 years ago

These are very helpful comments that let me take a deep dive into Dana. Update: For debug.S, it seems that there are bugs in test cases 4 and 5. If we comment out those test cases, the debug test runs fine.

TEST_CASE( 4, x10, 0x0, DEBUG_WRITE_UTL(0xcccc, tdat4) );

TEST_CASE( 5, x10, 0xcccc, DEBUG_READ_UTL(tdat5) );