chipsalliance / Cores-VeeR-EL2

VeeR EL2 Core
https://chipsalliance.github.io/Cores-VeeR-EL2/html/
Apache License 2.0
247 stars 74 forks source link

[Bug report] fence.i may cause the CPU to hang under some conditions #128

Closed flaviens closed 9 months ago

flaviens commented 11 months ago

Hi there!

I think I found a bug (using the brand new Cascade fuzzer) in EL2.

Looking at the reduced ELFs, it looks like it is due to a fence.i placed after branches.

In this zip, you'll find reduced test cases and their dumps for convenience. bug_diff.zip

In the image, left is smallest buggy, right is largest non-buggy. image

I'm using the AHB config. There's also a possibility that my parameters are wrong, I'm absolutely not familiar with this core at the moment. Can you please try to reproduce?

Thanks!

algrobman commented 10 months ago

any more details?

algrobman commented 10 months ago

I was able to simulate the buggy code:

core went to address 0 after fence.i : (that shouldn't happen! Need to see what's wrong in RTL )

       272 :                               s7=93d9de2b                ; nbL
       275 :      #48 0 800000bc 4d5481ef  gp=800000c0                ; jal     gp,0x80048d90
       280 :      #49 0 80048d90 b0a35f63                             ; bge     t1,a0,0x800480ae
       286 :      #50 0 80048d94 783190e3                             ; bne     gp,gp,0x80049d14
       292 :      #51 0 80048d98 00019b0f                             ; fence.i
       302 :      #52 0 00000000 00000000                             ; .short  0
       309 :      #53 0 00000000 00000000                             ; .short  0
       316 :      #54 0 00000000 00000000                             ; .short  0

since this is invalid instruction at address 0 , CPU takes exception. mtvec is not programed and has reset value of 0 causing CPU to go to address 0 again and thus the core is looping at address 0.

algrobman commented 10 months ago

More update:

00019b0f is NOT fence.i, but unimplemented opcode for EL2 - so the CPU takes exception on it. fence.i opcode is 0000100f.

dasm in TB does not decode all the bits in opcode : it takes in account only bits 6:0 and 12, assuming that only valid opcodes are in the tests binaries,

I would suggest TB to print that the core took exception/interrupt in the exec.log to make easier to debug such problems ... Core trace port provides indications what interrupt or exception was taken ....

mkurc-ant commented 10 months ago

So actually opcode 0x00019b0f is fence.i but with non-zero supplementary arguments. RISC-V spec allows both fence and fence.i to have arguments and those are neither decoded nor utilized by VeeR.

The instruction decoder assumes that all those arguments are zero. With the PR https://github.com/chipsalliance/Cores-VeeR-EL2/pull/138 I updated it so that it allows them to have non-zero values. With that VeeR defaults to the most generic behavior of the instructions by ignoring the arguments and does not execute any exception. This is sanctioned by the spec.

tmichalak commented 9 months ago

Fixed with #138. Closing.