chipsalliance / rocket-chip

Rocket Chip Generator

Strange behavior of pipeline when sd/ld instructions sequentially used (single core custom config) #512

Closed sergeykhbr closed 7 years ago

sergeykhbr commented 7 years ago

I'm using a custom single-core configuration without the L1toL2 interconnect (RocketTile <=> TLToAXI4 <=> AXI4 interconnect), and I see the following incorrect behavior, which will probably interest you:

pc    disasm           reg modification     comment
1438  lui a5,0x40      a5 <=                OK
143c  addi a5,a5,1     a5 <= 0x40001        OK
1440  slli a5,a5,0xd   a5 <= 0x80002000     OK
1444  sd a5,-48(s0)    mem[] <= 0x80002000  OK. Address = 0x1007fbd0
1448  ld a5,-48(s0)    a5 <= 0x1007fbd0     Error. Address was assigned. must be 0x80002000
144c  li a4,-1         a4 <= ~0             OK

To me this looks like a bug in the pipeline logic, although an L2 cache can probably handle it properly. Here is the Verilog debug output (rocket.scala, line 687):

# C                   0:        202 [1] pc=[000000143c] W[r15=0000000000040001][1] R[r15=0000000000040000] R[r 1=0000000080000003] inst=[00178793] DASM(00178793)
# C                   0:        203 [0] pc=[0000001440] W[r 0=0000000080002000][0] R[r15=0000000000040001] R[r13=0000000080000003] inst=[00d79793] DASM(00d79793)
# C                   0:        204 [0] pc=[0000001444] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r15=0000000080002000] inst=[fcf43823] DASM(fcf43823)
# C                   0:        205 [0] pc=[0000001444] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r15=0000000080000003] inst=[fcf43823] DASM(fcf43823)
# C                   0:        206 [0] pc=[0000001444] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r15=0000000080000003] inst=[fcf43823] DASM(fcf43823)
# C                   0:        207 [0] pc=[0000001444] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r15=0000000080000003] inst=[fcf43823] DASM(fcf43823)
# C                   0:        208 [1] pc=[0000001440] W[r15=0000000080002000][1] R[r15=0000000000040001] R[r13=0000000080000003] inst=[00d79793] DASM(00d79793)
# C                   0:        209 [1] pc=[0000001444] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r15=0000000080002000] inst=[fcf43823] DASM(fcf43823)
# C                   0:        210 [0] pc=[0000001448] W[r 0=000000001007fbd0][0] R[r 8=000000001007fc00] R[r16=0000000080000003] inst=[fd043783] DASM(fd043783)
# C                   0:        211 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000000000000] R[r31=0000000080000003] inst=[fff00713] DASM(fff00713)
# C                   0:        212 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000080002000] R[r31=ffffffffffffffff] inst=[fff00713] DASM(fff00713)
# C                   0:        213 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000080002000] R[r31=0000000080002000] inst=[fff00713] DASM(fff00713)
# C                   0:        214 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000000000000] R[r31=0000000080002000] inst=[fff00713] DASM(fff00713)
# C                   0:        215 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000000000000] R[r31=0000000080002000] inst=[fff00713] DASM(fff00713)
# C                   0:        216 [0] pc=[000000144c] W[r 0=ffffffffffffffff][0] R[r 0=0000000000000000] R[r31=0000000080002000] inst=[fff00713] DASM(fff00713)
# C                   0:        217 [1] pc=[0000001448] W[r15=000000001007fbd0][1] R[r 8=000000001007fc00] R[r16=0000000080000003] inst=[fd043783] DASM(fd043783)
# C                   0:        218 [1] pc=[000000144c] W[r14=ffffffffffffffff][1] R[r 0=0000000000000000] R[r31=0000000080000003] inst=[fff00713] DASM(fff00713)
# C                   0:        219 [0] pc=[0000001450] W[r 0=0000000000000000][0] R[r15=0000000000000000] R[r14=ffffffffffffffff] inst=[00e7a023] DASM(00e7a023)
# C                   0:        220 [0] pc=[0000001450] W[r15=0000000080002000][1] R[r15=0000000000000000] R[r14=0000000000000000] inst=[00e7a023] DASM(00e7a023)

Pay attention to step 220

PS. All memory ranges have 'rwx' access. Addresses >= 0x80000000 have the flag cacheable = 0.

Regards, Sergey

aswaterman commented 7 years ago

If you're referring to the fact that a5 is temporarily assigned 1007fbd0 before later being assigned 80002000, this isn't a bug. Although the intermediate value is wrong, the scoreboard bit for a5 is set at this time, so it's impossible for anyone to read the incorrect value. The scoreboard bit isn't cleared until the load actually completes, writing the correct value to a5.

sergeykhbr commented 7 years ago

I have a definite problem that reproduces both on the FPGA and in RTL simulation: 'stuck on reading from inconsistent address 0xA'. I compare the (register <= written value) pairs, taken when the signal wb_valid=1 (shown in square brackets), against a functional model, and this was the first difference. If you're saying it's OK, I'll continue the comparison.

Thank you, Sergey

aswaterman commented 7 years ago

Can you provide more of the log?

sergeykhbr commented 7 years ago

Please find attached the log file rocket_0xa.txt and the dump file bootimage.txt.

aswaterman commented 7 years ago

Yeah, that looks OK to me. You could trigger on wb_wen && !wb_set_sboard to filter out these spurious cases.
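The trace posted above does not print wb_set_sboard, so the suggested trigger cannot be applied to it directly. As a rough offline approximation, the sketch below parses the printed lines and treats a write with the first bracket at 0 and the bracket after W[...] at 1 as the delayed completion of an earlier load, patching the placeholder value committed earlier for the same register. The field meanings (first bracket = wb_valid, bracket after W[...] = register write enable) and the helper names `parse_line` and `effective_writes` are assumptions made for illustration, not part of rocket-chip.

```python
import re

# Assumed layout of one trace line (from the log in this thread):
#   C <hart>: <cycle> [<wb_valid>] pc=[..] W[r<rd>=<data>][<wen>] R[..] R[..] ...
LINE_RE = re.compile(
    r"C\s*(?P<hart>\d+):\s*(?P<cycle>\d+)\s*\[(?P<valid>[01])\]\s*"
    r"pc=\[(?P<pc>[0-9a-f]+)\]\s*"
    r"W\[r\s*(?P<rd>\d+)=(?P<wdata>[0-9a-f]+)\]\[(?P<wen>[01])\]"
)


def parse_line(line):
    """Return the fields of one trace line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "cycle": int(m["cycle"]),
        "valid": m["valid"] == "1",
        "pc": int(m["pc"], 16),
        "rd": int(m["rd"]),
        "wdata": int(m["wdata"], 16),
        "wen": m["wen"] == "1",
    }


def effective_writes(lines):
    """List (pc, rd, value) per committed write, with delayed writebacks applied.

    Heuristic (an assumption): a line with wen=1 but valid=0 is the late
    completion of a load, so it replaces the value recorded for the most
    recent committed write to the same register.
    """
    writes = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or not rec["wen"]:
            continue
        if rec["valid"]:
            writes.append([rec["pc"], rec["rd"], rec["wdata"]])
        else:
            for w in reversed(writes):
                if w[1] == rec["rd"]:
                    w[2] = rec["wdata"]
                    break
    return [tuple(w) for w in writes]


sample = [
    "# C 0: 217 [1] pc=[0000001448] W[r15=000000001007fbd0][1] R[r 8=000000001007fc00] R[r16=0000000080000003] inst=[fd043783] DASM(fd043783)",
    "# C 0: 218 [1] pc=[000000144c] W[r14=ffffffffffffffff][1] R[r 0=0000000000000000] R[r31=0000000080000003] inst=[fff00713] DASM(fff00713)",
    "# C 0: 220 [0] pc=[0000001450] W[r15=0000000080002000][1] R[r15=0000000000000000] R[r14=0000000000000000] inst=[00e7a023] DASM(00e7a023)",
]
for pc, rd, value in effective_writes(sample):
    print(f"pc={pc:x} r{rd} <= {value:x}")
```

With these three lines, the ld at pc 1448 ends up paired with 80002000 rather than the intermediate 1007fbd0, which is the value a functional model would report.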

sergeykhbr commented 7 years ago

I think I found the problem in the following instructions:

     101 [00001244] fca43c23:  sd      a0,-40(s0)   [1007fb98] <= 00000000000017d0
     102 [00001248] fcb42a23:  sw      a1,-44(s0)   [1007fb94] <= 000000000000000a

The second instruction overwrites the data of the first because of the way the write strobe is formed.

  1. TileLink forms:

    a_address = 1007fb94
    a_mask = f0
    a_data = 0000000a0000000a
  2. The AXI bus converter uses the same values in its channels, which leads to writing [1007fb98] <= 0xa

Solution:

Regards, Sergey

aswaterman commented 7 years ago

@terpstra @hcook

terpstra commented 7 years ago

This looks fine to me. In AXI4 (and TileLink) the lower data-bus bits correspond to the lower addresses (see figure A3-13 in section A3.4.3 of the AXI4 specification). You wrote to 0x1007fb94 on a 64-bit bus. Therefore, the data-bus alignment is 0x1007fb90. So bits 4-7 (the ones set in a_mask) refer to bytes 0x1007fb94-7, which is what your command wanted. Note that RISC-V is little-endian, so the lowest bits go to the lowest address, which is why your '0a' is located at byte 4.

Whatever device is interpreting this as a write to 0x1007fb98 does not conform to the AXI4 specification, and you should submit this bug report to them.
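The byte-lane arithmetic described above can be checked with a small sketch. It assumes a 64-bit (8-byte) data bus and little-endian lane ordering, as stated in the comment; the function name `strobed_bytes` is hypothetical and not part of rocket-chip or any AXI4 library.

```python
def strobed_bytes(address, mask, data, bus_bytes=8):
    """Map a masked bus write to {byte_address: byte_value}.

    Lane i of the data bus (bits 8*i .. 8*i+7) belongs to the byte at
    aligned_base + i, and is written only if bit i of the mask is set.
    """
    base = address & ~(bus_bytes - 1)  # align down to the bus width
    written = {}
    for lane in range(bus_bytes):
        if (mask >> lane) & 1:
            written[base + lane] = (data >> (8 * lane)) & 0xFF
    return written


# Values from the sw a1,-44(s0) transaction quoted earlier in the thread.
writes = strobed_bytes(0x1007FB94, 0xF0, 0x0000000A0000000A)
for addr in sorted(writes):
    print(f"[{addr:08x}] <= {writes[addr]:02x}")
```

The mask f0 selects lanes 4 to 7, so only bytes 0x1007fb94 through 0x1007fb97 are written, with 0a landing at 0x1007fb94. The doubleword at 0x1007fb98 stored by the preceding sd is not touched by a conforming device.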