Open shivanishah269 opened 3 years ago
gtkwave choked on this .vcd file.
You might consider making this accessible to Shivam as well. He added M-type instructions.
@stevehoover it seems that gtkwave needs additional libraries. @shivanishah269 tried it on her personal machine. Perhaps Shivam can look at the attached snapshot and the RTL to understand what's going on. Also, I don't have Verilator but perhaps that can be used to read the VCD.
I can open other .vcd's.
Steve, do you mean the other VCDs we have provided, or from other people?
From other tools. I don't have any others from you.
We have uploaded VCDs for other tickets, I was thinking you meant those. I have used Cadence tools in the past to share VCDs with other designers, so I have no idea why GtkWave is playing up on this particular VCD.
If the VCD replay doesn't work for you guys, Shivani and I can set up a session with Jasper running live. In any case, the Jasper snapshot and the description have information that may be useful to start the analysis.
I tried another of your .vcds and it also failed in gtkwave w/ parse errors. Sure we can have a zoom session.
Here is an FSDB file in case you can open it with GtkWave. Meanwhile, we will arrange a Zoom session. [MUL_NEW.fsdb.gz](https://github.com/darbaria/axiomise-warpv-formal-6-stage/files/6569391/MUL_NEW.fsdb.gz)
After talking to @shivampotdar he suggested that we try with EXTRA_MUL_FFS tied to 0 in the design when the call to pcpi module is made. Sadly, the end result is the same, as we still have a failure. Shivam said that when the M-extension was tested with Verilator the design had EXTRA_MUL_FFS tied to 1 (which was the default value in the design extracted from tlv). When this parameter is 1, then the trace we originally supplied shows the design bug.
We attach the new snapshot with the EXTRA_MUL_FFS tied to 0, if it makes any sense to analyse this (we are not sure if the parameter setting is correct when it is 0). Register file index 31 is expected to be updated with the mul.pcpi.rd value but it isn't. mul.pcpi.rd itself is having the correct result.
@stevehoover see the attached waveform. Not much change from before. Expecting 12*2 =24 to be committed in cycle 32 but we see 0. You can see FETCH_INSTR_mulblockrslt_a5 is correct but the update to rst_a6 signal is not happening correctly in the design. The only time rvfi_valid bound to axiomise_instr_valid in our setup has gone high is in cycle 28-29, 30-32. We detect the trigger in cycle 31 expecting an update in cycle 32.
In the following snapshot, you will see the only instructions in the pipe are variants of MUL. Can you create a program trace in your environment where you send in some of these before the MUL? I will try and block these other MULs in another case and let you know.
Okay, now I have blocked everything other than mul and ori although ori doesn't get issued. I still see the same failure. I don't believe the design bug has anything to do with other instructions in flight at least for now. As we have seen the mulblock_rslt_a5 computes the result correctly but the updates don't happen the expected result register rst_a6. This result and the last one is with the design modification you suggested.
MUL_fail_with_everything_blocked.vcd.gz MUL_with_everything_blocked.fsdb.gz
I attach the following:
According to RISCV ISA definition of MUL Instruction is as follows: "MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register rd."
Please see the description below to see what is going on in the attached screenshot:
At the time when MUL is issued at 21st cycle, the contents of register pointed by rs1 and rs2 is 32'd2 and 32'd4 respectively and destination register rd is 5'd4. According to micro-architecture of design, result signals for mul/div are pulled down to zero for the first issue of these instructions and they commit the result of mul/div instruction during the second issue of these signals.
Now, second issue of MUL happens at 30th cycle, at that point also the contents of register pointed by rs1 and rs2 is 32'd2 and 32'd4 respectively and destination register rd is 5'd4. So, now in the next cycle we expect axiomise_regfile[4] to be 32'd8 but, it is updated to 32'd4.
I have tried to backtrack to see what values were actually multiplied in module pcpi_fast_mul. According to my observation, data flow of rs1 and rs2 content is as follows, Design signal(FETCH_Instr_mul_in1/2_a5) --> pcpi_rs1/2 --> mul.rs1/2 --> mul.rs1/2_q. So, inside the mul module when first issue of mul.instr_any_mul happens at 21st cycle so, the multiplication is executed based on contents of rs1_q and rs2_q which are uninitialized values.
@stevehoover and @ahadnagy, correct me if I am wrong in my understanding of how MUL works in WARP-V.
Also, attaching vcd file for your reference.
MUL.vcd.gz