SpinalHDL / NaxRiscv

MIT License

about the gen.scala and ifetch #119

Open duanjiulon opened 1 month ago

duanjiulon commented 1 month ago

Hi, recently I have been using your Nax gen.scala file to generate a core with three AXI interfaces (ibus, dbus, pbus). After connecting it through an AXI crossbar to an AXI system, the on-chip RAM is mapped at address 0x0 and the DDR at 0x40000000. The following problem occurred during debugging over AXI:

  1. When a program is loaded at address 0x0 through openocd+gdb, single-step debugging works.
  2. When a program is loaded into DDR at 0x40000000 through openocd+gdb, the following error message appears:
    Loading section .init, size 0x6e lma 0x40000000
    Loading section .text, size 0x16fc lma 0x40000070
    Loading section .data, size 0x858 lma 0x4000176c
    Start address 0x40000000, load size 8130
    Transfer rate: 63 KB/sec, 2710 bytes/write.
    (gdb) si
    unable to resume hart 0
    dmstatus =0x00430c82
    was stepping, halting
    unable to halt hart 0
    dmcontrol=0x00000001
    dmstatus =0x00430c82
    Hart was not halted after single step!
    unable to step rtos hart

    From the captured waveform, my preliminary conclusion is that the instruction fetch (ifetch) failed at address 0x40000000 (see the attached picture). In gen.scala, I set:

    def plugins = {
    val l = Config1.plugins(
      withRdTime = true,
      aluCount    = 2,
      decodeCount = 2,
      debugTriggers = 4,
      withDedicatedLoadAgu = true,
      withRvc = true,
      withLoadStore = true,
      withMmu = true,
      withDebug = true,
      withEmbeddedJtagTap = false,
      jtagTunneled = false,
      withFloat = false,
      withDouble = false,
      withLsu2 = true,
      lqSize = 16,
      sqSize = 16,
      withCoherency = true,
      ioRange = a => a(31 downto 28) === 0xf// || !a(12)//(a(5, 6 bits) ^ a(12, 6 bits)) === 51
    )

    May I ask which parameter could be causing this fetch issue? Thanks.

duanjiulon commented 1 month ago

[Two attached images]

Dolu1990 commented 1 month ago

Hi,

There is also the fetchRange parameter, which can matter, but by default it is: fetchRange : UInt => Bool = _(31 downto 28) =/= 0x1,

Which should be fine.
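
As a quick sanity check of the two predicates quoted in this thread, here is a minimal sketch that models the posted ioRange and the default fetchRange as plain C functions. It is only a software model of the address decode, not NaxRiscv code, and the function names `is_io` and `is_fetchable` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the posted ioRange: a(31 downto 28) === 0xf */
static bool is_io(uint32_t addr) {
    return (addr >> 28) == 0xFu;
}

/* Software model of the default fetchRange: _(31 downto 28) =/= 0x1 */
static bool is_fetchable(uint32_t addr) {
    return (addr >> 28) != 0x1u;
}
```

Under this model, 0x00000000 and 0x40000000 are both fetchable and neither is treated as I/O; only the 0x1xxxxxxx window is excluded from fetch, and only 0xFxxxxxxx is classified as I/O.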

Did you first try the openocd telnet interface, using it to load binaries into memory and read them back?

Also, one important step is to run everything in simulation (including jtag / openocd); that way we aren't facing a black box to debug. What are you using to run simulation in general?

duanjiulon commented 1 month ago

I don't know how to set up such a simulation. On the board, the design relies on the vendor's own IP cores, and I haven't seen a flow for simulating while debugging over JTAG anywhere except with Verilator. This SoC development is detached from the LiteX environment, which has also caused me great difficulties.

Dolu1990 commented 1 month ago

To reduce the number of possible bug sources, maybe try with: aluCount = 1, decodeCount = 1, debugTriggers = 0, withDedicatedLoadAgu = true, withRvc = false,

This will reduce the CPU complexity and maybe work around a potential bug in the debug module?

duanjiulon commented 1 month ago

Did you first try the openocd telnet interface, using it to load binaries into memory and read them back?

  1. I have tried your method. After loading through GDB, the openocd terminal can read back the corresponding hexadecimal data with the mdw command. Loading goes through dbus, but as soon as I start running or single-stepping, the core freezes. Running a program from the on-chip RAM address range does not have this problem, while running from the DDR address range does, so I suspect an ifetch issue. I also left fetchRange at its default setting without any modification.
  2. The cores generated through the LiteX command line all have the 'coherent' option enabled by default. Can it be disabled before generating the LiteX SoC?

duanjiulon commented 1 month ago

To reduce the number of possible bug sources, maybe try with: aluCount = 1, decodeCount = 1, debugTriggers = 0, withDedicatedLoadAgu = true, withRvc = false,

This will reduce the CPU complexity and maybe work around a potential bug in the debug module?

Hi, I have recently managed to embed the Nax core into an SoC without memory coherency, and debugging peripherals works without issues. However, running the Dhrystone benchmark on this SoC requires a timer. Looking at the generated Nax core, I see that it has a timer input:

input  wire [63:0]    PrivilegedPlugin_io_rdtime

assign _zz_EU0_CsrAccessPlugin_logic_fsm_readLogic_csrValue_17[31 : 0] = PrivilegedPlugin_io_rdtime[31 : 0];
assign _zz_EU0_CsrAccessPlugin_logic_fsm_readLogic_csrValue_18[31 : 0] = PrivilegedPlugin_io_rdtime[63 : 32];

May I ask what command I need to use to successfully read this timer?

duanjiulon commented 1 month ago

Regarding the earlier problem with program loading: using an on-chip logic analyzer, I found that after GDB loads the program there is an authorization failure when reading, and the cause is that the Dcache has an abnormal writable backup. After loading into DDR, the signal cannot be pulled down normally, so I forced this signal low directly, and then program loading (dbus) and instruction fetch (ibus) work fine. Is there anything else to pay attention to here?

Dolu1990 commented 1 month ago

rdtime

To access it, you can use the CSR defined in gcc as "time" https://github.com/SpinalHDL/NaxSoftware/blob/c63c0ce9311160a7965637fb7de5899c3a5110b8/baremetal/driver/riscv.h#L96 => x = csr_read(time); should be ok.

Else you can use the cycle counter : x = csr_read(cycle);
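
For a full 64-bit timestamp on a 32-bit core, the usual pattern is to read the high half, the low half, and the high half again, retrying if the high half changed. The sketch below assumes an RV32 build (suggested by the two 32-bit slices of PrivilegedPlugin_io_rdtime in the generated Verilog) and that the `time` / `timeh` CSRs are readable from the current privilege level. The helper names and the host-side stub are hypothetical and are not taken from riscv.h.

```c
#include <stdint.h>

#if defined(__riscv) && (__riscv_xlen == 32)
/* Target path: read the two 32-bit halves of the time counter. */
static uint32_t read_time_lo(void) {
    uint32_t v;
    __asm__ volatile("csrr %0, time" : "=r"(v));
    return v;
}
static uint32_t read_time_hi(void) {
    uint32_t v;
    __asm__ volatile("csrr %0, timeh" : "=r"(v));
    return v;
}
#elif defined(__riscv)
#error "RV64: a single csrr of the time CSR already returns all 64 bits"
#else
/* Host stub (hypothetical): a fake counter that ticks on every low-half
 * read, so the rollover handling can be exercised off-target. */
static uint64_t fake_time = 0;
static uint32_t read_time_lo(void) { return (uint32_t)(++fake_time); }
static uint32_t read_time_hi(void) { return (uint32_t)(fake_time >> 32); }
#endif

/* Return a consistent 64-bit value: retry if the high half changed
 * between the two reads, which means the low half wrapped in between. */
static uint64_t read_time64(void) {
    uint32_t hi, lo, hi_again;
    do {
        hi = read_time_hi();
        lo = read_time_lo();
        hi_again = read_time_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}
```

A Dhrystone port would typically call `read_time64()` before and after the benchmark loop and subtract the two values. Converting the difference to seconds depends on the frequency of whatever drives the rdtime input, which is specific to the SoC.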

there will be an authorization failure when reading

What kind of authorization failure ?

abnormal writable backup pulled down normally

What do you mean ?

duanjiulon commented 1 month ago

What do you mean ?

You can see it here: after I loaded the program over JTAG into the DDR located at 0x40000000, the signal DataCachePlugin_logic_cache_io_writebackBusy stayed high. Checking the memory at that address with commands such as mdw showed that the data had been loaded successfully. By comparison, when the program is loaded into ROM over JTAG, the result is what the second image shows. (Images: write_back_busy_1, write_back_busy_2)

Dolu1990 commented 1 month ago

Ahhh, I forgot you were in LiteX.

writeback_slot_1_valid being stuck high is really weird, as it is quite decoupled from the rest of the CPU; it is really at the border toward the memory system.

One thing, did you let the litex bios calibrate the dram after the reset and after starting openocd ?

duanjiulon commented 1 month ago

Ahhh, I forgot you were in LiteX.

writeback_slot_1_valid being stuck high is really weird, as it is quite decoupled from the rest of the CPU; it is really at the border toward the memory system.

One thing, did you let the litex bios calibrate the dram after the reset and after starting openocd ?

No, this is my own SoC, which is already separated from LiteX and has no memory coherency. It is just a Dcache-to-AXI4 interface connected to an AXI fabric, which is then connected to a DDR peripheral, so there is no BIOS.

Dolu1990 commented 1 month ago

Ahhh, then the issue is likely the source / id handling in the memory interconnect. Can you probe the various cache/io_mem valid+ready+id signals? There is a good chance that the memory interconnect gives back the wrong ID in a response.

duanjiulon commented 1 month ago

Ahhh, then the issue is likely the source / id handling in the memory interconnect. Can you probe the various cache/io_mem valid+ready+id signals? There is a good chance that the memory interconnect gives back the wrong ID in a response.

This is a result after GDB loads the binary file: write_back_busy_3

Dolu1990 commented 1 month ago

Hi,

You also need the rsp_id for read and writes.

duanjiulon commented 1 month ago

rsp_id: write_back_busy_4. And while loading into the on-chip RAM (starting at 0x0): write_back_busy_5

Dolu1990 commented 1 month ago

How did you hook the SoC up to the external memory controller? Maybe the issue is there? Bad ID handling?

Dolu1990 commented 1 month ago

I think to be sure which side the issue is on, we would need to look at the AXI signals (valid / ready / id) with the logic analyser, as well as at the CPU side simultaneously.
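
One way to check captured valid / ready / id samples for the suspected ID problem is a small scoreboard that counts accepted requests per ID and flags any response whose ID has nothing outstanding. The sketch below is a host-side C model with made-up names (`id_scoreboard`, `sb_request`, `sb_response`), and the 4-bit ID width is an assumption. It illustrates the check and is not part of NaxRiscv or SpinalHDL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SB_MAX_IDS 16u /* assumption: IDs are at most 4 bits wide */

typedef struct {
    unsigned outstanding[SB_MAX_IDS]; /* accepted requests awaiting a response */
    unsigned errors;                  /* responses matching no outstanding request */
} id_scoreboard;

static void sb_init(id_scoreboard *sb) {
    memset(sb, 0, sizeof *sb);
}

/* Feed one sampled cycle of a request channel (AR or AW). */
static void sb_request(id_scoreboard *sb, bool valid, bool ready, unsigned id) {
    if (!(valid && ready)) return;            /* no handshake this cycle */
    if (id >= SB_MAX_IDS) { sb->errors++; return; }
    sb->outstanding[id]++;
}

/* Feed one sampled cycle of a response channel (R or B).
 * `last` is r.last for reads; pass true for B, which is single-beat.
 * Returns false when the response ID matches no outstanding request. */
static bool sb_response(id_scoreboard *sb, bool valid, bool ready,
                        unsigned id, bool last) {
    if (!(valid && ready)) return true;       /* no handshake this cycle */
    if (id >= SB_MAX_IDS || sb->outstanding[id] == 0) {
        sb->errors++;
        return false;
    }
    if (last) sb->outstanding[id]--;
    return true;
}
```

Use one scoreboard for the read path (AR/R) and another for the write path (AW/B), since AXI tracks read and write IDs independently. An `outstanding` count that never drains would be consistent with a write-back slot that stays busy, although that link is only a hypothesis at this point in the thread.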

duanjiulon commented 1 month ago

I think to be sure which side the issue is on, we would need to look at the AXI signals (valid / ready / id) with the logic analyser, as well as at the CPU side simultaneously.

Dear Dolu, I want to use Verilator to simulate this, but I also want to reproduce the JTAG scenario, that is, load binaries over JTAG in the simulation and generate waveforms. How should I use the resources in the existing git repositories? For example, the file mt48lc16m16a2 is used in your Spinal git repository, but when I tried it, Verilator did not compile this SDRAM simulation model very well. How did you test it? My testbench looks like this:

  wire [10:0] io_sdram_ADDR;
  wire [1:0] io_sdram_BA;
  wire [31:0] io_sdram_DQ;
  wire [31:0] io_sdram_DQ_read;
  wire [31:0] io_sdram_DQ_write;
  wire  io_sdram_DQ_writeEnable;
  wire [1:0] io_sdram_DQM;
  wire  io_sdram_CASn;
  wire  io_sdram_CKE;
  wire  io_sdram_CSn;
  wire  io_sdram_RASn;
  wire  io_sdram_WEn;

  assign io_sdram_DQ_read = io_sdram_DQ;
  assign io_sdram_DQ = io_sdram_DQ_writeEnable ? io_sdram_DQ_write : {32{1'bz}};

  mt48lc16m16a2 sdram(
    .Dq(io_sdram_DQ),
    .Addr(io_sdram_ADDR),
    .Ba(io_sdram_BA),
    .Clk(soc_clk),
    .Cke(io_sdram_CKE),
    .Cs_n(io_sdram_CSn),
    .Ras_n(io_sdram_RASn),
    .Cas_n(io_sdram_CASn),
    .We_n(io_sdram_WEn),
    .Dqm(io_sdram_DQM)
  );

AlSdrDdrSoC SoC_AXI_Fabric(
    .io_asyncReset(sysrst),
    .io_axiClk(soc_clk),
    .io_coreInstrAxi_ar_valid(io_coreInstrAxi_ar_valid),
    .io_coreInstrAxi_ar_ready(io_coreInstrAxi_ar_ready),
    .io_coreInstrAxi_ar_payload_addr(io_coreInstrAxi_ar_payload_addr),
    .io_coreInstrAxi_ar_payload_id(io_coreInstrAxi_ar_payload_id),
    .io_coreInstrAxi_ar_payload_len(io_coreInstrAxi_ar_payload_len),
    .io_coreInstrAxi_ar_payload_size(io_coreInstrAxi_ar_payload_size),
    .io_coreInstrAxi_ar_payload_burst(io_coreInstrAxi_ar_payload_burst),
    .io_coreInstrAxi_r_valid(io_coreInstrAxi_r_valid),
    .io_coreInstrAxi_r_ready(io_coreInstrAxi_r_ready),
    .io_coreInstrAxi_r_payload_data(io_coreInstrAxi_r_payload_data),
    .io_coreInstrAxi_r_payload_id(io_coreInstrAxi_r_payload_id),
    .io_coreInstrAxi_r_payload_resp(io_coreInstrAxi_r_payload_resp),
    .io_coreInstrAxi_r_payload_last(io_coreInstrAxi_r_payload_last),
    .io_coreDataAxi_aw_valid(io_coreDataAxi_aw_valid),
    .io_coreDataAxi_aw_ready(io_coreDataAxi_aw_ready),
    .io_coreDataAxi_aw_payload_addr(io_coreDataAxi_aw_payload_addr),
    .io_coreDataAxi_aw_payload_id(io_coreDataAxi_aw_payload_id),
    .io_coreDataAxi_aw_payload_len(io_coreDataAxi_aw_payload_len),
    .io_coreDataAxi_aw_payload_size(io_coreDataAxi_aw_payload_size),
    .io_coreDataAxi_aw_payload_burst(io_coreDataAxi_aw_payload_burst),
    .io_coreDataAxi_w_valid(io_coreDataAxi_w_valid),
    .io_coreDataAxi_w_ready(io_coreDataAxi_w_ready),
    .io_coreDataAxi_w_payload_data(io_coreDataAxi_w_payload_data),
    .io_coreDataAxi_w_payload_strb(io_coreDataAxi_w_payload_strb),
    .io_coreDataAxi_w_payload_last(io_coreDataAxi_w_payload_last),
    .io_coreDataAxi_b_valid(io_coreDataAxi_b_valid),
    .io_coreDataAxi_b_ready(io_coreDataAxi_b_ready),
    .io_coreDataAxi_b_payload_id(io_coreDataAxi_b_payload_id),
    .io_coreDataAxi_b_payload_resp(io_coreDataAxi_b_payload_resp),
    .io_coreDataAxi_ar_valid(io_coreDataAxi_ar_valid),
    .io_coreDataAxi_ar_ready(io_coreDataAxi_ar_ready),
    .io_coreDataAxi_ar_payload_addr(io_coreDataAxi_ar_payload_addr),
    .io_coreDataAxi_ar_payload_id(io_coreDataAxi_ar_payload_id),
    .io_coreDataAxi_ar_payload_len(io_coreDataAxi_ar_payload_len),
    .io_coreDataAxi_ar_payload_size(io_coreDataAxi_ar_payload_size),
    .io_coreDataAxi_ar_payload_burst(io_coreDataAxi_ar_payload_burst),
    .io_coreDataAxi_r_valid(io_coreDataAxi_r_valid),
    .io_coreDataAxi_r_ready(io_coreDataAxi_r_ready),
    .io_coreDataAxi_r_payload_data(io_coreDataAxi_r_payload_data),
    .io_coreDataAxi_r_payload_id(io_coreDataAxi_r_payload_id),
    .io_coreDataAxi_r_payload_resp(io_coreDataAxi_r_payload_resp),
    .io_coreDataAxi_r_payload_last(io_coreDataAxi_r_payload_last),
    //////////////////
    .io_sdram_ADDR(io_sdram_ADDR),
    .io_sdram_BA(io_sdram_BA),
    .io_sdram_DQ_read(io_sdram_DQ_read),
    .io_sdram_DQ_write(io_sdram_DQ_write),
    .io_sdram_DQ_writeEnable(io_sdram_DQ_writeEnable),
    .io_sdram_DQM(io_sdram_DQM),
    .io_sdram_CASn(io_sdram_CASn),
    .io_sdram_CKE(io_sdram_CKE),
    .io_sdram_CSn(io_sdram_CSn),
    .io_sdram_RASn(io_sdram_RASn),
    .io_sdram_WEn(io_sdram_WEn)
);