alexforencich / verilog-axi

Verilog AXI components for FPGA implementation
MIT License

axi_dma_wr seems to give done status before fully writing to DDR? #80

Open abarajithan11 opened 3 months ago

abarajithan11 commented 3 months ago

On a Zynq UltraScale+ board (ZCU104), we are using axi_dma_wr to write to the off-chip DDR. We set a register when m_axis_write_desc_status_valid & (m_axis_write_desc_status_error == 0). The C firmware running on the PS polls this register in a while loop, then flushes the cache and starts processing the data in the DDR.

We found that doing this results in the final output being wrong. When we added sleep_us(0) after the while loop checking for that register, the final output is correct.

Therefore, it seems the DMA gives the done status before data gets fully written into off-chip DDR. But in #13 you mention:

However, the DMA write module should not indicate that the operation is complete until all of the AXI write responses come back, otherwise you could get into a situation where you are operating on data that is not completely written (ask me how I know that....).

@alexforencich Does this mean the DMA should give the status after the data is fully written?

alexforencich commented 3 months ago

The only place the status output is set is here, in response to bvalid being set: https://github.com/alexforencich/verilog-axi/blob/master/rtl/axi_dma_wr.v#L771 . So yes, it should only indicate that the operation has completed after receiving the AXI write response. Is it possible that the AXI write response is not being generated correctly in your setup?

alexforencich commented 3 months ago

The other possibility could be reordering if multiple AXI operations were issued, but the core currently only uses ID 0 to prevent reordering.

abarajithan11 commented 3 months ago

The AXI master port of the write DMA is connected directly to the HP AXI slave port of the Zynq PS on the ZCU104.

Here's our top module: https://github.com/KastnerRG/cgra4ml/blob/DMA_controller_dev/deepsocflow%2Frtl%2Frtl_oc_top.v

From our controller, we tie status_ready = 1. So when (status_valid & !status_error), it should indicate the DMA has finished writing. We set a register when that happens.

In firmware, we wait for that register before processing the data. It doesn't work unless we add usleep(0) after the register is read.
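For reference, the firmware sequence described in this thread (poll the done register, do cache maintenance, then read the buffer) could be sketched roughly as below. This is a host-compilable sketch under assumptions, not the actual firmware: `consume_dma_buffer`, `wait_dma_done`, the `cache_invalidate_fn` hook and `host_stub_invalidate` are hypothetical names, and the sketch assumes the cache step is an invalidate of the DMA buffer, which the thread does not specify. Whether the fence shown is sufficient on the PS is also not established here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: on a real target this would wrap the platform's
 * cache-invalidate routine for the given address range. */
typedef void (*cache_invalidate_fn)(uintptr_t addr, size_t len);

/* Host-side stand-in so the sketch compiles off-target; it only counts calls. */
static unsigned stub_invalidate_calls = 0;
static void host_stub_invalidate(uintptr_t addr, size_t len)
{
    (void)addr;
    (void)len;
    stub_invalidate_calls++;
}

/* Poll the (assumed memory-mapped) done register a bounded number of times. */
static bool wait_dma_done(volatile uint32_t *done_reg, uint32_t max_polls)
{
    assert(done_reg != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*done_reg != 0) {
            return true;
        }
    }
    return false; /* timed out; caller decides what to do */
}

/* Wait for done, order the register read before later buffer reads,
 * then invalidate the buffer range. Returns false on timeout. */
static bool consume_dma_buffer(volatile uint32_t *done_reg, void *buf, size_t len,
                               cache_invalidate_fn invalidate, uint32_t max_polls)
{
    if (!wait_dma_done(done_reg, max_polls)) {
        return false;
    }
    atomic_thread_fence(memory_order_seq_cst); /* C11 fence; assumption, see above */
    invalidate((uintptr_t)buf, len);
    return true;
}
```

A bounded poll count is used instead of an unbounded while loop so that a missing status pulse shows up as a timeout rather than a hang.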

So, I'm wondering: how does the DMA know that all the data has reached the DDR? The DMA is connected to the PS, and I believe the PS has an internal interconnect and other bridges between the AXI port and the DDR controller, right?

Does the AXI slave of PS give a done response ONLY after data has cleared this entire pipeline of interconnects & bridges? Or does it give a response as soon as it has taken in the data to itself?

On Fri, Jul 19, 2024, 2:39 PM Alex Forencich @.***> wrote:

The other possibility could be reordering if multiple AXI operations were issued, but the core currently only uses ID 0 to prevent reordering.

alexforencich commented 3 months ago

The only thing the DMA module can look at is the B channel which carries the write response. What's supposed to happen is the write request and write data (AW and W channels) propagate the request through the interconnect to the destination (memory controller), then the destination responds with the write response that gets routed back to the DMA engine. In this way, the DMA engine will report that the operation is complete when it receives the write response. But it's possible that there is some intermediate component that generates the write response before the operation actually arrives at the memory controller, perhaps a cache or something similar. In this case, it is impossible for the DMA module to know when the operation has actually reached the memory controller. I thought the Zynq PS would work in this way and report that the write operation is complete only after the memory controller has accepted it, but I could be mistaken. And if the PS does not work in this way, then I'm not sure what the solution is.
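The completion rule described above can be illustrated with a small software model. This is a simplified sketch, not the logic in axi_dma_wr.v: the `wr_tracker` type and its functions are hypothetical. The model treats a descriptor as done only when a B response has arrived for every burst, and it also shows why the DMA cannot tell where in the pipeline a response was generated: it only counts responses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one write descriptor split into several AXI bursts. */
typedef struct {
    uint32_t bursts_total;  /* bursts needed for the whole descriptor */
    uint32_t bursts_issued; /* AW/W transfers sent so far */
    uint32_t bursts_acked;  /* B responses received so far */
    bool     error;         /* set if any response was SLVERR or DECERR */
} wr_tracker;

static void tracker_init(wr_tracker *t, uint32_t bursts_total)
{
    t->bursts_total = bursts_total;
    t->bursts_issued = 0;
    t->bursts_acked = 0;
    t->error = false;
}

/* Issue the next burst; returns false when nothing is left to issue. */
static bool tracker_issue(wr_tracker *t)
{
    if (t->bursts_issued == t->bursts_total) {
        return false;
    }
    t->bursts_issued++;
    return true;
}

/* Record one B-channel response. BRESP encoding per AXI:
 * 0 = OKAY, 1 = EXOKAY, 2 = SLVERR, 3 = DECERR. */
static void tracker_bresp(wr_tracker *t, uint8_t bresp)
{
    assert(t->bursts_acked < t->bursts_issued); /* no response without a request */
    t->bursts_acked++;
    if (bresp & 0x2) {
        t->error = true;
    }
}

/* Done only when every burst has been acknowledged on the B channel. */
static bool tracker_done(const wr_tracker *t)
{
    return t->bursts_acked == t->bursts_total;
}
```

In this model, `tracker_done` becomes true on the last response regardless of which component produced it, which matches the limitation described above.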

alexforencich commented 3 months ago

Also, don't forget that on the Zynq MPSoC, the burst length is limited to 16, so you'll need to make sure that the AXI_MAX_BURST_LEN is set to 16 or less. I'm not sure what happens if this limit is exceeded - with Corundum, it seems to still work, but maybe write responses are not reported correctly.
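To see how a burst-length limit changes the traffic, here is a rough sketch that counts how many AXI bursts a transfer is split into. It is an illustration under assumptions, not the splitting logic of the core: `next_burst_beats` and `count_bursts` are hypothetical helpers, the start address is assumed to be aligned to the beat size, and the 16-byte beat (128-bit data bus) used in the examples is an assumed width. The only rules modelled are the maximum beats per burst and the AXI requirement that a burst must not cross a 4 KB boundary.

```c
#include <assert.h>
#include <stdint.h>

#define AXI_BOUNDARY_BYTES 4096u /* AXI bursts must not cross a 4 KB boundary */

/* Number of beats in the next burst starting at addr. */
static uint32_t next_burst_beats(uint64_t addr, uint64_t bytes_left,
                                 uint32_t bytes_per_beat, uint32_t max_burst_len)
{
    assert(bytes_per_beat > 0 && max_burst_len > 0);
    assert(addr % bytes_per_beat == 0); /* sketch assumes beat-aligned addresses */

    uint64_t max_bytes = (uint64_t)max_burst_len * bytes_per_beat;
    uint64_t to_boundary = AXI_BOUNDARY_BYTES - (addr % AXI_BOUNDARY_BYTES);
    uint64_t n = bytes_left;
    if (n > max_bytes) {
        n = max_bytes;
    }
    if (n > to_boundary) {
        n = to_boundary;
    }
    /* Round up so a final partial beat still counts as one beat. */
    return (uint32_t)((n + bytes_per_beat - 1) / bytes_per_beat);
}

/* Total number of bursts (and therefore write responses) for a transfer. */
static uint32_t count_bursts(uint64_t addr, uint64_t len_bytes,
                             uint32_t bytes_per_beat, uint32_t max_burst_len)
{
    uint32_t count = 0;
    while (len_bytes > 0) {
        uint32_t beats = next_burst_beats(addr, len_bytes, bytes_per_beat, max_burst_len);
        uint64_t bytes = (uint64_t)beats * bytes_per_beat;
        if (bytes > len_bytes) {
            bytes = len_bytes; /* last beat may be partial */
        }
        addr += bytes;
        len_bytes -= bytes;
        count++;
    }
    return count;
}
```

For example, with the assumed 16-byte beats, `count_bursts(0, 4096, 16, 16)` splits a 4 KB transfer into 16 bursts, while a limit of 256 beats would allow a single burst. Each burst produces its own write response that has to be counted before the operation can be reported as complete.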

abarajithan11 commented 3 months ago

Great. Thanks a lot. We'll check this.
