bperez77 / xilinx_axidma

A zero-copy Linux driver and a userspace interface library for Xilinx's AXI DMA and VDMA IP blocks. These serve as bridges for communication between the processing system and FPGA programmable logic fabric, through one of the DMA ports on the Zynq processing system. Distributed under the MIT License.

xilinx-dma 40400000.dma: Channel ded141d0 has errors 11 cdr 0 cdr msb 0 tdr 0 tdr msb 0 #18

Closed yujianwu closed 7 years ago

yujianwu commented 7 years ago

Hi Brandon, I send data from the PS to the PL side with "axidma_oneway_transfer()". The PS-side process then does nothing but wait for an interrupt generated by a GPIO IP core on the PL side, which indicates that the data just sent from the PS has finished processing and is ready to be sent back. Once the PS captures the interrupt, it uses "axidma_oneway_transfer()" to read the data back from the PL, but the data length after PL processing is not equal to the length sent from the PS. An error message is printed each time the PS completes one packet loopback (send to PL and read back): "xilinx-dma 40400000.dma: Channel ded141d0 has errors 11 cdr 0 cdr msb 0 tdr 0 tdr msb 0". Yesterday I changed the PL-side program to make the data length equal to the length sent from the PS, but the error still appears. Do you have any idea why this error appears? If it can be safely ignored, how can I stop the Zed from printing the message? Thanks very much for your great project.

bperez77 commented 7 years ago

The "stop short" error can actually be safely ignored, the interrupt is still being delivered. However, if you want to suppress that message, then you should pass a different length to the axidma_oneway_transfer function that exactly matches the amount of data you expect to receive. This is controlled by the len parameter to that function.

ajrodgon commented 6 years ago

Hi Brandon, I have a question regarding this topic. I am sending some data (say, 1000 ints) to the DMA and I expect to receive a different amount back (say 20, for example), just as it's programmed on the FPGA. I am using the function axidma_twoway_transfer. If I put the respective lengths for the transmit and receive buffers, I get the message from this post. However, if I set the length of the received data to the same as the data transmitted, the message does not appear. So, which data length should I expect from the receive buffer? The one I set in my FPGA design, or is there something I am missing?

Recently I have noticed that the data I am receiving is the same data I am transmitting, so maybe that is why I have to set the length of the receive buffer to the same value as the transmit buffer. The problem now is, why is this happening? How should I define the receive buffer properly?

Thank you and sorry for being a bit annoying :(

Tom

bperez77 commented 6 years ago

It's hard to say without knowing the specifics of the custom IP that you're using on the FPGA. This message should only appear when you send less data than expected. There shouldn't be any issue with having different lengths for the transmit and receive transactions.

If I had to guess, I would say that it is likely an issue on the FPGA side. The amount of data you send back to the ARM core is controlled by when you assert the TLAST signal. The amount of data you expect to send and receive is defined by you (or whatever IP you're using on the FPGA side), so there really isn't any expected amount per se.

If you don't mind me asking, what IP do you have connected to the AXI DMA IP? The fact that your transmit and receive data is identical is a bit suspicious to me; that might be an indication that your IP is not behaving as you expect. If you're expecting to receive the same data as you sent (e.g. as with the loopback example), then the receive and transmit lengths should match.

Note that the error message is printed by the Xilinx driver, so it comes from the kernel log. The kernel log is only printed to the console for serial console sessions. To suppress the message, you could instead SSH into your Zedboard. Alternately, you can suppress these messages from the serial console session by running dmesg -n 1. This will only print kernel log messages with a level of "Critical" to the shell. Naturally, you can change that number to control the log level.

ajrodgon commented 6 years ago

Hi Brandon,

Thanks for your reply.

My design is just the same as the loopback example, except that I have my own IP (made with Vivado HLS) inserted in the loopback path. So my IP is the only difference from the loopback example. However, I had tested it in a bare-metal application and it worked.

From what you say, the problem is on my side. I am wondering, then, if it could be a problem related to waiting for the TLAST signal from my IP (actually, I did not specify anything in my code related to my IP's interrupt). I don't know how this could be done, nor whether this could even be the problem. But if the data from my IP is not ready, could it be possible that the driver reads back from the DMA the same data that I sent? Or is it implicit in DMA transactions that the data can only be read once it has been received on the port? To my knowledge, this sounds weird, because the receive and transmit buffers don't have the same address on the DMA, but I am new to this and maybe that is not true.

Thank you again for your help.

Tom

bperez77 commented 6 years ago

Ok, yeah, having an HLS application makes this a bit simpler. Oftentimes people will test their code on bare metal, but most of the time (correct me if I'm wrong in this case) they use polling mode instead of interrupts, so the two setups are operating in different modes.

So, the HLS code "just works" for the most part, and you don't really need to concern yourself with synchronization or other low-level details outside of your HLS module. The driver doesn't read data until the TLAST signal is asserted (your bare-metal code works the same way), which avoids that issue. The AXI DMA IP handles all of the synchronization between memory accesses and streaming data. The reason the DMA buffers don't have the same addresses as in your user program is virtual memory: the FPGA only deals with physical addresses, so the addresses you're seeing are the "actual" addresses.

It's hard to say what the source of this issue is based on this information alone. My guess is that it still results from the length of the receive buffer being specified incorrectly. Would it be possible to share your HLS code and C application? That would greatly aid in debugging. If you're not comfortable sharing it on this issue, feel free to email me at the address on my GitHub profile.

ajrodgon commented 6 years ago

Thanks for your reply.

So it makes sense to me, because for my bare-metal application I followed the Xilinx tutorial (XAPP1170) and I did not see anything about interrupts, so I guess it is in polling mode.

I attach my files in this .zip. I tried to clean up the code to make it easier to read. If you think something is missing, I will send that too. I must say thanks again for your help.

Tom

files_tom.zip

bperez77 commented 6 years ago

I think I have found the issue in your application code. I'm assuming that you're running the application with all the default arguments. As you're solving 26 simultaneous linear equations, you should expect 26 values back in your receive buffer, as you mention in the README. However, you declare the receive size with too large a value on line 54 of main.cc:

#define RECEIVE_SIZE                ((int)(26*27)*sizeof(float))

Instead, your size should only be for 26 floating point values, so the new definition should be:

#define RECEIVE_SIZE                ((int)26*sizeof(float))

I think that change should solve your issue. By the way, the assertion of the TLAST signal happens on line 194 of solve.h:

out_stream[k] = push_stream<T,U,TI,TD>(out[i][0],k == (26-1));
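
For context, the second argument there is what sets the last field of the ap_axiu beat, which the AXI DMA sees as TLAST. As a rough sketch, such a helper, modeled on the XAPP1170 utilities rather than this project's exact code, looks something like:

#include <ap_axi_sdata.h>   // ap_axiu
#include <cassert>

// XAPP1170-style helper: wrap one 32-bit value in an AXI4-Stream beat,
// asserting TLAST on the final element of the stream.
template <typename T, int U, int TI, int TD>
ap_axiu<sizeof(T)*8, U, TI, TD> push_stream(T const &v, bool last = false)
{
    ap_axiu<sizeof(T)*8, U, TI, TD> e;

    assert(sizeof(T) == sizeof(int));   // XAPP1170 assumes 32-bit elements
    union { int raw; T val; } conv;     // reinterpret the value as raw bits
    conv.val = v;
    e.data = conv.raw;
    e.last = last ? 1 : 0;              // this becomes TLAST on the wire
    e.keep = -1;                        // all bytes of the beat are valid
    e.strb = -1;
    e.user = 0;
    e.id = 0;
    e.dest = 0;
    return e;
}
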
ajrodgon commented 6 years ago

Oh, I left that size for the receive buffer because when I put the 26-float size, the error message appeared. With the "wrong" size, the message disappeared, but the values were the same as the ones I transmitted. However, on Monday I will check it again just in case.

Thank you again for your effort. I will write again with the results.

Tom

bperez77 commented 6 years ago

I see, interesting. In both cases (with each of the different RECEIVE_SIZE values), are you getting correct results?

bperez77 commented 6 years ago

Ahh, I think I might know the issue. It stems from the type definition on line 11 of solve.h:

typedef ap_axiu<64,4,5,5> AXI_VAL;

So the AXI4-Stream structure you're using has 64-bit double data elements, not 32-bit float data elements. Thus, you need to adjust your receive size accordingly:

#define RECEIVE_SIZE                ((int)26*sizeof(double))

Alternately, you can update your hardware instead so that it works with 32-bit float values. However, this would also require updating your AXI DMA IP so it streams out 32-bit elements as well.
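
For reference, the 32-bit variant of that typedef would look something like this (keeping the same side-channel widths; illustrative only, not tested against your design):

typedef ap_axiu<32,4,5,5> AXI_VAL;   // 32-bit data beats to match float elements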

ajrodgon commented 6 years ago

Hi again Brandon,

In both cases I receive the same data that I transmitted. If I set the size of the receive buffer to 26, the data received is the first column of the transmitted matrix (the first 26 values I sent), along with the kernel error; with the receive size of 26x27, I receive the same data that I transmitted, but without errors.

I tried changing the float to double and also updating the hardware, and in both cases nothing changed.

I will try to restart again with a simple project, to see if I can make it work.

Thank you again for your time and help. I will update with the results.

Tom

bperez77 commented 6 years ago

You'll only need to update the hardware to use 32-bit values, or change RECEIVE_SIZE to use double, but not both; it's one or the other. I'd just go with changing RECEIVE_SIZE to double, since it's much simpler.

ajrodgon commented 6 years ago

Yeah, maybe I did not explain it well. I meant that I tried both options (not at the same time), and nothing changed.

ajrodgon commented 6 years ago

Hi again Brandon,

I tried again with a simpler example (a 3x3 matrix that is multiplied by 5 in the PL and then sent back to the PS), and the data received is the same as what I transmitted. It is as if my FPGA configuration were not properly loaded onto the device and I were running the loopback example all the time.

I know that this no longer has anything to do with your driver, but I have one doubt. To load the FPGA design, I run "petalinux-config --get-hw-description" in the Linux terminal, and then in the device tree I see the entry in pl.dtsi related to my HLS design. Is this enough to load my configuration onto the FPGA, or should I do something else (e.g. activate something in the kernel configuration)? I am a bit stuck.

Edit: I was not updating the FPGA properly, so it was always the loopback example. I am so sorry. Now I have a timeout on the receive channel, but that is another battle. Where should I look? I tried the same Vivado design I am using, but without my own IP (as a loopback), and it works; when I add my IP to the design, the timeout appears. As before, this works in bare metal, and the HLS code I used is similar to the one I sent previously in this post. I think that I am missing something in the process of migrating from bare metal to PetaLinux...

Thank you once again. I am so sorry for all the time lost :(

Tom

bperez77 commented 6 years ago

Ahh, I see; that explains why you were always seeing the same data after you sent data to the FPGA. And no worries, one of the biggest issues is juggling all of these different things together; there are so many moving parts that it can be easy to lose track of things.

So, since the loopback example worked for you with your custom app, the issue almost certainly must lie in your custom IP. We know that interrupts are being received, as the loopback worked properly, so that narrows it down. Additionally, although your bare-metal application likely runs in polling mode, the fact that it works also means that your TLAST signal is being properly asserted.

Actually, looking over your code, I'm almost certain of what the issue is. It stems from line 12 in hw_accel.cpp:

#pragma HLS INTERFACE s_axilite port=return     bundle=CONTROL_BUS

This pragma adds two protocol signals, ap_start and ap_done, to your design. The first controls when the module is "called"; it is the ready signal for the module. The second is asserted whenever the module "returns"; it is the done signal for the module. However, these two signals are actually unnecessary for this module, because their role is already handled by the AXI4-Stream protocol, which has corresponding signals. Your module will start running when a new transfer is initiated, and it will assert TLAST when it is finished.

In your firmware code, you likely call a function called XSolve_Start and loop on the condition !XSolve_IsDone(). The problem with using AXI4-Lite when you're running with an OS is that virtual memory is enabled, so the addresses corresponding to the AXI4-Lite MMIO registers are not the physical addresses you use in the bare-metal application.
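
For reference, the bare-metal polling pattern I'm describing looks roughly like this (using the driver functions Vivado HLS generates for a top function named solve; exact names may differ in your project):

#include "xsolve.h"   // driver header Vivado HLS generates for "solve"

// Sketch of the bare-metal polling pattern; solve_inst is assumed to have
// been initialized with the generated driver's init call.
void run_solve_blocking(XSolve *solve_inst)
{
    XSolve_Start(solve_inst);            // writes ap_start over AXI4-Lite
    while (!XSolve_IsDone(solve_inst))   // polls ap_done over AXI4-Lite
        ;                                // busy-wait until the module finishes
}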

So, I'd recommend that you take out that pragma and use only the AXI4-Stream protocol. You'll naturally need to update your bare-metal application as well.

ajrodgon commented 6 years ago

It works!!!! Thanks a lot Brandon :)

I also tried solving a system of 26 linear equations and it worked too, so yeah, the driver is great, and all the problems came (in my case) from my own insufficient knowledge.

However, regarding your solution, deleting that pragma was not enough. Doing only that, the control of the IP falls back to the default ap_ctrl handshake instead of AXI4-Lite, so the ap_start and ap_done signals were still there. To remove those control signals, I needed to add this pragma:

#pragma HLS INTERFACE ap_ctrl_none port=return

Doing that, my custom IP does not depend on extra control signals and is controlled entirely by the AXI4-Stream protocol, as you said. I hope this is useful for others who face the same issue (especially when it works in bare metal but not in Linux).
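
To illustrate, my top-level interface ends up looking roughly like this (port names and the exact body are placeholders, not my exact code; the sizes come from the 26-equation example above):

#include <ap_axi_sdata.h>   // ap_axiu

typedef ap_axiu<64,4,5,5> AXI_VAL;          // as in solve.h above
enum { IN_SIZE = 26 * 27, OUT_SIZE = 26 };  // sizes from the example above

// Both ports are pure AXI4-Stream, and with ap_ctrl_none no block-level
// ap_start/ap_done signals remain.
void solve(AXI_VAL in_stream[IN_SIZE], AXI_VAL out_stream[OUT_SIZE])
{
#pragma HLS INTERFACE axis port=in_stream
#pragma HLS INTERFACE axis port=out_stream
#pragma HLS INTERFACE ap_ctrl_none port=return
    // ... computation, asserting TLAST on the final output beat ...
}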

Thank you very much again Brandon, for your help and patience with my issue, and also for sharing this driver.

Tom

bperez77 commented 6 years ago

Great! Glad to hear it works, and thanks for updating with the actual solution as well. You're right; I had forgotten about ap_ctrl_none.