HilscherAutomation / nxdrvlinux

cifX device driver for linux.
https://www.hilscher.com
GNU General Public License v2.0

How to use DMA with xChannelIORead #7

Open juwalter opened 1 week ago

juwalter commented 1 week ago

Hello all,

first off: very cool that nxdrvlinux is on Github now!

Actual question: we have been trying to get xChannelIORead working in DMA mode to see if this will result in lower CPU load for our application. Our use case is fairly simple: we have a 1 millisecond loop that reads 1344 bytes (we use cifX as PROFINET I/O device with 1 millisecond sync domain on the PROFINET network configured by the SIEMENS PLC/CPU) from DP_0 buffer using xChannelIORead and publish the result to a ZMQ socket, so other applications on the same host can consume from it. Using IRQ mode this works, however on our small edge device, one (dedicated, isolcpu) CPU core is pegged at around 70%, and we are hoping to get that down.

We configured DMA mode in the driver (uio_netx) at build time, in the library, and in /opt/cifx/deviceconfig/<nnn>/<mmm>/device.conf (enabledma). However, if we do so, the buffer we are reading via xChannelIORead does not change (we do have a PCI card). Is there some other API call we need to make, or do we need to configure the driver (maybe modprobe uio_netx sync_dma=on or something along those lines)?

side-question: we researched this but could not understand: is IRQ-mode or DMA-mode more efficient = less load on CPU? We have also tried notification/callback - this works (but needs IRQ mode if I understand correctly), but seems to have no impact on CPU load. In DMA mode CPU drops to below 10%, but that might very well be simply because the buffer does not update ...

Many thanks in advance!!

MTrensch-hilscher commented 4 days ago

Using IRQ mode this works, however on our small edge device, one (dedicated, isolcpu) CPU core is pegged at around 70%, and we are hoping to get that down.

IRQ mode is usually counterproductive, as the interrupt overhead is worse than busy-waiting for the data. This also has something to do with the DPM layout: the IRQs signal that an xChannelIORead/Write has been processed by the firmware, and that time is usually shorter than the IRQ overhead. But that certainly depends on the configuration and might also be fieldbus-specific.

We configured DMA mode in the driver (uio_netx) at build time, in the library, and in /opt/cifx/deviceconfig/<nnn>/<mmm>/device.conf (enabledma). However, if we do so, the buffer we are reading via xChannelIORead does not change (we do have a PCI card). Is there some other API call we need to make, or do we need to configure the driver (maybe modprobe uio_netx sync_dma=on or something along those lines)?

There is nothing more to do, but we discovered that recent kernels/machines dislike mapping DMA buffers into userspace, and some mappings, especially when IOMMUs are involved, don't work as expected. See https://ticket.hilscher.com/browse/NXDRVLINUX-154 (I just created it, as it somehow got lost).
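Since a broken DMA mapping can apparently fail silently, one way to tell "DMA works" apart from "the buffer never updates" is to count how many consecutive cycles return byte-identical data. The following is only a minimal sketch: `stale_monitor_t` and its functions are hypothetical helpers, not part of the cifX API, and the check is only meaningful if the PLC really changes the data every cycle (for example via a cycle counter in the output image).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_IMAGE_SIZE 1344u /* bytes read per cycle, taken from the question */

/* Hypothetical helper (not part of the cifX API): remembers the previous
 * I/O image and counts consecutive cycles in which nothing changed. */
typedef struct {
    uint8_t  last[IO_IMAGE_SIZE]; /* copy of the previous cycle's data  */
    int      primed;              /* 0 until the first image was stored */
    uint32_t unchanged_cycles;    /* consecutive identical images       */
} stale_monitor_t;

static void stale_monitor_init(stale_monitor_t *m)
{
    memset(m, 0, sizeof *m);
}

/* Feed the image that was just read. Returns the number of consecutive
 * cycles whose data was identical to the cycle before. */
static uint32_t stale_monitor_feed(stale_monitor_t *m,
                                   const uint8_t *image, size_t len)
{
    assert(len <= IO_IMAGE_SIZE);

    if (m->primed && memcmp(m->last, image, len) == 0) {
        m->unchanged_cycles++;
    } else {
        m->unchanged_cycles = 0;
        m->primed = 1;
    }
    memcpy(m->last, image, len);
    return m->unchanged_cycles;
}
```

In the 1 ms loop this would be fed with the buffer filled by a successful xChannelIORead call. If the counter keeps climbing in DMA mode but stays near zero in non-DMA mode with the same PLC program, the mapping issue described above is a likely suspect.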

side-question: we researched this but could not understand: is IRQ-mode or DMA-mode more efficient = less load on CPU? We have also tried notification/callback - this works (but needs IRQ mode if I understand correctly), but seems to have no impact on CPU load. In DMA mode CPU drops to below 10%, but that might very well be simply because the buffer does not update ...

That's hard to say and strongly depends on the usage type / application. Usually, if you have a cyclic application, IRQ operation is counterproductive, as the driver functions would "waste" more time waiting for the IRQ than in busy-waiting loops. CPU load may be lower (at least the displayed values), as the CPU load is shifted towards interrupt handling and scheduling, which is probably invisible but causes more overhead. DMA mode when using xChannelIORead/Write may save a little CPU, as data is copied from DMA buffers to user buffers instead of being read from the device via PCI memory reads. Using the xChannelPLC functions with DMA could provide more savings, as the application can access the DMA buffers directly (if DMA is working correctly).
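Because the displayed CPU load can be misleading here, it may help to also measure how long each cycle actually spends inside the read call and compare that across polling, IRQ and DMA mode. Below is a rough sketch under stated assumptions: `cycle_fn`, `call_stats_t`, `timed_call` and `demo_cycle` are made-up names for illustration, and `demo_cycle` merely stands in for a small wrapper around the real xChannelIORead call. It records wall-clock time per call, which is not the same thing as CPU load.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: one cycle of work, e.g. a wrapper that
 * calls xChannelIORead and returns its result code. */
typedef int32_t (*cycle_fn)(void *ctx);

/* Hypothetical statistics record for the time spent per call. */
typedef struct {
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t total_ns;
    uint32_t calls;
} call_stats_t;

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run one cycle and update min/max/total duration. */
static int32_t timed_call(call_stats_t *s, cycle_fn fn, void *ctx)
{
    assert(s != NULL && fn != NULL);

    uint64_t t0 = now_ns();
    int32_t  rc = fn(ctx);
    uint64_t dt = now_ns() - t0;

    if (s->calls == 0 || dt < s->min_ns) s->min_ns = dt;
    if (dt > s->max_ns) s->max_ns = dt;
    s->total_ns += dt;
    s->calls++;
    return rc;
}

/* Stand-in for the real read wrapper: only counts its invocations. */
static int32_t demo_cycle(void *ctx)
{
    (*(int *)ctx)++;
    return 0;
}
```

Dividing `total_ns` by `calls` gives the average time per read. If that average is far below the 1 ms cycle in polling mode, most of the 70% is probably spent outside the read itself (for example in the publish step), which would limit what DMA can save.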