Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

XDMA QuickStart Guide Tutorial and/or Wiki #255

Open mwrnd opened 5 months ago

mwrnd commented 5 months ago

It would have been very useful, when I started an XDMA-based project, to have any kind of notes or a tutorial like that for QDMA. I had to dive into the driver code just to figure out basic usage. Now that the XDMA Driver is in the Linux Kernel, please improve the documentation.

I would like to have been able to sit down at a system with Vivado and a PCIe-based card installed and get to a working XDMA-based design that I am confidently able to modify in about two hours.

I have made an attempt at such notes. The associated images take up 1.9MB in color or 925KB in grayscale. I can submit a pull request or help out with the wiki.

Github Settings:

Github Settings

Allows you to enable Wikis or disable Issues, if that is the intention.

Github Wikis Issues

MischaBaars commented 5 months ago

I have made an attempt at such notes

Thanks! :) Cool.

Doesn't look like Xilinx/AMD is doing anything with the pull requests though :(

hinxx commented 5 months ago

These instructions look very nice; even a SW person like me could probably build an FPGA image with them.

Doesn't look like Xilinx/AMD is doing anything with the pull requests though :(

They have not been doing that for as long as I can remember. This repo is more of a one-way street and is updated very infrequently, but that is their choice. After all, this is called the 'reference' implementation of the XDMA driver, and one is of course free not to use it if the quality is not good enough for them (I don't know what the QDMA driver looks like; I'm working with XDMA only).

It seems that, the way this (XDMA) driver is developed, it is never going to end up in the upstream Linux kernel. I guess the reasons it has not been upstreamed are the custom DMA engine implementation (libxdma.c) and a too-narrow use case (userspace char device interfaces only).

What AMD is now trying to do is get only the DMA engine parts into the upstream kernel (i.e. https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/tree/drivers/dma/xilinx/xdma.c) as a dmaengine kernel module. This means that none of the char device interfaces we see here today are being upstreamed. That driver is still not accepted upstream, but at least there is a desire to get it there. If one is interested in how it would be used, there is a Video4Linux driver that utilizes it (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/tree/drivers/media/pci/mgb4/mgb4_core.c). I guess the idea is that the XDMA dmaengine driver, once upstreamed, can be used with a variety of subsystems rather than only exposing char devices to userspace.
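
For orientation, here is a minimal sketch (not taken from mgb4, and with the channel acquisition and AXI endpoint address assumed) of how a kernel driver might drive that xdma dmaengine module through the generic dmaengine client API:

/* Hedged sketch of a dmaengine client, assuming the upstream xdma dmaengine
 * driver is loaded; chan would come from something like dma_request_chan()
 * against the xdma platform device (channel name is an assumption).
 * ep_addr is the AXI address inside the FPGA design. */
#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>

static int example_h2c_transfer(struct device *dev, struct dma_chan *chan,
                                void *buf, size_t len, dma_addr_t ep_addr)
{
        struct dma_slave_config cfg = {
                .direction = DMA_MEM_TO_DEV,
                .dst_addr  = ep_addr,
        };
        struct dma_async_tx_descriptor *desc;
        dma_addr_t handle;
        dma_cookie_t cookie;

        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, handle))
                return -ENOMEM;

        dmaengine_slave_config(chan, &cfg);
        desc = dmaengine_prep_slave_single(chan, handle, len, DMA_MEM_TO_DEV,
                                           DMA_PREP_INTERRUPT);
        if (!desc) {
                dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
                return -EIO;
        }

        cookie = dmaengine_submit(desc);        /* queue the descriptor */
        dma_async_issue_pending(chan);          /* kick off the transfer */
        dma_sync_wait(chan, cookie);            /* block until it completes */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
}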

I've tinkered with that dmaengine and retrofitted it into this XDMA driver as a quick hack to see which parts get replaced and what it would take to use it. Essentially, almost all of the https://github.com/Xilinx/dma_ip_drivers/blob/master/XDMA/linux-kernel/xdma/libxdma.c code can be considered obsolete when the new XDMA dmaengine driver is used. Other parts like the interrupt handling code would be slightly different, but the char interface code pretty much remains the same.

If there are enough knowledgeable folks around here to develop a new char interface driver based on the XDMA dmaengine driver, I'm willing to participate in the effort and do testing on the hardware I have.

jberaud commented 5 months ago

That driver is still not accepted upstream, but at least there is a desire to get it there. [...]

That driver has been accepted upstream https://github.com/torvalds/linux/blob/master/drivers/dma/xilinx/xdma.c

I replaced this Xilinx driver with a custom implementation a while ago for use in our driver, but I plan to rewrite it using the upstream driver at some point. If nothing happens until then, we'll probably release it publicly.

hinxx commented 5 months ago

That driver has been accepted upstream..

Cool, did not notice that, thanks!

MischaBaars commented 5 months ago

I have made an attempt at such notes.

Hi Matthew,

I see that it provides a README.md file in the 'Markdown' plain-text file format. It looks best when you paste it right here and then press 'Preview'. Perhaps you know a better way to make this file readable?

Also, before we dive into fancy DDR4 memory controllers: I was able to change the number of 36K BRAM blocks to 248. Trying to allocate more results in complaints from Vivado. There should be 300 available. My guess is that 52 36K BRAM blocks are being used by the Opal Kelly firmware?

But, when I change xdma_mm.sh line 13 (as indicated by Opal Kelly) to one of:

1) io_max=$(((36 << 10) *  16))     # 36k BRAM blocks
2) io_max=$(((36 << 10) *  32))     # 36k BRAM blocks
3) io_max=$(((36 << 10) *  64))     # 36k BRAM blocks
4) io_max=$(((36 << 10) * 128))     # 36k BRAM blocks
5) io_max=$(((36 << 10) * 248))     # 36k BRAM blocks

it fails at 2). Any idea why this is happening?

Attachments: Screenshot from 2024-01-19 11-31-30, Screenshot from 2024-01-19 11-31-33, Screenshot from 2024-01-19 11-39-54, xdma_mm. 16.log, xdma_mm. 32.log

mwrnd commented 5 months ago

I'm not sure what you are trying to accomplish but I gave up on the dma_ip_drivers XDMA tests. I believe they are meant for the example design as it compiles for the KCU105.

I was unable to change the number of 36K BRAM blocks to 248 ... There should be 300 available.

There are cascade limits/issues/caveats with built-in memory, and some of it is probably being used by other IP internal to the example design. Keep in mind the Block RAM is distributed across the die, so parts of it get used throughout your design and block the ideal cascading of all of it. About 50 BRAM blocks are required by the XDMA Block.

Do not edit the Block Memory Generator. If you want to create a larger contiguous block of memory, change the Range in the Address Editor. (300×36000)÷8÷1024÷1024~=1.28MB so try setting it to 512K. Implementation will fail if Vivado cannot find the resources. Try different values.

Address Editor M_AXI

dma_ip_drivers tools are then a useful way to test writes and reads between the host and your XDMA design. The tools take into account the 0x7ffff000=2147479552-byte Linux write limit. The following writes 8K to, and then reads 8K back from, a BRAM Block at address 0x80000000. You can set the Block Size (bs=) and Size (--size) to be the same as the Range you set in the Address Editor.

cd dma_ip_drivers/XDMA/linux-kernel/tools/
dd if=/dev/urandom of=TEST bs=8192 count=1
sudo ./dma_to_device   --verbose --device /dev/xdma0_h2c_0 --address 0x80000000 --size 8192  -f    TEST
sudo ./dma_from_device --verbose --device /dev/xdma0_c2h_0 --address 0x80000000 --size 8192 --file RECV
md5sum TEST RECV

XDMA BRAM Test using dma_ip_drivers software

You can also use dd for the same purpose but keep in mind the 0x7ffff000=2147479552-byte Linux write limit. Note dd requires numbers in Base-10 so you can use printf to convert from the hex address, 0x80000000=2147483648. The following writes to and then reads from a 2MB BRAM Block. Note count=1 is passed to dd when communicating with XDMA as this is a single continuous write/read at the given address.

dd if=/dev/urandom of=TEST bs=8192 count=256
printf "%d\n" 0x80000000
sudo dd if=TEST of=/dev/xdma0_h2c_0 bs=2097152 count=1 seek=2147483648 oflag=seek_bytes
sudo dd if=/dev/xdma0_c2h_0 of=RECV bs=2097152 count=1 skip=2147483648 iflag=skip_bytes
md5sum TEST RECV

XDMA BRAM Test using dd

MischaBaars commented 5 months ago

There is no Address Editor when opening the AXI4 Memory Mapped Default Example Design from a DMA/Bridge Subsystem for PCI Express added through the IP Catalog, because there is no Block Design associated with this example.

The section does state that the BRAM size can be changed, so I can either try changing the settings in the IP Customizations PCIe: BARs tab, or I can try the Vivado IP Integrator-Based Example Design which starts out by opening a Block Design.

mwrnd commented 5 months ago

The example designs are not easily altered. They exist to prove functionality.

I strongly recommend you move on to an IP Integrator Block Diagram design. You can follow the XDMA Example Design notes or my tutorial. The sooner you are communicating between your host system and an AXI Block the sooner you can pursue your project. Notice all the test scripts were last edited over 2 years ago.

If you are working with a custom board, it can be useful to delay your motherboard's BIOS Boot to allow for FPGA Configuration. It is difficult to meet the 100ms PCIe startup requirement. You can do this by pressing the POWER button, then pressing and holding the RESET button for a second before releasing it. Or, connect a 330uF-1000uF capacitor across the reset pins of an ATX motherboard's Front Panel Header:

Delay Boot Using Capacitor

MischaBaars commented 5 months ago

The example designs are not easily altered. They exist to prove functionality.

Well, this example design does not function very well with the scripts provided. Even with the BRAM set to its default size, the scripts are complaining about data integrity.

Notice all the test scripts were last edited over 2 years ago

I'll take the detour, I'm going to rewrite them.

If you are working with a custom board ...

Actually I was still struggling with the fan. While the board is brand new, the firmware is controlled by a 20-year-old closed-source software library.

MischaBaars commented 5 months ago

Matthew,

Do not edit the Block Memory Generator.

That's exactly where to edit the BRAM size! I was at exactly the right spot!

===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 64, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.    xdma0 channel 0:0: block size 8192, address 0, offset 64, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 128, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.   xdma0 channel 0:0: block size 8192, address 0, offset 128, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 256, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.   xdma0 channel 0:0: block size 8192, address 0, offset 256, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 512, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.   xdma0 channel 0:0: block size 8192, address 0, offset 512, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 1024, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.  xdma0 channel 0:0: block size 8192, address 0, offset 1024, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 2048, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.  xdma0 channel 0:0: block size 8192, address 0, offset 2048, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 4096, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.  xdma0 channel 0:0: block size 8192, address 0, offset 4096, data match.
===>./io.sh xdma0, channel 0:0, block size 8192, address 0, offset 8192, data file: /tmp/xdma0_h2c0_c2h0_unaligned/datafile-8-K, integrity 1, dmesg 1.  xdma0 channel 0:0: block size 8192, address 0, offset 8192, data match.

It was the scripts that were not functioning as they should. Let me work out the details. One moment please.

mwrnd commented 5 months ago

When you edit the Range (Memory Size) in the Address Editor and leave the Block Memory Generator on Auto, that value (Address Width) will be propagated throughout the project. From the Block Memory Generator Product Guide (PG058), Pg#96:

Width and depth parameters in BMG are calculated and generated by the master (either by AXI BRAM Controller or LMB Controller to which the BMG IP is connected) based on the width selected in the master IP and the Address range set in the Address Editor.

What you are doing will work for your current goals but in a large project it may lead to some obscure bug.

Notice all the test scripts were last edited over 2 years ago

I'll take the detour, I'm going to rewrite them.

It does not look like this project will accept any pull requests.

The test scripts use dma_to_device and dma_from_device. It will be easier to design your own test scripts, and you can easily adapt them to larger memory transfers such as for the DDR4 you mentioned (see the chunked-transfer sketch at the end of this comment).

cd dma_ip_drivers/XDMA/linux-kernel/tools/
dd if=/dev/urandom of=TEST bs=8192 count=1
sudo ./dma_to_device   --verbose --device /dev/xdma0_h2c_0 --address 0x80000000 --size 8192  -f    TEST
sudo ./dma_from_device --verbose --device /dev/xdma0_c2h_0 --address 0x80000000 --size 8192 --file RECV
md5sum TEST RECV

Create a file full of random data:

dd if=/dev/urandom of=TEST bs=8192 count=1

All-zeros file:

dd if=/dev/zero of=TEST bs=8192 count=1

All-ones file:

tr '\0' '\377' </dev/zero | dd of=TEST bs=8192 count=1 iflag=fullblock
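
For DDR4-sized regions larger than the 0x7ffff000-byte single-write limit mentioned above, the transfer has to be split into chunks. A minimal userspace sketch, assuming the same /dev/xdma0_h2c_0 device; the DDR4 base address and 1 GiB size are placeholders, not values from this thread:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Chunked host-to-card write; a single write()/pwrite() is capped by Linux
 * at 0x7ffff000 bytes, so anything larger must be looped. */
#define CHUNK      (256UL << 20)     /* 256 MiB per pwrite() call */
#define DDR4_BASE  0x0UL             /* hypothetical AXI address of the DDR4 */
#define DDR4_SIZE  (1UL << 30)       /* hypothetical 1 GiB region */

int main(void)
{
        uint8_t *buf = malloc(DDR4_SIZE);
        if (!buf) { perror("malloc"); return 1; }
        for (size_t i = 0; i < DDR4_SIZE; i++)
                buf[i] = (uint8_t)i;             /* simple test pattern */

        int h2c = open("/dev/xdma0_h2c_0", O_WRONLY);
        if (h2c < 0) { perror("open"); return 1; }

        size_t done = 0;
        while (done < DDR4_SIZE) {
                size_t n = DDR4_SIZE - done;
                if (n > CHUNK)
                        n = CHUNK;
                /* the file offset is used as the card (AXI) address */
                ssize_t w = pwrite(h2c, buf + done, n, (off_t)(DDR4_BASE + done));
                if (w <= 0) { perror("pwrite"); return 1; }
                done += (size_t)w;
        }
        printf("wrote %zu bytes\n", done);

        close(h2c);
        free(buf);
        return 0;
}

Reading back works the same way with pread() on /dev/xdma0_c2h_0.
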
MischaBaars commented 5 months ago

When you edit the Range (Memory Size) in the Address Editor and leave the Block Memory Generator on Auto that value (Address Width) will be propagated throughout the project.

Thanks for the link, hadn't found this one yet. Figured it would happen like that.

What you are doing will work for your current goals but in a large project it may lead to some obscure bug.

But for the moment, I'm just trying to get the Default Example Design to work. It has no Address Editor, since it has no Block Design, and still, from the DMA/Bridge Subsystem for PCI Express Product Guide (PG195):

The example design from the IP catalog has only 4 KB block RAM; you can regenerate the subsystem for larger block RAM size, if wanted.

Et voilà, with the necessary modifications to the scripts:

===>./io.sh xdma0, channel 0:0, block size 262144, address 0, offset 0, data file: /tmp/xdma0_h2c0c2h0/datafile-256-K, integrity 1, dmesg 1. xdma0 channel 0:0: block size 262144, address 0, offset 0, data match.

Bigger BRAM sizes will follow. The 4KB example works as well, with the modifications.

Why do you think merge requests are being blocked and pull requests are being ignored? Isn't it of vital importance to AMD that at least the default examples are properly functioning? Maybe that's just it, only they aren't.

I'll try to get their attention when I'm done.

xdma_mm_256k.log

MischaBaars commented 5 months ago

Matthew,

I've tried a script like yours, and I noticed two things while testing different IP Customizations:

    PW  PD  18K 36K AW      (PW = Port A Width, PD = Port A Depth, 18K/36K = BRAM primitives used, AW = Address Width)
0)  256  8192   0   64  13  -> (1<<13)*(256>>3)=256k  64*4k=256k 64*36k=???
1)  128 16384   0   64  14  -> (1<<14)*(128>>3)=256k
2)   64 32768   0   64  15  -> (1<<15)*( 64>>3)=256k
3)   32 65536   0   64  16  -> (1<<16)*( 32>>3)=256k

1) The 36K BRAMs behave as if they are 4k BRAMs. I can therefore only allocate 512kb (128 36k BRAMs of 300 36k BRAMs) before I run out. What happened to the other 8 4k cells in each 36k BRAM?

2) The Port A Width (PW) of 256 bits is the only one that passes the test. The other three fail at 32, 16 and 8 bytes respectively. It might have something to do with xdma_engine::addr_align and xdma_engine::addr_bits. It looks like xdma_engine::addr_bits is initially set to 64, but eventually the variable seems to end up nowhere. IOCTL_XDMA_BITS_GET is not implemented either.

mytest_0.log mytest_1.log mytest_2.log mytest_3.log

mwrnd commented 5 months ago

The XDMA IP Block uses Block RAM for internal buffers, variables, etc. That reduces the available pool of BRAM Blocks.

XDMA Demo Resources Used

When you generated your XDMA Example you likely chose PCIe x8 8.0GT/s, which is the maximum the XCAU25P-FFVB676 you mentioned supports, as it has 3 GTY Quads adjacent to the PCIE4 Block. That is roughly 8 Gbytes/s of PCIe bandwidth. The XDMA Block attempts to match that with the AXI bandwidth by setting the AXI Width to 256-bit and the AXI Clock Frequency to 250MHz: (256×250000000)÷8÷1024÷1024÷1024 ≈ 7.45 Gbytes/s.

XDMA Block Properties

Refer to the Block Memory Generator Product Guide Pg#90, Block RAM Usage. 256/72~=3.5556~=4. Some of the BRAM is lost due to the width mismatch.

128 36k BRAMs of 300 36k BRAMS

Please post a screenshot of your Design Run and Project Summary - Utilization. The Artix+ Datasheet mentions 300 BRAM Blocks and you should be able to count on using at least half.

Resources Used

MischaBaars commented 5 months ago

About my first item:

Thanks for your elaborate answer, but:

Refer to the Block Memory Generator Product Guide Pg#90, Block RAM Usage. 256/72~=3.5556~=4. Some of the BRAM is lost due to the width mismatch.

When using the minimum area algorithm, it is not as easy to determine the exact block RAM count. This is because the actual algorithms perform complex manipulations to produce optimal solutions. The optimistic estimate of the number of 18K block RAMs is total memory bits divided by 18k (the total number of bits per primitive) rounded up. Given that this algorithm packs block RAMs very efficiently, this estimate is often very accurate for most memories.

These BRAM blocks are 36 kbit, instead of 36 kbyte :)

About my second item:

Using a Port A Width (PW) of 128, 64, or 32 bits, can you reproduce the malfunction?

    PW  PD  18K 36K AW
1)  128 16384   0   64  14  -> (1<<14)*(128>>3)=256k
2)   64 32768   0   64  15  -> (1<<15)*( 64>>3)=256k
3)   32 65536   0   64  16  -> (1<<16)*( 32>>3)=256k

mwrnd commented 5 months ago

In the XDMA Example project, xdma_app connects XDMA M_AXI to the Block Memory S_AXI:

xdma_0_ex xcau25p

Vivado will not let you directly connect two busses with different data widths as that implies data loss. You need an AXI Interconnect or AXI SmartConnect block between them.

Here is a Block Diagram recreation of my understanding of the XDMA Example project:

xcau25p Simple XDMA Block Diagram

The AXI BRAM Controller is set up with the same data width as the XDMA M_AXI port:

xdma_0_ex xcau25p AXI BRAM Controller

The BRAM is mapped to address 0x0 and I set the Range (Size) to 512kbyte:

xcau25p Simple XDMA AXI Addresses 512K

When I run Synthesis+Implementation Vivado uses 50 BRAM Blocks for XDMA and 128 for the Block Memory Generator.

xcau25p Simple XDMA Design Run 512K

The Address Range must be a power of 2, so the next larger option is 1M:

Address Editor Range Select

When I set the Range to 1M:

xcau25p Simple XDMA AXI Addresses 1M

Over 300 BRAM Blocks are required:

xcau25p Simple XDMA Design Run 1M

Which causes Implementation to fail:

xcau25p Simple XDMA Design Run 1M Fail Message

By adding an AXI SmartConnect block I can connect two AXI BRAM Controllers:

xcau25p Simple XDMA with SmartConnect Block Diagram

Mapping their address ranges consecutively allows for 768 Kbytes of memory, which is accessible as a single block of memory by other AXI Blocks.

xcau25p Simple XDMA AXI Addresses 768K Consecutive

The design now uses 242 BRAM Blocks:

xcau25p Simple XDMA Design Run 768K

If I then add 3 more BRAM Controllers with consecutive addresses I can get 0xF8000=1015808 bytes of total addressable memory:

xcau25p Simple XDMA AXI Addresses 0xF8000 byte Consecutive

For a total BRAM Block usage of 298. 248 of those Blocks are used by the memory array, (248×36000)÷8=1116000. 1015808/1116000~=0.9102. Vivado manages >91% utilization efficiency for the BRAM Blocks.

xcau25p Simple XDMA Design Run 0xF8000

When I Implement the design for a board I have, I am able to write and read the complete BRAM Address space:

printf "%d\n" 0xF8000
cd dma_ip_drivers/XDMA/linux-kernel/tools/
dd if=/dev/urandom of=TEST bs=1024 count=992
sudo ./dma_to_device   -v  --device /dev/xdma0_h2c_0 --address 0x0 --size 1015808 -f TEST
sudo ./dma_from_device -v  --device /dev/xdma0_c2h_0 --address 0x0 --size 1015808 -f RECV
md5sum TEST RECV

dma_tools

PCIe has 128 to 4096 byte payload sizes. The default payload size that shows up (sudo lspci -nnvvvd 10ee: | grep MaxPayload) for this project is 256 bytes, so at the very least reads and writes aligned to 256 bytes will work, although this should be explored further.
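
To probe that alignment behaviour from userspace, here is a minimal sketch, assuming the BRAM from the Block Diagram above is mapped at AXI address 0x0; pwrite()/pread() use the file offset as the card address, the same mechanism dma_to_device/dma_from_device rely on:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const off_t  axi_addr = 0x0;   /* BRAM base from the Address Editor */
        const size_t len      = 4096;  /* multiple of the 256-byte MaxPayload */
        char *tx, *rx;

        /* Page-aligned buffers also satisfy any 256-byte alignment requirement. */
        if (posix_memalign((void **)&tx, 4096, len) ||
            posix_memalign((void **)&rx, 4096, len))
                return 1;
        memset(tx, 0xA5, len);
        memset(rx, 0x00, len);

        int h2c = open("/dev/xdma0_h2c_0", O_WRONLY);
        int c2h = open("/dev/xdma0_c2h_0", O_RDONLY);
        if (h2c < 0 || c2h < 0) { perror("open"); return 1; }

        /* The file offset is used as the card (AXI) address. */
        if (pwrite(h2c, tx, len, axi_addr) != (ssize_t)len) { perror("pwrite"); return 1; }
        if (pread(c2h, rx, len, axi_addr)  != (ssize_t)len) { perror("pread");  return 1; }

        printf("readback %s\n", memcmp(tx, rx, len) ? "MISMATCH" : "OK");

        close(c2h); close(h2c);
        free(rx); free(tx);
        return 0;
}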

MischaBaars commented 5 months ago

Hi Matthew :)

Once again, thank you for your elaborate response. We learn as we go.

Vivado will not let you directly connect two busses with different data widths as that implies data loss. You need an AXI Interconnect or AXI SmartConnect block between them.

This brought me onto something. There is another Data Width, the AXI Data Width, when re-customizing the DMA/Bridge Subsystem for PCI Express. I set the Maximum Link Speed to 2.5GT/s, thereby reducing the AXI Data Width to 64 bits.

Et voilà, then it is 2) that starts working.

    PW  PD  18K 36K AW
0)  256  8192   0   64  13  -> (1<<13)*(256>>3)=256k
1)  128 16384   0   64  14  -> (1<<14)*(128>>3)=256k
2)   64 32768   0   64  15  -> (1<<15)*( 64>>3)=256k
3)   32 65536   0   64  16  -> (1<<16)*( 32>>3)=256k

Couldn't find a 32-bit AXI Data Width option for any of the Lane Widths and Maximum Link Speeds though.

I'll be continuing to Section 4.2: Tandem Configuration now.

Thank you for your support! I'll know to find you when I'm at a loss somewhere en route.

mwrnd commented 5 months ago

From the XDMA Product Guide (PG195): Support for 64, 128, 256, 512-bit datapath.

I took the 5 BRAM Controller project:

XDMA Test with 5 BRAM

Changed all the BRAM Controllers to use a 32-bit AXI Data Width (AXI SmartConnect performs clock domain crossing and data width translation):

BRAM_Controller_32-Bit_AXI_Data_Width

Implementation results in the same 298 total BRAM Blocks used:

XDMA_Test_with_5_BRAM_Design_Run_Results

MischaBaars commented 5 months ago

I'll be continuing to Section 4.2: Tandem Configuration now.

In PG195 DMA/Bridge Subsystem for PCI Express (v4.1), enabling Tandem Configuration or Dynamic Function eXchange breaks the Open IP Example Design.

alonbl commented 4 months ago

Hi @jberaud,

It would be great if you published a userspace bridge to the upstream xdma driver. I might be able to help where I can, to allow people to drop this buggy driver.

What I am thinking is something similar to:

int fd = open(); // open device
void *region = mmap(fd, ..., size); // pin memory
blob blob = {
   .addr = p, // must be within region
   .size = s,
};
write(fd, &blob, sizeof(blob)); // submit blobs (to transmit, or to be filled for receive); sendmsg could be used instead
poll(fd); // support async - important
read(fd, &blob, sizeof(blob)); // receive ready blobs

For outgoing transfers the blobs are filled with data and the notification signals that the send has completed; for incoming transfers the blobs are submitted as candidates and are filled before the notification.

Thanks,

mabl commented 3 months ago

As @alonbl says, guidance on how to upgrade this driver to use the latest upstream DMA engine implementation, while maintaining a simple file descriptor interface, would be greatly appreciated. To be honest, I am surprised that there is no example code out there yet. Or am I just missing something?

sofiachebotareva commented 3 months ago

WinDriver, a driver development toolkit, offers both XDMA and QDMA code samples. You can download WinDriver from its official website and enjoy a 30-day free evaluation. If you have any questions, feel free to contact me at sofiac@jungo.com.