alexforencich / verilog-pcie

Verilog PCI express components
MIT License

FPGA versus FPGA_AXI Example Questions #31

Open softerhardware opened 1 year ago

softerhardware commented 1 year ago

Hi Alex,

Thanks for sharing your verilog-pcie library! I have been running simulations for the various components and examples as well as studying the source but still have a few questions. For all the Xilinx/AMD examples, there are fpga and fpga_axi versions. The fpga version uses the dma_psdpram segmented memory but the fpga_axi version does not. What are the pros and cons of these different implementations?

Although the fpga version with dma_psdpram has no client-side interface, can a dma_if_axi.v interface be added to create an AXI4-MM master port for DMA traffic, similar to the fpga_axi version? Are there any examples of this?

All the examples appear to use only 2 segments for the dma_psdpram. Is there any advantage to more segments?

Best Regards,

Steve Haynal

alexforencich commented 1 year ago

The pcie_us_axi_dma module is effectively a legacy module, predating the split DMA interface/DMA client architecture. I don't recommend using it, as it only supports Xilinx US/US+ and it has some significant performance limitations due to how AXI works. The recommended setup is to use dma_if_pcie + the corresponding device-specific shim (or dma_if_axi if you need an AXI interface to the host system), in combination with one or more DMA clients.
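
For reference, a rough structural sketch of that recommended arrangement is shown below. Ports are abbreviated to comments rather than the library's actual signal names; the complete parameter and port lists are in the module sources and the example designs.

```verilog
// Illustrative top-level structure only; see the example designs for
// complete, working wiring.
module dma_top_sketch (
    input wire clk,
    input wire rst
    // ... PCIe hard IP and user AXI stream ports
);

// Device-specific shim: adapts the Xilinx US/US+ hard IP streaming
// interfaces (CQ/CC/RQ/RC) to the library's generic TLP interfaces.
pcie_us_if pcie_if_inst (
    // ... hard IP interfaces on one side, generic TLP interfaces on the other
);

// Generic PCIe DMA interface: executes read/write descriptor requests by
// issuing PCIe memory read/write TLPs, moving data between the host and
// the segmented scratchpad RAM.
dma_if_pcie dma_if_inst (
    // ... TLP interfaces to/from pcie_us_if,
    // ... read/write descriptor request and status interfaces,
    // ... segmented RAM master interface
);

// Dual-port segmented scratchpad RAM shared by the DMA interface (one port)
// and the DMA clients (the other port).
dma_psdpram dma_ram_inst (
    // ... segmented write/read command and data ports for both sides
);

// AXI stream DMA clients: move data between the scratchpad RAM and
// user-facing AXI stream interfaces.
dma_client_axis_sink   dma_client_rx_inst ( /* AXI stream in  -> RAM */ );
dma_client_axis_source dma_client_tx_inst ( /* RAM -> AXI stream out */ );

endmodule
```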

Eventually I plan on making DMA client modules that can act as both AXI master and AXI slave, but I have not had time to implement this yet. The goal is to be able to "mix and match" internal interfaces (AXI stream, AXI master, AXI slave, etc. of various widths), but so far I have not had a need for anything beyond AXI stream.

As far as segments are concerned, a minimum of two is required. There is currently no advantage to using more segments, but possibly this will change in the future. For example, if the DMA interface module is updated to support a segmented interface on the PCIe TLP side, then the segments on the memory-mapped side must be no wider than the TLP segment size. It also might be an advantage if multiple DMA IF modules are used (such as one PCIe DMA IF and one or more AXI DMA IF for on-card DRAM) as it could result in better utilization of the internal interface.
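
To make the segment arithmetic concrete, an illustrative parameterization of the scratchpad RAM is sketched below. The values (and the relationship between aggregate width and segment width) are examples only; the authoritative parameter names, defaults, and port list are in dma_psdpram.v and the example designs.

```verilog
// Example only: two segments (the minimum), each 256 bits wide, giving a
// 512-bit aggregate memory-mapped data path. The DMA interface and DMA
// clients must be parameterized to match.
dma_psdpram #(
    .SIZE(16384),          // total scratchpad size in bytes (example value)
    .SEG_COUNT(2),         // minimum (and currently sufficient) segment count
    .SEG_DATA_WIDTH(256),  // 512-bit aggregate width / 2 segments
    .SEG_BE_WIDTH(256/8)   // byte enables per segment
) dma_ram_inst (
    // ... clk/rst plus the segmented write/read command and data ports
);
```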

softerhardware commented 1 year ago

Hi Alex,

Thanks for the reply. I studied the two dma_client_axis_sink/source clients as well as Corundum a bit more. As far as I understand, dma_if_pcie + shim + dma_client_axis_sink/source + dma_psdpram is not sufficient on its own. You also need a controller similar to tx_engine/rx_engine in Corundum to glue and coordinate both sides of the DMA. Is that a correct understanding, or am I missing something? Is there a small example of such a controller? I am evaluating your IP as a potential replacement for XDMA.

Best Regards,

Steve Haynal

myqlee commented 6 months ago

@softerhardware What should I do if a module A in my FPGA design needs to receive and send data through the PCIe interface? In other words, which modules should this module A be connected to?

alexforencich commented 6 months ago

Anything involving PCIe needs some careful thought about the plumbing. PCIe is fundamentally performing memory reads and writes, so nothing really sends and receives data directly. The only thing you can do is initiate or terminate memory reads and writes.

If you want to "send" data, there has to be a buffer somewhere with an address to receive it, or you have to expose the data such that it can be read from some location in the BAR space. "Receive" is the converse: either read it out of a buffer, or terminate memory writes in BAR space.

The DMA engine is set up to facilitate initiating memory operations via an internal scratchpad RAM. So you need to get the data into the scratchpad RAM, then get the DMA engine to transfer it to the host, or vice versa. The DMA client modules can help with this. For example, you could use the AXI stream DMA client module to write streaming data into the scratchpad, then issue a request to the DMA interface module to perform the actual DMA operation. The example designs currently only have the DMA interface module, and I don't have any "canned" logic to coordinate transfers at a high level, as the specifics of this can be highly application dependent.
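
As a rough sketch of what that coordination might look like for the card-to-host direction, a minimal hypothetical state machine is shown below. The start/done inputs are stand-ins for the actual descriptor request/status handshakes of dma_client_axis_sink and dma_if_pcie; the real signal names are documented in the module headers.

```verilog
// Hypothetical coordination FSM (card-to-host direction only). The
// start/done signals abstract the descriptor request/status handshakes
// of dma_client_axis_sink and dma_if_pcie.
module dma_tx_ctrl (
    input  wire clk,
    input  wire rst,
    input  wire start,              // request to move one buffer to the host
    input  wire client_write_done,  // stand-in: client write descriptor status valid
    input  wire dma_write_done,     // stand-in: DMA IF write descriptor status valid
    output wire busy
);

localparam [1:0]
    STATE_IDLE      = 2'd0,  // wait for a transfer request
    STATE_CLIENT_OP = 2'd1,  // dma_client_axis_sink: AXI stream -> scratchpad RAM
    STATE_DMA_OP    = 2'd2;  // dma_if_pcie: scratchpad RAM -> host memory (write TLPs)

reg [1:0] state_reg = STATE_IDLE;

assign busy = state_reg != STATE_IDLE;

always @(posedge clk) begin
    case (state_reg)
        STATE_IDLE: begin
            if (start) begin
                // issue a write descriptor to the AXI stream sink client:
                // scratchpad RAM address, maximum length, tag
                state_reg <= STATE_CLIENT_OP;
            end
        end
        STATE_CLIENT_OP: begin
            if (client_write_done) begin
                // the client status reports the actual length received;
                // issue a write descriptor to dma_if_pcie with the host
                // (DMA) address, the same RAM address, and that length
                state_reg <= STATE_DMA_OP;
            end
        end
        STATE_DMA_OP: begin
            if (dma_write_done) begin
                // data is now in host memory; signal completion (e.g. an
                // interrupt or a completion record) and return to idle
                state_reg <= STATE_IDLE;
            end
        end
        default: begin
            state_reg <= STATE_IDLE;
        end
    endcase

    if (rst) begin
        state_reg <= STATE_IDLE;
    end
end

endmodule
```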

myqlee commented 6 months ago

@alexforencich Thank you very much for your response! My own module A indeed receives and sends data through the AXI-Stream interface. So, which modules should my module A be connected to? Can I simply replace the example_core module (shown in the red box in the attached image) in the PCIe project with my own module A?

[attached image: example design block diagram with the example_core module highlighted]

alexforencich commented 6 months ago

That's sort of the approach that Corundum uses: the core logic doesn't contain the DMA engine, and the DMA engine is instead included in a wrapper. That way, the same core can support both PCIe and AXI.

But, you'll also need to include the AXI stream DMA client modules and write appropriate control logic. Take a look at how the modules are used in Corundum for a much more comprehensive example.

myqlee commented 6 months ago

Okay, I'll learn about Corundum. Thank you!

myqlee commented 6 months ago

@alexforencich In both the verilog-pcie and Corundum projects, the PCIe IP core you use is PCIE4C. Why use PCIE4C instead of XDMA?

alexforencich commented 6 months ago

XDMA and QDMA are too inflexible and too proprietary, and neither is really designed for networking, so Corundum uses a fully custom DMA engine instead, which interfaces with a transaction-layer IP core.

xiongyw commented 5 months ago

> For all the Xilinx/AMD examples, there are fpga and fpga_axi versions. The fpga version uses the dma_psdpram segmented memory but the fpga_axi version does not. What are the pros and cons of these different implementations?

It seems the fpga_axi version has the merit of demonstrating the use of MSI interrupts, while the fpga examples use MSI-X only.