In PHYTXDatapath and PHYRXDatapath, the async FIFO used to cross between the `sys` and `pcie` clock domains is only 4 entries deep.
In my case, with a 200 MHz `sys` clock and a 250 MHz `pcie` clock, the TX FIFO fills up when the `sys` side pushes continuously, before the `pcie` side has had a chance to read anything. This causes back pressure in the `sys` domain even though the `pcie` domain is faster and is always ready to accept data.
I increased the FIFO depth to 16 in both PHYTXDatapath and PHYRXDatapath, and that boosted my DMA speed.
I went from 37.45 Gbit/s to 45.5 Gbit/s (on a Gen3 x8 link) from just that change, an improvement of roughly 21%, which is pretty significant.
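
To illustrate why a shallow async FIFO can stall a writer even when the reader is faster, here is a rough behavioural sketch in plain Python. It is not the LiteX/Migen implementation: the function name, the two-stage pointer synchronizers, and the clock phase are all assumptions made for the example. The idea is that each side only sees the other side's pointer after it has passed through a synchronizer, so a freed slot takes several cycles to become visible to the writer.

```python
from collections import deque


def simulate_async_fifo(depth, wr_period_ps=5000, rd_period_ps=4000,
                        sync_stages=2, n_write_cycles=10000):
    """Return the fraction of write-clock cycles on which a push is accepted.

    Hypothetical model: the writer pushes on every cycle it can, the reader
    pops on every cycle it sees data, and each pointer crosses domains
    through `sync_stages` flip-flops (an assumption, not taken from LiteX).
    """
    wr_ptr = 0  # total words written
    rd_ptr = 0  # total words read
    # Synchronizer pipelines; index 0 is the output of the last stage.
    rd_ptr_in_wr = deque([0] * sync_stages)
    wr_ptr_in_rd = deque([0] * sync_stages)
    t_wr = 0
    t_rd = rd_period_ps // 2  # arbitrary phase offset between the clocks
    wr_edges = 0
    accepted = 0
    while wr_edges < n_write_cycles:
        if t_wr <= t_rd:
            # Write clock edge: decide using the delayed read pointer.
            seen_rd = rd_ptr_in_wr.popleft()
            rd_ptr_in_wr.append(rd_ptr)
            if wr_ptr - seen_rd < depth:  # FIFO does not look full
                wr_ptr += 1
                accepted += 1
            t_wr += wr_period_ps
            wr_edges += 1
        else:
            # Read clock edge: decide using the delayed write pointer.
            seen_wr = wr_ptr_in_rd.popleft()
            wr_ptr_in_rd.append(wr_ptr)
            if seen_wr > rd_ptr:  # FIFO does not look empty
                rd_ptr += 1
            t_rd += rd_period_ps
    return accepted / wr_edges


if __name__ == "__main__":
    for depth in (4, 16):
        print(f"depth={depth}: write-side utilisation "
              f"{simulate_async_fifo(depth):.2f}")
```

In this simplified model the depth-4 FIFO periodically looks full to the 200 MHz writer, while the depth-16 FIFO never does. The absolute numbers should not be expected to match real hardware, which may have a different synchronizer depth or extra registers, but the trend is consistent with the speedup reported above.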