CrossClockDomain FIFO depth too shallow causes inefficiencies

In PHYTXDatapath and PHYRXDatapath, the async FIFO used to cross between the sys and pcie clock domains is only 4 entries deep.

In my case, with a 200 MHz sys clock and a 250 MHz pcie clock, continuous pushes on the sys side fill the TX FIFO before the pcie side has had a chance to read anything. This creates back-pressure in the sys domain, even though the pcie domain runs on the faster clock and is always ready to accept data.
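One plausible explanation for why a 4-deep FIFO fills up even though the reader is faster: in a conventional gray-code async FIFO, the read pointer reaches the write side only after passing through synchronizer flops, and each written word must first be synchronized into the read domain before it can be popped. The writer's view of free space therefore lags by several cycles. The toy model below is a hypothetical sketch, not LiteX/Migen code. It assumes 2-flop synchronizers in each direction, no extra output buffering, a writer that has data on every sys cycle, and a reader that is ready on every pcie cycle. `write_utilization` and its parameters are made up for illustration.

```python
"""Toy cycle-level model of a dual-clock FIFO.

Hypothetical sketch, not LiteX/Migen code. Assumes pointers cross each
direction through a 2-flop synchronizer, the sys-side writer always has
data, and the pcie-side reader is always ready.
"""


def write_utilization(depth, sys_mhz=200, pcie_mhz=250,
                      sync_stages=2, sys_cycles=10_000):
    """Return the fraction of sys cycles on which the writer could push."""
    sys_period = 1_000_000 // sys_mhz    # clock periods in picoseconds
    pcie_period = 1_000_000 // pcie_mhz
    wr_ptr = rd_ptr = 0
    rd_ptr_sync = [0] * sync_stages      # rd_ptr as seen by the sys domain
    wr_ptr_sync = [0] * sync_stages      # wr_ptr as seen by the pcie domain
    t_sys = t_pcie = 0
    cycles = pushes = 0
    while cycles < sys_cycles:
        if t_sys <= t_pcie:
            # sys edge: "full" is computed from the stale, synchronized rd_ptr
            full = wr_ptr - rd_ptr_sync[-1] >= depth
            rd_ptr_sync = [rd_ptr] + rd_ptr_sync[:-1]  # shift synchronizer
            if not full:
                wr_ptr += 1
                pushes += 1
            cycles += 1
            t_sys += sys_period
        else:
            # pcie edge: "empty" is computed from the stale, synchronized wr_ptr
            empty = wr_ptr_sync[-1] <= rd_ptr
            wr_ptr_sync = [wr_ptr] + wr_ptr_sync[:-1]  # shift synchronizer
            if not empty:
                rd_ptr += 1
            t_pcie += pcie_period
    return pushes / cycles


if __name__ == "__main__":
    for depth in (4, 16):
        print(f"depth={depth:2d}: writer pushes on "
              f"{write_utilization(depth):.1%} of sys cycles")
```

Under these assumptions, a written word's slot becomes visible as free to the writer only four to five sys cycles later, so a 4-entry FIFO periodically reports full while a 16-entry one never does. Actual latencies depend on the real AsyncFIFO implementation and its synchronizer depth.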

I increased the FIFO depth to 16 in both PHYTXDatapath and PHYRXDatapath, and that alone raised my DMA throughput from 37.45 Gbit/s to 45.5 Gbit/s on a Gen3 x8 link, a gain of roughly 21.5%, which is pretty significant.

enjoy-digital/litepcie issue #104