Xilinx / embeddedsw

Xilinx Embedded Software (embeddedsw) Development
Other
936 stars 1.07k forks source link

Cache incoherency issue in lwip with PS EMAC DMA driver #53

Open mpv89 opened 6 years ago

mpv89 commented 6 years ago

Initially posted on Xilinx forums [1]: there is DMA cache incoherencies when using lwIP and the PS EMAC on a Zynq at high throughput. The problem can be reproduced by following the steps in described in the post [1].

We have a workaround in place where we force lwIP to re-do IP packets checksum and invalidate any that fail. This obviously slows down the CPU and the total throughput but at least we can once again believe that the data we are receiving is valid. A fix would be appreciated however.

[1] https://forums.xilinx.com/t5/Embedded-Processor-System-Design/FreeRTOS-lwip-driver-cache-issues-in-2018-2/td-p/879070

harini-katakam commented 6 years ago

Hi,

Could you please try the following? Xil_DCacheInvalidateRange((UINTPTR)p->payload, rx_bytes); at https://github.com/Xilinx/embeddedsw/blob/master/ThirdParty/sw_services/lwip202/src/contrib/ports/xilinx/netif/xemacpsif_dma.c#L498

This will do an cacheinvalidate on the exact packet before it is passed for processing.

Regards, Harini

mpv89 commented 6 years ago

Hi Harini,

Thank you for your suggestion and I am happy to let you know that it has solved the problem. I ran several iperf tests and lwIP checksums no longer show as invalid.

Thanks, Marcelo

skoehler commented 6 years ago

Is the fix included in 2018.2.2?

harini-katakam commented 6 years ago

No, this is not included in 2018.2. It should be in the next release (2018.3).

wzengerle commented 5 years ago

This fix also solved a similar issue I was having. I just pulled 2108.3 and looked a the code from a fresh bsp create. Sadness... the patch does not appear to have made it into the 2018.3 release.

harini-katakam commented 5 years ago

Hi, this was not included in 2018.3 but will be included in the next release. Meanwhile we will provide an answer record with this patch.

skoehler commented 5 years ago

Is there a reason the fixes are not incorporated into new versions? God knows how many people are building their apps based on this code. Also, the error is not easily detected. We had some checksum errors here and there, but we only noticed because our TCP-based protocol had checksums of its own. That is not typically the case. Did somebody take care and list it as a known issue in the release notes at least?

I believe the issue has been around for a long time. I compared xemacpsif_dma.c between lwip141 and lwip202 based versions, and the code is mostly identical.

harini-katakam commented 5 years ago

We're validating this patch at our end for multiple devices as lwip is common to all of them. We'll try to incorporate this into the tree and create an answer record (linked to known issues) at the earliest.

skoehler commented 3 years ago

This bug has been around for more than 2 years now. Is it fixed?

HariniKatakamX commented 3 years ago

It was fixed in 2019. Please see: https://github.com/Xilinx/embeddedsw/commit/7bad37c416c368a1fe81a559fde48843a476fe76 There was an lwip library upgrade; so you'll have to use the lwip211 version: https://github.com/Xilinx/embeddedsw/blame/master/ThirdParty/sw_services/lwip211/src/contrib/ports/xilinx/netif/xemacpsif_dma.c#L507

skoehler commented 3 years ago

Are you saying that you did not fix it for past versions, such as lwip202_v1_2 ? Why? It's a pretty serious bug. Is that a general policy that old versions don't get fixed?

I'm asking because I don't see how the XSDK would allow me to include custom patches in the BSP sources. I believe this is still true for Vitis.

HariniKatakamX commented 3 years ago

Yes, the older version remain the same - this is a general policy on the whole embeddedsw repo. All fixes are added to the next public release on the new version that is incremented. Vitis and SDK both allow custom updates. You can edit the BSP inline or create a "repo" with your local patches on embeddedsw.