esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
15.91k stars 13.34k forks source link

The lwIP_enc28j60 driver does not handle Rx buffer overflow. #8123

Open pvrfr opened 3 years ago

pvrfr commented 3 years ago

Basic Infos

Platform

Problem Description

I hope that I have done this right!

There is an intermittent going deaf condition with the Ethernet interface using the ENC28J60 module. I have done a lot of work on the ENC28J60 driver as used in the OpenSprinkler project: https://github.com/OpenSprinkler. I believe that the problem relates to the lack of a recovery mechanism for a receive buffer overflow condition.

According the the spec in section 7.2.4, if there is insufficient space in the receive buffer the RXERIF bit will be set in the EIR register. Also the BUFER bit will be set in the ESTAT register. The chip stops receiving data in this Rx overflow state. This buffer overflow condition and these error bits will not be cleared without deliberate action by the system. A soft reset is required.

There is no code in the driver that checks for the Rx overflow condition. The BUFER bit is not defined in the header file or the src file. The RXERIF bit in the EIR register is also not defined. These bits must be checked on every call to receive data.

When in a buffer overflow state, no new packets will be received and the buffer will remain empty once the previous packet has been transferred to the system. Thus, the Ethernet interface appears to go deaf.

A buffer overflow can be a frequent occurrence in some conditions. The ENC28J60 supports up to 1000Base-T Ethernet but can only communicate with the system at SPI speeds up to 20 MHz. Since there is no on-chip processing of packets (offload engine), all received data must be sent to the system for handling. This can easily overload the SPI interface and/or the system.

The simplest recovery is to call the ENC28J60::softreset() routine. But a more elegant recovery mechanism is needed.

d-a-v commented 3 years ago

It sounds like you could be able to help ! I integrated and quickly tested this driver but I'm not a heavy user and I take no credit for the three Ethernet drivers that I took as-is. As you may have noticed, the interface between this core and an network driver is quite light: Here's the required public and protected interface in the header (The private interface is specific to each driver).

You could:

A way to reproduce the issue will be welcome so one could appreciate the benefits.

pvrfr commented 3 years ago

None of these drivers are mine. I got into debugging the code for my OpenSprinkler so that I could water my lawn without the unit hanging. The driver that is used is part of the UIPEthernet package. I have suggested changes there as well.

During my research I found this implementation and thought that I would point out the problem that I found.

If you want me to make the change I can do that but I have no system to test it on. If you are good with that I can add the code. It is quite simple.

JAndrassy commented 3 years ago

I plan to do a fix in UIPEthernet and EthernetENC library. I hope it will be simple so I thing I can then fix it here too.

pvrfr commented 3 years ago

Also consider checking the TXRTS bit before transmitting a packet. If the bit is set then the module is in the process of transmitting a packet. The module should be left to complete the process before attempting to transmit the next packet. Unfortunately, a change in this code will have an effect on the API since the ENC28J60::sendFrame function, as coded, will never return an error.

This is probably not a problem in most cases because the system is slower than the Ethernet channel. However if the Tx buffer is overwritten it would produce a particularly tricky bug to find.

JAndrassy commented 3 years ago

I plan to do a fix in UIPEthernet and EthernetENC library. I hope it will be simple so I thing I can then fix it here too.

I found out there is nothing to fix. The state bits are there for enc interrupt handling, but the libraries don't use enc interrupt.