OLIMEX / ESP32-POE

ESP32 IoT development board with 100Mb Ethernet and 802.3 Power Over Ethernet (POE)
Apache License 2.0

LAN8710A Ethernet Transceiver getting into weird state #50

Closed JwAtTEC closed 2 months ago

JwAtTEC commented 2 months ago

We have several ESP32-POE-ISO boards in the field, operated 24/7 on a POE network. We have had a few instances where a device has stopped sending data to the server. Without physical access to the device it is difficult to diagnose, but if the POE on the port is cycled, the device starts up properly again.

We recently got one of the boards back and have had it running internally. When we reset the server, one of the boards showed a weird state where the Link (LNK1) and activity (ACT1) LEDs were both steadily blinking at about a 1 second on/off cycle. This seems like an error condition of the transceiver. Probing the IC we could see that the ESP32 was still sending data to the transceiver but it was not getting transmitted to the LAN. When we reset the transceiver by bringing NRST to ground temporarily, the transceiver started transmitting again and returned to normal operation.

Upon examining the circuitry at pin 19 NRST, there is a 10K resistor to _3.3VLAN and a 10uF Cap that is not installed (physically and per the schematic). Section 3.8.5.1 of the datasheet calls out "A hardware reset (nRST assertion) is required following power-up" which, I assume, is why the 10uF cap was part of the design. However, we aren't seeing issues on power up, so that's likely not an issue.

My questions are: 1. Can anyone think of why the transceiver might get into this odd state, and 2. Is there a software method from the ESP32 to correct it? Right now we are contemplating tying GPIO32 to the NRST line to trigger a reset when we fail to get acknowledgments on our transmits. Appreciate any feedback!
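
A minimal sketch of the "reset when acknowledgments stop" idea, not a tested fix. The class name `PhyResetWatchdog`, the threshold, and the callback are hypothetical, and it assumes the application can tell whether each transmit was acknowledged. The reset action is passed in as a callback, so the counting logic can be checked without hardware.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: counts consecutive unacknowledged transmits and
// runs a caller-supplied reset action once a threshold is reached.
class PhyResetWatchdog {
public:
    PhyResetWatchdog(uint32_t maxMissedAcks, std::function<void()> resetPhy)
        : maxMissedAcks_(maxMissedAcks), resetPhy_(std::move(resetPhy)) {}

    // Call after every transmit attempt.
    // Returns true if this call triggered the reset action.
    bool onTransmitResult(bool acknowledged) {
        if (acknowledged) {
            missed_ = 0;               // link is healthy, start counting afresh
            return false;
        }
        if (++missed_ < maxMissedAcks_) {
            return false;              // not enough consecutive failures yet
        }
        missed_ = 0;                   // avoid re-triggering on the next miss
        if (resetPhy_) {
            resetPhy_();
        }
        return true;
    }

    uint32_t missedAcks() const { return missed_; }

private:
    uint32_t maxMissedAcks_;
    std::function<void()> resetPhy_;
    uint32_t missed_ = 0;
};
```

On the board, the callback could pulse the GPIO wired to NRST (GPIO32 in the proposal above) low and then high again, with the low time chosen from the nRST timing in the LAN8710A datasheet. Requiring several consecutive misses rather than one is a design choice to avoid resetting the PHY on a single dropped packet.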

Kv603 commented 2 months ago

Possibly unrelated, but I've been using this bit of code in my ESP32-POE builds to recover from Ethernet failure:

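// Power-cycle the PHY through its power-enable pin.
pinMode(ETH_PHY_POWER, OUTPUT);
digitalWrite(ETH_PHY_POWER, LOW);   // cut PHY power
delay(499);
digitalWrite(ETH_PHY_POWER, HIGH);  // restore PHY power
delay(999);                         // give the PHY time to start up
if (!ETH.begin()) {
    // Ethernet still fails to start: leave the PHY off and reboot the ESP32.
    digitalWrite(ETH_PHY_POWER, LOW);
    ESP.restart();
}

DanKoloff commented 2 months ago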

As Kv603 wrote, there is already a PHY_PWR pin (GPIO12) that can be used to toggle just the PHY power. This is a workaround, of course. If you are looking for reasons why Ethernet might hang, it is often this (especially if you are using custom software and libraries):

"A Hardware reset is asserted by driving the nRST input pin low. When driven, nRST should be held low for the minimum time detailed in Section 5.5.3, "Power-On nRST & Configuration Strap Timing," on page 59 to ensure a proper transceiver reset. During a Hardware reset, an external clock must be supplied to the XTAL1/CLKIN signal."

If you've held the main ESP32 chip powered down, you need to delay powering up the LAN8710A until the PHY clock is available, or else it might hang. This is usually done in software through GPIO12, enabling the LAN a few moments after the ESP32 is powered.
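
The ordering described above can be sketched as follows. This is an illustration under assumptions, not the board support package's actual start-up code: `PhyHooks`, `powerUpPhyAfterClock`, and both delay parameters are hypothetical, and how the PHY clock is started depends on the Ethernet driver in use.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hardware hooks, so the ordering can be exercised off-target.
struct PhyHooks {
    std::function<void(bool)> setPhyPower;    // true = PHY powered (PHY_PWR high)
    std::function<void()> startPhyClock;      // start driving the PHY clock
    std::function<void(uint32_t)> delayMs;    // blocking delay in milliseconds
};

// Keep the PHY unpowered until its clock is running, then power it up.
void powerUpPhyAfterClock(const PhyHooks& hw, uint32_t offMs, uint32_t settleMs) {
    hw.setPhyPower(false);   // hold the PHY off while the ESP32 starts
    hw.delayMs(offMs);       // let the PHY supply discharge
    hw.startPhyClock();      // clock must be present before the PHY starts
    hw.delayMs(settleMs);    // allow the clock to stabilise
    hw.setPhyPower(true);    // only now enable the PHY
}
```

On an ESP32-POE, `setPhyPower` would wrap a `digitalWrite` on the PHY_PWR pin and `delayMs` would wrap `delay`. Suitable delay values are not given in this thread and would need to be chosen from the datasheet timing.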

The PHY clock is on GPIO0 or GPIO17, depending on whether the board uses the WROOM or the WROVER module. Two pins differ between the variants, so check which one you have against the schematic.

When one board out of many behaves differently, the usual reason is hardware damage to the LAN8710A chip. This can have many causes, such as ground loops, a malfunctioning uninsulated POE switch/splitter, bare-hand handling, ESD, etc.

JwAtTEC commented 2 months ago

Thanks everyone for your helpful inputs. I had overlooked the PHY_PWR pin so that's the direction we will investigate. Closing this issue as resolved.