OLIMEX / ESP32-POE-ISO

ESP32 Power Over Ethernet board with 3000VDC Galvanic insulation

Two devices bricked after connecting Ethernet (with autosensing POE) while USB connected #3

Closed lucaelin closed 2 years ago

lucaelin commented 3 years ago

I am working on a project involving the ESP32-POE-ISO Rev. D. I have a Waveshare e-paper module connected via UEXT SPI, plus Ethernet with PoE and USB on the device. During testing, both of my boards failed within four weeks of purchase. Both now fail to connect to Ethernet. I have tested multiple networks and switches, and none of them establishes a proper connection with either board.

The project started out with one board, about a month ago. It worked like a charm, and about one week in I ordered a second one. The firmware uses ESP-IDF and is derived from the Ethernet example code. Additionally, it enables promiscuous mode on the chip and registers a custom Ethernet frame handler, so I can log every incoming packet (assuming there is existing broadcast traffic coming from the switch network, as my project does not send any frames itself). The switch I usually connect to is in my office and is a Ubiquiti UniFi 8-port PoE auto-sensing switch. It worked fine for about two weeks.
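As a rough illustration of what such a frame handler has to work with, here is a minimal host-side sketch in Python (not the ESP-IDF firmware from the project, which is not shown in this thread). It decodes the 14-byte Ethernet II header of a raw frame. The function name and the sample frame are made up for illustration, and 802.1Q VLAN tags are not handled.

```python
import struct


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode the 14-byte Ethernet II header of a raw frame (no VLAN tag support)."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # 6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType (big-endian)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(mac: bytes) -> str:
        return ":".join(f"{b:02x}" for b in mac)

    return {
        "dst": fmt(dst),
        "src": fmt(src),
        "ethertype": ethertype,
        "payload": frame[14:],
    }


if __name__ == "__main__":
    # Hypothetical broadcast frame carrying an IPv4 payload (EtherType 0x0800)
    sample = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
    header = parse_ethernet_header(sample)
    print(header["dst"], header["src"], hex(header["ethertype"]))
```

Logging just these three fields per frame is usually enough to tell whether broadcast traffic from the switch is reaching the handler at all.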

Soon I was testing some software reset behavior and repeatedly connecting and disconnecting the RJ45. Not too fast though; I always had to wait a couple of seconds for the e-paper display to refresh properly. At some point the software stopped calling my frame handler and the link LED on the board stayed lit, even after disconnecting the RJ45. After a power cycle the device only sometimes establishes a link, and if it does, the link indicator remains on even when the RJ45 is disconnected. Most of the time it does not establish a link. Additionally, the device does not receive any Ethernet frames in the handler. I also tested the Arduino example code and the ESP-IDF 4.2 example, and neither managed to establish a reliably working link. They reliably print the Ethernet started message, but only rarely show the Ethernet Link Up message. Even when they show Ethernet Link Up, they don't seem to get an IP address. The DHCP server does not seem to be the issue, as the second board worked fine at that point. So I put the broken board aside and continued with the second one.

The second one also worked reliably for several weeks. I even handed it to my colleague for two weeks to test it on multiple switches. This weekend I got it back to add some features he was asking for. Yesterday evening I tested it all, when suddenly, after disconnecting and reconnecting the switch, this device stopped working as well. Just like the other one, it stopped receiving any frames, as I could see from the serial monitor. Unlike the first board, this one does manage to establish a link reliably, but no frames are received and the link stays up after the RJ45 is disconnected. I again flashed the example code and again no IP address is obtained. As this board now reliably establishes a link, I hooked it up to my laptop with Wireshark and an ISC DHCP server running. To my surprise I could see a DHCP discover packet being sent by the board! The laptop responds with a DHCP offer, then silence. No further data comes from the board. This seems similar to my own project: the internal frame handler is not called, hence the DHCP offer is never seen by the software.
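To make the "discover, offer, then silence" observation concrete, the following Python sketch shows one way to classify captured DHCP payloads and to work out which step of the usual DISCOVER/OFFER/REQUEST/ACK exchange is missing. It is a host-side illustration only; the function names are hypothetical, and it assumes the input is the BOOTP/DHCP payload (UDP data) already extracted from a capture.

```python
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
DHCP_MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def dhcp_message_type(bootp: bytes):
    """Return the DHCP message type name from option 53, or None if absent."""
    # Fixed BOOTP header is 236 bytes, followed by the 4-byte magic cookie.
    if len(bootp) < 240 or bootp[236:240] != DHCP_MAGIC_COOKIE:
        return None
    i = 240
    while i < len(bootp):
        code = bootp[i]
        if code == 255:          # End option
            break
        if code == 0:            # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(bootp):  # Truncated option
            break
        length = bootp[i + 1]
        value = bootp[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return DHCP_MESSAGE_TYPES.get(value[0])
        i += 2 + length
    return None


def next_expected(observed):
    """Given message types seen on the wire in order, return the missing next step."""
    step = 0
    for name in observed:
        if step < len(HANDSHAKE) and name == HANDSHAKE[step]:
            step += 1
    return HANDSHAKE[step] if step < len(HANDSHAKE) else None


if __name__ == "__main__":
    # The capture described above: the board sends DISCOVER, the laptop answers OFFER.
    print(next_expected(["DISCOVER", "OFFER"]))
```

For the capture described above, the missing step is the REQUEST that the board should send after it has received the offer, which is consistent with the offer never reaching the software.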

I am clueless on how to debug this. Any help?

DanKoloff commented 3 years ago

Based on the good initial behavior and your colleague's tests with multiple switches, I would assume the issue is related to the "Ubiquiti UniFi 8-port PoE auto-sensing switch" and/or the testing procedure of disconnecting and reconnecting the Ethernet cable.

Were both ESP32-POE-ISO boards powered only from the "Ubiquiti UniFi 8-port PoE auto-sensing switch" when the issue appeared? Did you keep connecting and disconnecting the Ethernet cable (the only power supply) between the ESP32-POE-ISO and the switch during testing?

I am not familiar with the switch, but I see it supports passive PoE at 24V, which is not sufficient for the ESP32-POE-ISO. Make sure the switch is set to active PoE.

Maybe try to disable powering from the switch and power the board from USB.

lucaelin commented 3 years ago

Hey Dan, thank you for taking a shot at it. During the failure I had USB connected to the devices. My switch does support passive PoE, but the ports are configured for 802.3af/at. I previously ran several tests without USB connected, and both boards were perfectly capable of powering up from the switch's PoE. Now that the devices are bricked, I cannot get any connection with them: neither powering over USB with non-PoE Ethernet (on a switch that is known to work) nor using the PoE port that worked fine before makes a difference. Any other devices connected to the switch still work. So I am fairly confident that this is not a switch issue.

DanKoloff commented 3 years ago

It seems related to this action then, though whether, how exactly, and what got damaged is hard to say:

Unit 1:

> Soon I was testing some software reset behavior and repeatedly connecting and disconnecting the RJ45.

Unit 2:

> Yesterday evening I tested it all, when suddenly after disconnecting and reconnecting the switch, this device stopped working as well.

Connecting and disconnecting the RJ45 repeatedly may somehow have damaged the units. I am not sure how acceptable it is for a 50V+ PoE setup to be disconnected and connected repeatedly. Edit: It would probably make a difference whether the switch loses its power supply (which should be alright) or the RJ45 cable between the switch and the ESP32-POE-ISO is removed (which doesn't sound OK to me).

lucaelin commented 3 years ago

Just to clarify: It doesn't sound OK to you to have a device fail when removing the RJ45, or it doesn't sound OK to remove the RJ45? I am not sure I could agree with the latter... And powering down the switch prior to disconnecting is not something I have ever heard any manufacturer recommend and certainly not something I can recommend to my colleague in the field. If you want to know why disconnecting the device is so crucial to my project, you can find a description of the idea in a repository of mine called lldp-esp. Basically, I want the board to render LLDP information about the switch it is connected to on an e-paper display. LLDP messages are sent every 30-60 seconds, so disconnecting doesn't happen immediately after connecting the device, but any flaw that kills the board within weeks or even months would be fatal to the entire project... :( Do you have any lead on what could be wrong with the boards or how I could narrow down the search?
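For context on what "rendering LLDP information" involves, here is a minimal Python sketch of splitting an LLDP data unit into its TLVs and pulling out a few fields a display might show. It is not taken from the lldp-esp repository; the function names are hypothetical, and it assumes the Port ID carries a textual subtype (such as an interface name) rather than a MAC address.

```python
import struct


def parse_lldp_tlvs(lldpdu: bytes) -> list:
    """Split an LLDPDU (the payload of an EtherType 0x88CC frame) into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset + 2 <= len(lldpdu):
        (header,) = struct.unpack_from("!H", lldpdu, offset)
        tlv_type = header >> 9      # upper 7 bits: TLV type
        length = header & 0x01FF    # lower 9 bits: value length
        offset += 2
        if tlv_type == 0:           # End of LLDPDU
            break
        value = lldpdu[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        offset += length
    return tlvs


def summarize_lldp(tlvs: list) -> dict:
    """Extract Port ID (type 2), TTL (type 3) and System Name (type 5) if present."""
    info = {}
    for tlv_type, value in tlvs:
        if tlv_type == 2 and len(value) > 1:
            # First byte is the Port ID subtype; the rest is assumed to be text here.
            info["port_id"] = value[1:].decode("ascii", errors="replace")
        elif tlv_type == 3 and len(value) >= 2:
            info["ttl"] = struct.unpack("!H", value[:2])[0]
        elif tlv_type == 5:
            info["system_name"] = value.decode("utf-8", errors="replace")
    return info
```

Because the switch only announces itself periodically, a tool like this has to stay connected until the next LLDP frame arrives before it has anything to show.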

DanKoloff commented 3 years ago

> ...it doesn't sound OK to remove the RJ45?

This. I am pretty sure that even if both the switch and the board handle these disconnects without issues, at least the RJ45 plug will have some issues from the sparking.

But I will test it myself here and check if I can burn ESP32-PoE-ISO in similar conditions, will update you if it happens here and if I notice something in the way I managed to burn it.

> And powering down the switch prior to disconnecting is not something I have ever heard any manufacturer recommend and certainly not something I can recommend to my colleague in the field.

My idea was not to power down the switch voluntarily, but a scenario where the switch loses its power supply involuntarily. And looking at your project, it states exactly that: "Its purpose is to passively listen on ethernet traffic and extract information about the switch it is connected to". As I understand that description, we are interested in monitoring the switch, not the ESP32-POE-ISO, and the unit that gets disconnected is the switch, not the ESP32-POE-ISO. Maybe I am missing something.

I will keep you updated about my tests.

lucaelin commented 3 years ago

> But I will test it myself here and check if I can burn ESP32-PoE-ISO in similar conditions, will update you if it happens here and if I notice something in the way I managed to burn it. I will keep you updated about my tests.

Awesome! I have two new boards arriving today, so we'll see how they are holding up.

> and the unit that gets disconnected is the switch

Not exactly. A practical example would be installing PoE-driven phones in an office where you don't know how the cabling in the walls runs. You would connect the board to the socket in the wall and it would show you the switch and port it is connected to. Knowing that, you can adjust the port's configuration or try another wall socket until you find one that suits the phone installation.

> at least the RJ45 plug will have some issues from the sparking.

Yes, I have read about issues with that. AFAIK there are cables and sockets built with this in mind and the problem seems to get worse the more power the device actually requires. It should be fairly simple to replace the socket on the board with a different one that is better suited for this task, if it develops into a problem, right?

DanKoloff commented 3 years ago

Now I understand the purpose, connecting and disconnecting the Ethernet makes sense in this scenario, indeed.

Meanwhile, I can't brick it. I found some simple Ubiquiti GP-H480-050G, programmed my ESP32-POE-ISOs (revision D) with the ETH_LAN8720 example via the Arduino IDE, kept the USB connected, and also kept the serial monitor open to check the Ethernet status. Then I kept connecting and disconnecting the Ethernet jack at 10-second intervals. This should be exactly what you did when the fault occurred, correct? I repeated it 50 times but saw no fault here.

Here are two pictures of my hardware setup:

https://imgur.com/a/bOYOFjt

The blue cable is CAT6.

I also tried other repeatable tests: shutting down the whole Ubiquiti GP-H480-050G and powering it up again; removing and reconnecting the USB on the ESP32-PoE-ISO while PoE is active; and removing the USB, then connecting and disconnecting the PoE. Still no fault.

Until I can replicate it here to burn I will be inclined to believe it is something at your side. Probably something in your setup or method of testing.

Make sure to perform your tests with the same code so that our setups are even closer (minus the exact same PoE switch).

Maybe it can be attributed to a faulty cable, a faulty switch or port, or settings. Definitely test with another cable.

How was the board placed while you connect it and disconnect it? It is an open PCB design, so handling it with bare hands might affect it, either through static electricity or if you touched multiple components directly. There are 50V on it, remember. Placing it on metal or other conductive surfaces is dangerous too. Maybe consider a box if it has to be handled by non-professionals with bare hands.

Hard to say what might be damaged (if anything is damaged in the boards at all), but probably the connector or the Ethernet chip (LAN8710A).

DanKoloff commented 3 years ago

Can it be something related to the SPI and the Waveshare module? Was it connected when fault occurred?

lucaelin commented 3 years ago

Thank you so much for testing this! It is really great to know that you care so much!

> I found some simple Ubiquiti GP-H480-050G

This is 48V passive PoE, right? I was using 802.3af at the time of failure and I only have af devices to test with...

> Until I can replicate it here to burn I will be inclined to believe it is something at your side. Probably something in your setup or method of testing.

I agree that this seems more likely now, yes. I have now switched to a Ubiquiti PoE af injector connected to the same switch as before, but on a non-PoE port. Replacing the cable seems like a good idea too; I will do so. One of the two new boards is with my colleague and in the field again. He is aware of the issue and will report if anything out of the ordinary happens. I will continue to test the one I have left.

> Can it be something related to the SPI and the Waveshare module? Was it connected when fault occurred?

Could be; both boards were connected to Waveshare modules at the time, via the UEXT pins for power and SPI. Updating the displays via SPI still works on both boards after the Ethernet has failed, though. The first was a 1.54-inch 200x200px black/white/red rev. 2.1, the second a 2.66-inch 296x152px black/white. A second 2.66-inch display is currently in the mail, so I will have the same display as the one my colleague now uses.

> How was the board placed while you connect it and disconnect it?

The first one has a custom made wooden case that was open at the top. Primarily to have a better grip on the board while plugging into the jack. I don't think any conductive parts were touched during the failure. The second had a custom ABS case and was fully enclosed during the failure.

> Hard to say what might be damaged (if anything is damaged in the boards at all)

I would be more than happy to send you the boards if you want to take a look at them? Otherwise I might be able to find someone with an oscilloscope if I ask around a bit, but that might take a while. Would that help?

Thanks again for your support!

DanKoloff commented 3 years ago

> This is 48V passive PoE, right?

I guess so, but passive PoE should be the worst-case scenario, since there is no real back-and-forth communication going on and no "staging" in the powering sequence (as there is in 802.3af).

> I would be more than happy to send you the boards if you want to take a look at them? Otherwise I might be able to find someone with an oscilloscope if I ask around a bit, but that might take a while. Would that help?

For such hardware faults, a simple multimeter would probably be enough to find the exact damaged component. Make a setup without PoE power (power from USB, regular LAN connection). Then measure whether the Ethernet chip (LAN8710A) has any voltage on it. Refer to the schematic or the schematic export when measuring. Check around the Ethernet chip.

portfast commented 3 years ago

Hi,

I have this same problem with an ESP32-POE-ISO, in this case a device bought about 3 months ago has gradually gone bad. It periodically dropped off ethernet and would come back after being detached and reattached.

I am powering it from a Cisco 3650 PoE switch and I had assumed this was a bad cable but after replacing the cable this weekend, the failure state persisted, and continued even when I brought the device to the switch and connected it on a 1 metre patch cable. The switch was logging errored frames also.

Finally the device is failing to link with the switch at all (although successfully drawing PoE power) so I've brought it to the oscilloscope to try and diagnose the problem, which seems to be that it is establishing ethernet link with itself - I can see the 100mbps signal on the port even with no ethernet cable attached.

My board is Rev D and it went from working to not working without being touched by a human. Its job is to sit in a waterproof box in my garden, reading three DS18B20 sensors that are connected using the onboard 3.3 V power, and to report the results to MQTT.

DanKoloff commented 3 years ago

> Finally the device is failing to link with the switch at all (although successfully drawing PoE power)...

How did you determine it is successfully powered via PoE? Do you suspect that only the LAN part of the board is not working?

> ...so I've brought it to the oscilloscope to try and diagnose the problem, which seems to be that it is establishing ethernet link with itself - I can see the 100mbps signal on the port even with no ethernet cable attached.

How do you power the board when no Ethernet cable is attached?

lucaelin commented 3 years ago

> I can see the 100mbps signal on the port even with no ethernet cable attached.

I can confirm this behavior as well.

> How do you power the board when no Ethernet cable is attached?

Boot up the broken board using PoE, connect it to USB once it claims to have link, then remove the ethernet. In my case the Link according to the software stays up and Link LED remains on.

So, yes PoE seems to be working fine on all of my boards.

> The switch was logging errored frames also.

I checked on my switch and I don't see any errored frames. But I would not trust Ubiquiti devices on these stats, tbh.

portfast commented 3 years ago

When powered by PoE, I see +3.3 V and +5 V in the expected places, and I can also see console output from my code on the appropriate GPIO, so it's definitely running.

I powered it via USB when testing for Ethernet signals on the port. This board usually, but not always, links with itself on boot, so there is no need to attach it to a switch to get the link lights to come on.

I've also just flashed some different code onto it which uses the ESP32's Wi-Fi instead of the LAN port. This all works fine, so I would say the only fault with the board is the leakage of signal from the LAN TX to RX.
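One possible way to cross-check the "link up with no cable" observation is to look at what the PHY itself reports in its Basic Status Register (register 1 of the standard IEEE 802.3 clause 22 MII register set). The Python sketch below only decodes a 16-bit value that has already been read from the PHY by some other means; how to read it on this board is not covered in the thread, and the function name is hypothetical.

```python
# Bit positions in the IEEE 802.3 clause 22 Basic Status Register (register 1).
BMSR_LINK_STATUS = 1 << 2
BMSR_AUTONEG_ABILITY = 1 << 3
BMSR_REMOTE_FAULT = 1 << 4
BMSR_AUTONEG_COMPLETE = 1 << 5


def decode_bmsr(value: int) -> dict:
    """Decode selected flags from a 16-bit Basic Status Register value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("register value must fit in 16 bits")
    return {
        "link_up": bool(value & BMSR_LINK_STATUS),
        "autoneg_ability": bool(value & BMSR_AUTONEG_ABILITY),
        "remote_fault": bool(value & BMSR_REMOTE_FAULT),
        "autoneg_complete": bool(value & BMSR_AUTONEG_COMPLETE),
    }


if __name__ == "__main__":
    # Example values chosen for illustration, not measured on a failed board.
    for raw in (0x782D, 0x7809):
        print(hex(raw), decode_bmsr(raw))
```

The link status bit is latching in the standard, so the register is normally read twice in a row to get the current state. If a failed board reports link up and auto-negotiation complete with no cable attached, that would point at the PHY rather than at the ESP32 or the software.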

DanKoloff commented 3 years ago

The only thing I can conclude from your descriptions is that either the LAN8720 (more likely) or the Ethernet connector (less likely) got damaged. Unfortunately, neither I nor the team have any idea how and why such damage occurred. Over the last couple of weeks our designers re-evaluated the Ethernet part of the ESP32-POE-ISO and there doesn't seem to be any error in the design; it is a PoE design we have used previously in other boards for years.

My advice, if you wish to help us narrow down the possibilities, is to return the damaged boards so we can at least confirm what exactly got damaged. First contact me via the support e-mail support@olimex.com and provide a link to this thread (so we don't go over it again), and we will issue an RMA number and give you further instructions about the return. Thanks!

ddv2005 commented 3 years ago

Hello,

I have the same issue. I bought the board from mouser.com a week ago and it worked fine for a week while it was powered via USB. Then I bought a PoE switch (STEAMEMO 5 Port Gigabit Ethernet Unmanaged PoE Switch https://www.amazon.com/gp/product/B08TFY4G1C/ref=ppx_yo_dt_b_asin_title_o09_s00?ie=UTF8&psc=1 ), but once I connected the board to this switch, Ethernet stopped working. The board got a little hot (about 45-50C) and the TX & RX LEDs are off. If I connect USB and reconnect the RJ45, the TX & RX LEDs come on and the activity LED starts blinking. If I then disconnect the RJ45 again, the TX & RX LEDs stay on. I tried to use it without PoE again, but it looks like Ethernet is dead. I also feel that the Ethernet chip gets warm even without PoE power. Another board (wESP32) works fine on the same switch. Once I get a replacement from mouser.com I can send the board to support to find out what is wrong.

lapers commented 2 years ago

Hello,

> I have the same issue. I bought the board from mouser.com a week ago and it worked fine for a week while it was powered via USB. Then I bought a PoE switch (STEAMEMO 5 Port Gigabit Ethernet Unmanaged PoE Switch https://www.amazon.com/gp/product/B08TFY4G1C/ref=ppx_yo_dt_b_asin_title_o09_s00?ie=UTF8&psc=1 ), but once I connected the board to this switch, Ethernet stopped working. The board got a little hot (about 45-50C) and the TX & RX LEDs are off. If I connect USB and reconnect the RJ45, the TX & RX LEDs come on and the activity LED starts blinking. If I then disconnect the RJ45 again, the TX & RX LEDs stay on. I tried to use it without PoE again, but it looks like Ethernet is dead. I also feel that the Ethernet chip gets warm even without PoE power. Another board (wESP32) works fine on the same switch. Once I get a replacement from mouser.com I can send the board to support to find out what is wrong.

I am using the ESP32-PoE rather than the ESP32-PoE-ISO. Currently I have the same problem with 2 of my 3 new boards; the last board has not been tested. Everything works well on USB power. The device heats up in the DC/DC converter area, while this area stays cold when powered by USB. The Ethernet PHY is a LAN8710A.

To see what happens, I connected the TX, RX and GND pins of a UART-to-USB converter to the ESP32-PoE. In the terminal I saw:

...
[WiFi-event] event: 20 (Ethernet plugged in)
[WiFi-event] event: 21 (Ethernet unplugged)
[WiFi-event] event: 20 (Ethernet plugged in)
[WiFi-event] event: 21 (Ethernet unplugged)
...

The behaviour looks like a power problem.
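To put a number on this kind of link flapping, a captured serial log can be scanned for plug/unplug pairs. The following Python sketch assumes log lines in the format shown above, with event 20 meaning "Ethernet plugged in" and event 21 meaning "Ethernet unplugged" as printed in that output; the function name is hypothetical.

```python
import re

EVENT_RE = re.compile(r"\[WiFi-event\] event: (\d+)")
ETH_PLUGGED = 20    # as printed in the log above
ETH_UNPLUGGED = 21  # as printed in the log above


def count_link_flaps(lines) -> int:
    """Count how many times the link came up and then dropped again."""
    flaps = 0
    link_up = False
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # ignore unrelated log output
        event = int(match.group(1))
        if event == ETH_PLUGGED:
            link_up = True
        elif event == ETH_UNPLUGGED and link_up:
            flaps += 1
            link_up = False
    return flaps


if __name__ == "__main__":
    log = [
        "[WiFi-event] event: 20 (Ethernet plugged in)",
        "[WiFi-event] event: 21 (Ethernet unplugged)",
        "[WiFi-event] event: 20 (Ethernet plugged in)",
        "[WiFi-event] event: 21 (Ethernet unplugged)",
    ]
    print(count_link_flaps(log))
```

Comparing the flap count per minute under PoE power and under USB power would be one way to test the suspicion that the supply is involved.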

> The only thing I can conclude from your descriptions is that either the LAN8720 (more likely) or the Ethernet connector (less likely) got damaged. Unfortunately, neither I nor the team have any idea how and why such damage occurred. Over the last couple of weeks our designers re-evaluated the Ethernet part of the ESP32-POE-ISO and there doesn't seem to be any error in the design; it is a PoE design we have used previously in other boards for years.
>
> My advice, if you wish to help us narrow down the possibilities, is to return the damaged boards so we can at least confirm what exactly got damaged. First contact me via the support e-mail support@olimex.com and provide a link to this thread (so we don't go over it again), and we will issue an RMA number and give you further instructions about the return. Thanks!

Have you found out what's going on?

f34rdotcom commented 2 years ago

I have at least 5 bricked so far and just had a customer unit fail today with the same problem. It seems to happen when connecting Ethernet or USB. Each time, the current drawn by the board goes up and USB communication fails, with garbage or no USB serial data at all. Any suggestions on narrowing it down? My device also provides power to the ESP32-POE-ISO board, which adds to the complexity, but from my testing this seems to happen only when connecting Ethernet, with or without PoE.

DanKoloff commented 2 years ago

@lapers @f34rdotcom Contact us at support@olimex.com with details about the failed boards: what exactly happened, the hardware revision of the boards, pictures of the failed boards, and any other significant information that would allow us to try to replicate the issue. Thank you!