espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Esp-Mqtt + pppos on data receive error (IDFGH-10482) #11729

Open Sjurinho opened 1 year ago

Sjurinho commented 1 year ago

Answers checklist.

IDF version.

v4.3.4-dirty

Operating System used.

Linux

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-Wroom32

Power Supply used.

External 5V

What is the expected behavior?

Receive the MQTT data and fire the MQTT_DATA event.

What is the actual behavior?

It logs the following errors and then disconnects. It may take several retries to send a command to the board before a successful receive.

E (307900) TRANS_TCP: tcp_poll_read select error 104, errno = Connection reset by peer, fd = 60
E (307901) MQTT_CLIENT: Poll read error: 119, aborting connection

Steps to reproduce.

The error is hard to reproduce, since it does not always happen. However, I do not believe the issue is due to the connection being unstable, since the board is able to send messages to the broker all the time. We are using a SIMCom SIM7070G modem, with esp-modem setting up the connection. Through our SIM provider I can see that messages are sent correctly and are even retried several times before giving up.

Debug Logs.

No response

More Information.

I recently updated from ESP-IDF v4.2.3. I also experienced the issue at that time, but it seems to have gotten worse since updating. Does anyone have any experience with this?

david-cermak commented 12 months ago

> However, I do not believe the issue is due to connection being unstable

I agree. Do you see some disconnection event from the PPP server somewhere around the logs you shared?

> E (307900) TRANS_TCP: tcp_poll_read select error 104, errno = Connection reset by peer, fd = 60

This error clearly indicates that the server actively initiated the disconnection by sending a RST to the client. You can easily simulate it by running an MQTT client locally and killing the server while keeping the client connected.

Yet there's nothing we can do about it as clients. I'd suggest checking the server logs to see whether some problem appears there that would explain why the server actively reset the connection.
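The reset scenario described above can be reproduced without a broker. The following is only a minimal sketch, assuming a Linux or macOS host: it uses a plain loopback TCP socket rather than an MQTT client, and the helper name `provoke_reset` is made up for illustration. Setting `SO_LINGER` with a zero timeout makes the "server" side abort the connection with a RST instead of an orderly FIN, and the client then sees `ECONNRESET` (errno 104 on Linux, the same value as in the log).

```python
import errno
import socket
import struct


def provoke_reset():
    """Return the errno the client sees after the peer aborts the connection.

    Hypothetical helper for illustration; not part of ESP-IDF or esp-mqtt.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # l_onoff=1, l_linger=0: close() discards the connection and sends a RST.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()
    listener.close()

    try:
        client.settimeout(2.0)
        client.recv(16)
        return None  # an orderly close (FIN) would make recv() return b""
    except ConnectionResetError as exc:
        return exc.errno
    finally:
        client.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

This only shows how a RST surfaces on the receiving side. It says nothing about why a particular broker, or a network element between the modem and the broker, would send one.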

Sjurinho commented 11 months ago

@david-cermak, even if the error might lie with the server, is it not weird that it occurs more frequently after updating the ESP-IDF version?

david-cermak commented 11 months ago

I agree that's weird, but we have to look at the logs/messages and start from there.

> if the error might lie with the server,

This doesn't necessarily mean there's an issue with the server. The disconnection request comes from the server, and there's usually a reason for that. That's why I asked you to check the server-side logs. I can think of several potential issues:

1) Some weird bug in the MQTT client. I've seen this before: we had an issue with incorrect messages due to packet fragmentation, which showed up in a similar way. Server-side logs helped us discover and fix it. This doesn't look like an MQTT client problem though, as it's a very widely used component and v4.3 is a well-exposed, seasoned version.
2) Some glitch in the PPP connection resulting in a missed MQTT keepalive message.
3) A missing or corrupted UART signal causing incorrect transmission of an MQTT message.

But again, to say more we would need to understand the reason for the disconnection, which is only available in the server-side logs. I would say that problem 2) is a pretty common scenario when using an inherently unstable modem connection. Problem 3) might indicate that there is a way to prevent it, and it would also explain why it happens more or less frequently when building with a different SDK version.
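To make scenario 2) concrete: MQTT 3.1.1 requires the broker to drop the network connection if it receives no control packet from the client within one and a half times the negotiated keepalive interval. The sketch below is illustrative only. The function names, the 120 s keepalive, and the timestamps are assumptions, not values taken from this issue. It flags the gaps in a list of client send times that are long enough for a broker to give up.

```python
def broker_deadline_s(keepalive_s):
    """Longest silence a broker tolerates under MQTT 3.1.1 (1.5 x keepalive)."""
    return 1.5 * keepalive_s


def gaps_exceeding_deadline(packet_times_s, keepalive_s):
    """Return (previous, next) send-time pairs whose gap exceeds the deadline.

    packet_times_s: ascending times (in seconds) at which the client's
    packets, including PINGREQ, actually reached the broker.
    """
    deadline = broker_deadline_s(keepalive_s)
    return [
        (prev, nxt)
        for prev, nxt in zip(packet_times_s, packet_times_s[1:])
        if nxt - prev > deadline
    ]


if __name__ == "__main__":
    # Hypothetical arrival times: one PINGREQ lost on the PPP link after t=120.
    arrivals = [0, 60, 120, 320]
    print(gaps_exceeding_deadline(arrivals, keepalive_s=120))
```

With these made-up numbers, the 200 s gap between 120 and 320 exceeds the 180 s deadline, so a spec-compliant broker would be allowed to close the connection during that gap. Whether it closes with a FIN or a RST depends on the broker implementation.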

HarshanaCompax commented 9 months ago

Hi @david-cermak, I was following up on the issue @Sjurinho mentioned here. I took logs from the MQTT server and client. [image: PACKET_SNIFF] In the capture, 1883 is the MQTT server port and 43280 is the MQTT client port. It is clear that the FIN request originates from the ESP32 client.
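Deciding which side started the teardown comes down to finding the first segment in the capture that carries FIN or RST and checking its source port. The sketch below assumes the capture has already been exported as (source port, TCP flags) pairs. The function name and the sample packets are made up for illustration and are not taken from the attached capture; only the port numbers 1883 and 43280 come from the comment above.

```python
def first_closer(packets, server_port=1883):
    """Return (side, flag) for the first FIN or RST segment, or None.

    packets: iterable of (src_port, flags) in capture order, where flags is
    a set of TCP flag names such as {"PSH", "ACK"}.
    """
    for src_port, flags in packets:
        if "RST" in flags or "FIN" in flags:
            side = "server" if src_port == server_port else "client"
            return side, ("RST" if "RST" in flags else "FIN")
    return None


if __name__ == "__main__":
    # Hypothetical capture: client (43280) sends FIN first, server answers RST.
    sample = [
        (43280, {"PSH", "ACK"}),
        (1883, {"ACK"}),
        (43280, {"FIN", "ACK"}),
        (1883, {"RST"}),
    ]
    print(first_closer(sample))
```

A capture taken on the broker side shows what arrived there, which is not necessarily what the ESP32 sent: a NAT or carrier gateway in between can terminate a connection on the device's behalf.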

david-cermak commented 6 months ago

@HarshanaCompax Are you sure you're talking about the same scenario as @Sjurinho?

Could you please share the log from your board as well?

About the packet capture, could you please elaborate a bit on your environment? Did you set up some port/IP forwarders so that you can sniff your modem's PPPoS packets on your local network?