espressif / esp-lwip

Fork of lwIP (https://savannah.nongnu.org/projects/lwip/) with ESP-IDF specific patches
Other
79 stars 126 forks source link

pppos: fix in_tail null (IDFGH-4258) #24

Closed peter-pycom closed 2 years ago

peter-pycom commented 3 years ago

happens during quick reconnect of LTE ppp session

peter-pycom commented 3 years ago

I have experienced this null error while working with Pycom firmware https://github.com/pycom/pycom-micropython-sigfox. This includes idf 3.3.1, which in turn includes an older version of this lwip repository.

It happend during an lte reconnect. So, I'm not sure how to reproduce this problem in vanilla idf. Also, I'm not sure whether my handling (link.drop) is perfectly in line with your other code. Happy to update if needed

Alvin1Zhang commented 3 years ago

Thanks for your contribution and the extra details.

david-cermak commented 3 years ago

@peter-pycom Sorry for very slow feedback. The change makes perfect sense, but still I'd like to understand a bit more why and when it happens. Would it be possible to share a packet log? Could be a binary log from stdout (e.g. using ESP_LOG_BUFFER_HEXDUMP() directly in the pppos_input()).

As you say, the issue happens when LTE reconnects, probably meaning the previous packet missed the trailing PPP_FLAG, but contained all other required fields including a valid checksum. Then, when disconnecting the in_tail and in_head got freed and zero'ed, so the next PPP_FLAG (expected at the beginning of each packet) would crash the firmware. The only trouble here is that connection and disconnection reset the pppos states and I couldn't find a place where the in_tail gets destroyed and pppos states were not reset. There might be a theoretical chance of the FCS being valid while parsing the PPP header, but this is very unlikely, even more unlikely if a PPP server is on the other side.

How often do you experience this issue? Is it always after reconnection?

david-cermak commented 3 years ago

@peter-pycom Any update on this issue? I feel quite reluctant to accept the changes, even if they make some sense. This is not reproducible, though, with current version of lwip and IDF.

Closing the PR for now, but please, share more details, if you will, or a packet capture would be helpful, too.

david-cermak commented 2 years ago

Hi @peter-pycom

Thanks again for sharing the fix. I'm reopening this, as I was able to reproduce the problem with help of reporters in https://github.com/espressif/esp-idf/issues/8300. The null dereference happens when we send a packet with correct start/end of frame and only zero's (escaped or not) in the body, e.g.

u8_t two_breaks[] = { 0x7e, 0, 0, 0x7e };
pppos_input(ppp, two_breaks, sizeof(two_breaks));

We'll proceed with this patch; I've also posted your patch upstream (https://savannah.nongnu.org/patch/index.php?10179) together with the related unit test.

Note, that what we mentioned above:

theoretical chance of the FCS being valid

Turned out to be a very practical/real-life situation, as the FCS was valid if all characters were 0's.

david-cermak commented 2 years ago

Cherry-picked as https://github.com/espressif/esp-lwip/commit/537c69d573f386befef0684c90cf5c9ffafd0510 Thanks @peter-pycom