Closed HamzaHajeir closed 9 months ago
Here's an issue raised in parallel to mbedTLS, I'm not sure where the issue but suspecting either mbedTLS or LwIP ALTCP TLS port, LwIP comes with with less probability because I've did trace the memory alloc/free there.
Hi @HamzaHajeir
Looking at the the context, it looks related to memory leaking/corruption. Can you share your sdkconfig file? What are the H, M, S values? are those the Free heap and Minimum heap values?
Also can you share the free heap and minimum free heap sizes at the time of the issue?
Also, you mentioned that the previously connected devices continue the connection without any problem, Have you tried disconnecting those clients and trying to establish the connection once again after you have faced the above issue?
Hi @AdityaHPatwardhan,
Looking at the the context, it looks related to memory leaking/corruption. Can you share your sdkconfig file? What are the H, M, S values? are those the Free heap and Minimum heap values?
H: Free Heap, M: Max heap block, S: Stack watermark level. The heap values are measured with MALLOC_CAP_DEFAULT
capability.
Here's the sdkconfig, I'm now experimenting the same issue under ESP32S3 (This sdkconfig belongs to ESP32S3 environment): sdkconfig.txt
Also can you share the free heap and minimum free heap sizes at the time of the issue? I'll Insert that for the upcoming logs.
Also, you mentioned that the previously connected devices continue the connection without any problem, Have you tried disconnecting those clients and trying to establish the connection once again after you have faced the above issue? When that happened, even connected clients cannot reconnect after a disconnect (As Shift+F5 Refresh).
When the memory leak of 164 bytes happens, I've notices unchanged minimum heap block size of 2137063
(PSRAM is activated), while the heap decreases by 164 bytes per new connection/handshake.
I'm not sure why the minimum heap block is unchanged.
** Just got an instance where the leak is 176 bytes, the modules debug is off for current builds, please ask for it if needed.
Please get the readings for MALLOC_CAP_INTERNAL
,
In your sdkconfig
I can see that the option CONFIG_MBEDTLS_INTERNAL_MEM_ALLOC
has been set to 1 ( Which is the default behaviour for mbedTLS)
Minimum heap block notifies the lowest value for free_heap_size through its lifetime, so it can remain constant but would be lower than free_heap_size.
Here's a log with the changes. I've manually marked the reference heap point, which is the value prior to a new curl HTTPS request.
You can look for marks ===== REFRENCE POINT =====
and ==== AFTER SCAVENGE ==== (XXXXXX)
Wherein the scavenge occurs after the connection goes idle and got cleaned by the system internals.
device-monitor-230703-155511 copy.log
Note that the debug is turned off, and further I've manually removed unrelated logs (publishes and "H4P" logs).
Hi @HamzaHajeir I dont think the error stems from a code that is a part of esp-idf. We have not seen such error with any of the esp-idf examples. It is highly likely that the issue is caused by the patch that you have applied. I would recommend playing close attention to the part of code which is triggered when 1) New session is created 2) Old session is destroyed
I am closing the issue since it doesn't seem to be related to esp-idf. Please feel free to re-open in you still think it is an esp-idf issue and would be fixed by changing some part of code from esp-idf.
Thanks, Aditya
Hi Aditya,
I've read the code again, there's no session initiation in altcp_tls_mbedtls.c, nor in application code.
It's proven by logs (by inserted prints) that altcp_mbedtls_dealloc()
being called per connection close, which do call mbedtls_ssl_free()
.
As servers share the configurations, corresponding keys and certificated wont be freed unless the LISTEN connection gets closed.
It is highly likely that the issue is caused by the patch that you have applied.
Does esp-idf use LWIP_ALTCP, or have someone tested with it and confirmed the issue absence? If so, please inform with the exact version of LWIP_ALTCP used in tests.
Alternatively, can you confirm a successful memory leak test is done for TLS servers? Please inform with such projects.
Regards,
Hi @HamzaHajeir,
esp-idf example are tested against possible leaks in multiple areas. Though I won't say every leak always get caught. Just to check here, I have attached a log where I have attempted more than 200 connections to a TLS server created with https_server example. I dont observe any leak other than some memory changes due to socket buffers (which are released after some time hence might be confused with memory leak). leak_test.log I am re-opening the issue just in case, but I think if we are making changes to lwip and then some leak is occurring, wouldn't it be better to raise issue at https://github.com/lwip-tcpip/lwip (through appropriate means) In order to be sure about that, is it possible for you to provide some kind of reproducible code which does not use any external patch but reproduces memory leak?
I have noticed similar issues in my app. I host HTTPS server with WebSocket. At some times the API, just stops working. (Other logic from what I can tell is still fine) Seems to be random, but I have noticed if there are more tasks happening in parallel the higher risk of API breaking.
esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7780
esp_https_server: esp_tls_create_server_session failed
httpd: httpd_accept_conn: session creation failed
httpd: httpd_server: error accepting new connection
esp_https_server: performing session handshake
esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x0050
esp_https_server: esp_tls_create_server_session failed
httpd: httpd_accept_conn: session creation failed
httpd: httpd_server: error accepting new connection
It works fine even after these errors
From what I can tell there is a pattern where it breaks
esp_https_server: performing session handshake
...
esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7780
esp_https_server: esp_tls_create_server_session failed
It doesn't log anything about httpd and at this point API is broken
I use v4.4
I switched to WolfSSL https://github.com/esp-rs/esp-idf-sys/issues/226 and I still get the same issue
Failed after (Interesting that with error code NO_PRIVATE_KEY = -317, /* need the private key */
)
E (158110) esp-tls-wolfssl: wolfSSL_accept returned -1, error code: -313
E (158110) esp_https_server: esp_tls_create_server_session failed
other times (Here it handlres it correctly without stucking)
I (565640) esp_https_server: performing session handshake
E (567170) esp-tls-wolfssl: wolfSSL_accept returned -1, error code: -308
E (567170) esp_https_server: esp_tls_create_server_session failed
W (567175) httpd: httpd_accept_conn: session creation failed
W (567180) httpd: httpd_server: error accepting new connection
I (567190) esp_https_server: performing session handshake
E (568720) esp-tls-wolfssl: wolfSSL_accept returned -1, error code: -313
E (568720) esp_https_server: esp_tls_create_server_session failed
W (568725) httpd: httpd_accept_conn: session creation failed
W (568730) httpd: httpd_server: error accepting new connection
I reproduce these errors by refreshing https endpoint in Firefox without accepted certificate
=== At one point I got
W (898340) httpd_ws: httpd_ws_get_frame_type: Failed to read header byte (socket FD invalid), closing socket now
API stopped working as well
I have attached a log where I have attempted more than 200 connections to a TLS server created with https_server example. I dont observe any leak other than some memory changes due to socket buffers (which are released after some time hence might be confused with memory leak). leak_test.log
I've checked it, it depends on latest ESP-IDF (v5.2) which is accompanied with major updated mbedTLS version (v3 instead of v2.28.3), so it won't be helpful to ensure the issue. (It could be any of ESP-IDF v4.4, mbedTLS 2.28.3 or lwip and furthers).
I have noticed similar issues in my app. I host HTTPS server with WebSocket. At some times the API, just stops working. (Other logic from what I can tell is still fine) Seems to be random, but I have noticed if there are more tasks happening in parallel the higher risk of API breaking.
esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7780 esp_https_server: esp_tls_create_server_session failed httpd: httpd_accept_conn: session creation failed httpd: httpd_server: error accepting new connection
esp_https_server: performing session handshake esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x0050 esp_https_server: esp_tls_create_server_session failed httpd: httpd_accept_conn: session creation failed httpd: httpd_server: error accepting new connection
It works fine even after these errors
From what I can tell there is a pattern where it breaks
esp_https_server: performing session handshake ... esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7780 esp_https_server: esp_tls_create_server_session failed
It doesn't log anything about httpd and at this point API is broken
I use v4.4
Per error, check this line.
You might activate mbedTLS debug to see what does really happen after that point.
Per error, check this line.
You might activate mbedTLS debug to see what does really happen after that point.
Based that WolfSSL had the same issue as mbedtls I assume issue is in my code or esp-idf. Could be my issue is not related with your. You could try sending multiple failed SSL handshakes and see if that breaks stuff
Update: I set CONFIG_FREERTOS_UNICODE=y
and I can't seem to reproduce the issue anymore. Will try more, but it could be some race condition related
Warning (355257) httpd_ws: httpd_ws_get_frame_type: Failed to read header byte (socket FD invalid), closing socket now
still kills the API :/
Thanks for the update @reinismu @HamzaHajeir, I will take a detailed look once again and let you know. It could be that I might have missed something last time.
Thanks for the update @reinismu @HamzaHajeir, I will take a detailed look once again and let you know. It could be that I might have missed something last time.
Thanks for looking into it :)
I started to check this again and noticed that if I enable debug logs CONFIG_MBEDTLS_DEBUG=y
I don't get esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7780
and server doesn't get stuck anymore...
Maybe this could be related that I run at 240Mhz?
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240=y
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_MHZ=240
Update: Forgot to set WSS connection to ESP32S3. With websocket connection active it still silently crashed
Steps I did:
Ahh this bug is funny... Now I added a bit more logging around the suspected place and if I turn verbose logging for "httpd_sess" (esp_idf_svc::log::set_target_level("httpd_sess", log::LevelFilter::Trace);
) it doesn't crash anymore. Now I'm not even sure how to dig deeper.
Maybe someone has had similar issues?
I assume it's not compiler related as debugging code is there, just I don't log it. Maybe some timing thing somewhere
Update:
Managed to get logs to work. Maybe just a coincidence. It seems that my code gets stuck at hd->config.close_fn(hd, session->fd);
From what I know this callback is set in https://github.com/esp-rs/esp-idf-svc/blob/master/src/http/server.rs#L264
Will investigate further. Seems that issue could be related to it
Hi @HamzaHajeir, I have tested the functioning of esp_https_server at my end. I was not able to reproduce this issue. @reinismu were you able to reproduce this once again?
I did solve my issue. In the end it was my code. It did deadlock in some cases. ESP websockets were not threadsafe, so I had to change how I used them.
Hi @AdityaHPatwardhan, I'm a way of HTTPS webserver really, but I've just launched a normal test (trying to stress manually, debug off), I'll update with further notes.
@AdityaHPatwardhan The testbed which run for ~30 hours started perfectly but ended up with serious issues:
WiFi.begin(xxx,yyy)
. (Might be an issue related to Arduino core).The performance was in degradation over time, At first it started perfectly with 6 websockets secure (wss) clients simultaneously, this number has degraded over time due to memory leaks in the internal memory (For the certificates to take place), until there was only one client can connect at a time.
It's worth noting:
If you believe it's not an issue in esp-idf, mbedtls port, or mbedtls itself, please feel free to close.
Hi @HamzaHajeir, I tried the https server again with using esp-idf as baseline. I did not find any visual leaks that might explain the issue. I am closing the issue since I am unable to reproduce the issue at my end and my own tests did not give any indication of memory leak. Thanks for the issue. Please feel free to re-open if you find any additional material with which I can reproduce the issue at my end.
Answers checklist.
IDF version.
release/v4.4
Operating System used.
Windows
How did you build your project?
Other (please specify in More Information)
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32DevKit1
Power Supply used.
USB
What is the expected behavior?
I host TLS webserver using LwIP ALTCP TLS - MBEDTLS port, I expect every client (browser) to be able to connect anytime.
In essence, It works correctly to some point, correct behaviour includes making successful handshakes and communication. The problem starts after some amount of time/process, in which causes to deny any new connections, even from browsers were already connected by the means of a reload button.
Already connected connections prior to this point continue to communicate successfully afterwards.
The error code returned by the
mbedtls_ssl_handshake
is fixed at -17280 (-0x4380) (MBEDTLS_ERR_RSA_VERIFY_FAILED).I've noticed that some times the preceeding handshake's error code is -17168 (-0x4310), which I've tried to find the represented meaning, but failed to, nearest assumption is
MBEDTLS_ERR_RSA_PRIVATE_FAILED | MBEDTLS_ERR_MPI_NEGATIVE_VALUE
.Is there any hint for what's going wrong? With thanks,
What is the actual behavior?
Log shows:
Or in the other described case:
Steps to reproduce.
Here's the running ALTCP MBEDTLS Code, pointing to the handshake API call:
https://github.com/HamzaHajeir/esp-lwip/blob/2.1.2-esp-patch/src/apps/altcp_tls/altcp_tls_mbedtls.c#L278
Debug Logs.
More Information.
Building was done using espressif/esp32-arduino-lib-builder tools, with patched LwIP code (linked above). mbedTLS version is 2.28.3
Meanwhile this issue, the MCU is able to reconnect to a MQTT server by TLS with successful handshakes.
Restarting the webserver with fresh SSL context allows for successful handshakes again.
My investigation of a very long log confirms that this problem leaks memory too.