eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License
230 stars 131 forks source link

Addon : web/tcpserver configured with TLS #158

Closed EdouardMALOT closed 1 year ago

EdouardMALOT commented 1 year ago

Hi,

I have an HTTP/HTTPs server based on code "addons/web/nx_tcpserver.c". With HTTP I didn't noticed any issue, but with TLS enable I had some rare issue with specific paquet size.

The issue comes from the condition below : https://github.com/azure-rtos/netxduo/blob/567cf0bb6927704db72543618e2f732f70106a86/addons/web/nx_tcpserver.c#L1115

The value of "nx_tcp_socket_receive_queue_count" can be "0" at the moment of execution of the if() condition and become "1" just after.

The work-around I found was to forced the if() condition to TRUE on TLS session : "if((socket_ptr -> nx_tcp_socket_receive_queue_count) || (server_ptr -> nx_tcpserver_sessions[i].nx_tcp_session_using_tls))"

I am not 100% sure it is the correct way to fix it. Can I have a feedback ?

wenhui-xie commented 1 year ago

@DoudFPV Are there multiple clients connecting to the HTTPS server in your test environment?

EdouardMALOT commented 1 year ago

No, the size of tls_sessions_buffer[] allows only one client to be connected at a time.

wenhui-xie commented 1 year ago

@DoudFPV Could you print the socket_ptr -> nx_tcp_socket_receive_queue_count in this function: https://github.com/azure-rtos/netxduo/blob/567cf0bb6927704db72543618e2f732f70106a86/addons/web/nx_tcpserver.c#L796.

The normal process when receiving a new packet is as follows:

  1. IP thread increases socket_ptr -> nx_tcp_socket_receive_queue_count;
  2. IP thread calls the receive callback function _nx_tcpserver_data_present() and sets the NX_TCPSERVER_DATA event;
  3. TCPServer thread processes the NX_TCPSERVER_DATA event and calls the _nx_tcpserver_data_process() function.

So If there is only one client, the socket_ptr -> nx_tcp_socket_receive_queue_count should not be 0 when the program runs to this line: https://github.com/azure-rtos/netxduo/blob/567cf0bb6927704db72543618e2f732f70106a86/addons/web/nx_tcpserver.c#L1115

EdouardMALOT commented 1 year ago

Hi @wenhui-xie

Thanks for you reply.

I print the value of nx_tcp_socket_receive_queue_count with the code below :

image

With the HTTPs POST below I got :

image image

Note : with HTTPs, TLS record size is 16Ko. Does "nx_tcp_socket_receive_queue_count"/_nx_tcpserver_data_present() is incremented for each received IP packet or does it incremented/called when all the TCP / TLS packet is received. ? (later in _api_server_receive_data() I call nx_secure_tls_session_receive() which seems to wait the reception of the full TLS records).

wenhui-xie commented 1 year ago

Hi @DoudFPV, nx_tcp_socket_receive_queue_count is increased for each received valid TCP packet not the full TLS records. My last reply didn't consider the situation that there is another thread waiting for the packet. In this situation, after the nx_tcp_socket_receive_queue_count is increased(0 -> 1), the new packet will be removed from the receive_queue and returned to the waiting thread and then the nx_tcp_socket_receive_queue_count will be decreased(1 -> 0) before calling the receive callback function _nx_tcpserver_data_present().

From the debug info you send, the server was timeout at 26, and from 16 to 26, no packet was received(no queue_count printed). Does the 16/26 mean the seconds? If it is, do you set the server timeout to 10s? If no packet was received in the timeout period, the server will call the timeout process function. image

Another question, did you implement the http/https server by yourself, and didn't use the nx_web_http_server.c?

EdouardMALOT commented 1 year ago

@wenhui-xie Yes the [16] to [26] is the time in seconds, and we have a 10sec timeout.

I did implement http/https myself using nx_tcpserver.h. Note : nx_web_http_server.c doesn't seems to have any references to "nx_tcp_socket_receive_queue_count". Does it really necessary ?

wenhui-xie commented 1 year ago

@DoudFPV The receiving logic is implemented in HTTP/HTTPS, so I want to ensure we are on the same page.

However, from the log you gave, the behavior is expected that the server timeout the client because it didn't receive any data in 10s. So you can try to increase the timeout.

EdouardMALOT commented 1 year ago

@wenhui-xie Yes the logic is the same for HTTP/HTTPs on nx_web_http_server.c (sorry my "Note :" was wrong).

The last "queue_count=1" happend after _api_server_connection_timeout() which close the TLS connection by calling nx_secure_tls_session_end(). In response the client respond with a "TLS close ACK" (Alert (Level: Warning, Description: Close Notify))

image

So the issue I not related with the timeout value.

Below an other log : I simply add two msg_warning (printf() like) into "_nx_tcpserver_data_process()" :

image image

As we can see, "_nx_tcpserver_data_process()" is called 3 times : 1/ with "nx_tcp_socket_receive_queue_count = 0" (So server_ptr -> nx_tcpserver_receive_data() wasn't called). 2/ with "nx_tcp_socket_receive_queue_count > 0" (So server_ptr -> nx_tcpserver_receive_data() was called and the server accept connection, cf debug traces). 3/ with "nx_tcp_socket_receive_queue_count = 0" (So server_ptr -> nx_tcpserver_receive_data() wasn't called).

So the issue is _nx_tcpserver_data_process() isn't called anymore with "nx_tcp_socket_receive_queue_count > 0" => the server never get the 2nd TLS records

Note : If you try with the "nx_web_http_server" with TLS enable, and make a POST with a payload of 16385 bytes you will probably have the same issue.

EdouardMALOT commented 1 year ago

Note : If I comment the condition"if(socket_ptr -> nx_tcp_socket_receive_queue_count)" everything works well. Why do we need to check nx_tcp_socket_receive_queue_count ?

wenhui-xie commented 1 year ago

In the logic of TCP data processing, if there is a thread waiting for the packet, it will return the packet to the waiting thread directly. If the socket_ptr -> nx_tcp_socket_receive_queue_count is 0, it means that the waiting thread already gets the packet, so there is no need to call server_ptr -> nx_tcpserver_receive_data().

From the packet trace, the server timeout the client after 10s, which is normal behavior.

If you didn't receive all the TLS records, maybe there are some issues in the receiving logic. Comment the condition "if(socket_ptr -> nx_tcp_socket_receive_queue_count)" will trigger extra calls of server_ptr -> nx_tcpserver_receive_data(), and also reset the timeout counter. So it may fix the issues of receiving and timeout by coincidence.

EdouardMALOT commented 1 year ago

Okay, that makes sense :

In my server receive logique I call "nx_secure_tls_session_receive()" only once at the beginning of the nx_tcpserver_receive_data() callback. (in "nx_web_http_server.c" it wait for another packet).

image

But according to your explanation I don't understand 100% : On step 3/ which thread get the packet? nothing is waiting the packet.

Note : The timeout is done by my server logic.

EdouardMALOT commented 1 year ago

Anyway, I realized that _nx_tcpserver_data_process() is not called as I had imagined (Only for server socket, for each TCP packet).

If I wait for another packet until the full http request is received (as in "nx_web_http_server.c"), it works.

@wenhui-xie Thanks for your help.