eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

Improper handling of out-of-order packets when the application connection restarts repeatedly in a loop #260

Open jagudla opened 3 months ago

jagudla commented 3 months ago

We are running an MQTT client application that performs the steps below against a Mosquitto broker:

1. WLAN AP Connect
2. MQTT Client Init
3. MQTT Connect
4. MQTT Subscribe
5. MQTT Publish
6. MQTT Unsubscribe
7. MQTT Disconnect
8. MQTT Destroy
9. WLAN AP Disconnect
10. Repeat steps 1 to 9 in a loop.

We observe that the MQTT connection fails with a connection timeout. This occurs because an out-of-order packet from the previous connection is still in the TCP receive queue. The socket's sequence number then does not match the sequence number of the incoming TCP packet, so the packet is not passed to the application layer and the application times out.

The Wireshark captures and debug prints are attached below for reference:

bbf3 wireshark captures jtag_logs.txt

We restart the MQTT connection with the same port and IP address.

1) In the capture above, packet number 6242 is our MQTT CONNECT command, sent with seq_no 4273989310 and ack_no 1428468738.
2) The server responds with CONNACK in packet number 6243, with seq_no 1428468738 and ack_no 4273989354. This is the packet expected for the current MQTT connection to succeed.
3) However, an out-of-order packet is still present in the TCP receive queue: packet number 6161, with seq_no 1426907804 and ack_no 2142349211. Because of it, the condition check fails and the application reports a timeout error.
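
To make the mismatch concrete, the sequence numbers quoted above can be compared with modulo-2^32 arithmetic. This is only a sketch: `seq_distance` and `seq_in_window` are hypothetical helper names, not NetX Duo functions, and the 65535-byte window in the usage note is an assumed value.

```c
#include <stdint.h>

/* Signed distance from sequence number b to sequence number a, modulo 2^32.
   Positive means a is ahead of b, negative means a is behind b.
   (Hypothetical helper, not part of NetX Duo.) */
static int32_t seq_distance(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b);
}

/* Nonzero when seq lies in [rx_next, rx_next + window), with wraparound.
   (Hypothetical helper, not part of NetX Duo.) */
static int seq_in_window(uint32_t seq, uint32_t rx_next, uint32_t window)
{
    return (uint32_t)(seq - rx_next) < window;
}
```

With these helpers, `seq_distance(1426907804u, 1428468738u)` is -1560934, so the queued packet 6161 sits about 1.5 MB behind the sequence number the new connection expects, and `seq_in_window(1426907804u, 1428468738u, 65535u)` is 0 for the assumed window.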

Below is the code where the problem occurs:
File name: nx_tcp_socket_state_data_check.c
API: nx_tcp_socket_state_data_check

/* Pickup the tail pointer of the receive queue.  */
search_ptr = socket_ptr->nx_tcp_socket_receive_queue_tail;

Note: For a new connection the receive queue should be empty, so search_ptr should be NULL. In the current case an out-of-order packet from the previous connection is still in the queue, so search_ptr is non-NULL and the branch below is taken.

/* Check to see if the tail pointer is part of a contiguous stream.  */
if (search_ptr) {

/* Setup a pointer to header of this packet in the sent list.  */
search_header_ptr = (NX_TCP_HEADER *)search_ptr->nx_packet_prepend_ptr;

/* Determine the size of the search TCP header.  */
search_header_length = (search_header_ptr->nx_tcp_header_word_3 >> NX_TCP_HEADER_SHIFT) * sizeof(ULONG);
/* Now see if the current sequence number accounts for the last packet.  */
search_end_sequence =
  search_header_ptr->nx_tcp_sequence_number + search_ptr->nx_packet_length - search_header_length;

Note: search_end_sequence is set here from the queued packet.

} else {

/* Set the sequence number to the socket's receive sequence if there isn't a receive 
       packet on the queue.  */
search_end_sequence = socket_ptr->nx_tcp_socket_rx_sequence;

}

/* Determine if we have a simple case of TCP data coming in the correct order.  This means
   the socket's sequence number matches the incoming packet sequence number and the last
   packet's data on the socket's receive queue (if any) matches the current sequence number.  */
if ((tcp_header_ptr->nx_tcp_sequence_number == socket_ptr->nx_tcp_socket_rx_sequence) &&
    (search_end_sequence == socket_ptr->nx_tcp_socket_rx_sequence)) {

Note: this check fails because search_end_sequence does not match socket_ptr->nx_tcp_socket_rx_sequence, so the packet is never marked as ready for the application.

/* Yes, this is the simple case of adding receive packets in sequence.  */

/* Mark the packet as ready. This is done to simplify the logic in socket receive.  */
packet_ptr->nx_packet_queue_next = (NX_PACKET *)NX_PACKET_READY;
/* Place the packet on the receive queue.  Search pointer still points to the tail packet on
       the queue.  */
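
The effect of a stale tail packet on this check can be reproduced with a small stand-alone model. This is a simplified sketch and not the NetX Duo code: `model_socket` and `model_in_order` are hypothetical names, and the tail packet's payload length is an assumed value because the capture details are not quoted here.

```c
#include <stdint.h>

/* Simplified stand-in for the fields used by the check.
   These are NOT the NetX Duo structures. */
typedef struct {
    uint32_t rx_sequence;       /* next sequence number the socket expects  */
    int      has_tail;          /* nonzero if the receive queue is not empty */
    uint32_t tail_sequence;     /* sequence number of the queue's tail packet */
    uint32_t tail_payload_len;  /* payload bytes carried by the tail packet  */
} model_socket;

/* Returns nonzero when the incoming segment takes the "simple in-order" path,
   mirroring the two-part comparison shown above. */
static int model_in_order(const model_socket *s, uint32_t incoming_seq)
{
    /* Either the end of the tail packet's data or, for an empty queue,
       the socket's own receive sequence. */
    uint32_t search_end_sequence = s->has_tail
        ? s->tail_sequence + s->tail_payload_len
        : s->rx_sequence;

    return (incoming_seq == s->rx_sequence) &&
           (search_end_sequence == s->rx_sequence);
}
```

With `rx_sequence = 1428468738` and an empty queue, an incoming segment with sequence 1428468738 takes the simple path. With a leftover tail packet at sequence 1426907804 (any small assumed payload length), the same segment does not, which matches the behavior described in the notes.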

jagudla commented 2 months ago

Hi Team, Please provide any update on this ticket.

jagudla commented 1 month ago

Hi Team, Please provide any update on this ticket.

yuxinzhou5 commented 1 month ago

(Update: it looks like NetX is running as a client, so make sure the local port number is randomly generated and that the client does not get the same port number between restarts.)
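
One way to vary the client port between restarts is to map a random value into the dynamic port range before binding. The sketch below is an assumption about how an application might do this: `pick_client_port` is a hypothetical helper, and the 49152 to 65535 range is the IANA dynamic range rather than anything NetX Duo mandates.

```c
#include <stdint.h>

/* IANA dynamic/private port range (an assumed choice for this sketch). */
#define EPHEMERAL_PORT_MIN 49152u
#define EPHEMERAL_PORT_MAX 65535u

/* Map an arbitrary random value onto a port in the ephemeral range.
   (Hypothetical helper, not part of NetX Duo.) */
static uint16_t pick_client_port(uint32_t random_value)
{
    uint32_t span = EPHEMERAL_PORT_MAX - EPHEMERAL_PORT_MIN + 1u;

    return (uint16_t)(EPHEMERAL_PORT_MIN + (random_value % span));
}
```

The result would be passed as the port argument of `nx_tcp_client_socket_bind`. Passing `NX_ANY_PORT` and letting the stack choose a free port is the other option, in which case the quality of the stack's random source matters in the same way.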

As you debug this issue, pay attention to the TCP sequence number used on the NetX side. Between two TCP sessions, the initial sequence number should be random. This way, when the remote sends data from the previous session to the NetX device, the sequence number won't be in range for the new TCP connection, and NetX can reject those packets.

The TCP initial sequence number is generated by calling "NX_RAND()". Make sure NX_RAND (defined in nx_api.h) is tied to a true random number source.
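
A rough sketch of tying NX_RAND to a board-specific source is shown below. It assumes nx_api.h only supplies its default when NX_RAND is not already defined, so the override would live in a header that is seen first (for example nx_user.h). `board_trng_read_u32` and `app_true_random` are hypothetical names, and the xorshift body is only a host-side placeholder so the sketch compiles. It is not a true random source and must be replaced with a read from the hardware RNG.

```c
#include <stdint.h>

/* Hypothetical board hook. On a real target, replace the body with a read
   from the MCU's hardware random number generator. The xorshift32 below is
   a placeholder for host builds only and is NOT a true random source. */
static uint32_t board_trng_read_u32(void)
{
    static uint32_t state = 0x9E3779B9u;

    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

/* Same shape as rand(): takes no arguments and returns a non-negative int. */
int app_true_random(void)
{
    return (int)(board_trng_read_u32() & 0x7FFFFFFFu);
}

/* Assumed to be defined before nx_api.h applies its own default. */
#define NX_RAND app_true_random
```

After this change, every `NX_RAND()` call in the stack, including the one that produces the TCP initial sequence number, would go through `app_true_random`.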