eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

NetX Duo Socket Error NX_NOT_CONNECTED due to dropped SYN-ACKs #270

Open codemonkeyboris opened 2 months ago

codemonkeyboris commented 2 months ago

Describe the bug

We have an S7G2 device connected to a Linux PC over Ethernet. The device has the static IP address 192.168.1.21. The Linux PC has two network interfaces: WiFi at 10.202.16.196 and Ethernet at 192.168.1.1. The S7G2 device serves a webpage, and when we open it on the Linux PC, we see the following traffic:

[Image: screenshot of the captured traffic]

This shows up as a glitch on the webpage, caused by the timeout and reset. After that, the traffic looks fine and the webpage works, because all of the traffic has 192.168.1.1 as its source, which is on the same subnet as the S7G2 device.

Please also mention any information which could help others to understand the problem you're facing:

We did some digging, and the problem appears when nx_tcp_server_socket_accept is called: it returns the error code NX_NOT_CONNECTED (0x38). This happens when SYN packets from the PC reach the device with source address 10.202.16.196 (the PC's WiFi address). The SYN-ACKs are dropped in nx_ip_packet_send because nx_tcp_socket_next_hop_address == 0.

We further found that we had not configured a gateway on the S7G2 device, which left the next hop empty. If the gateway is set to 192.168.1.1 on the device, the problem does not occur. However, we did not have this issue when we were using NetX (not Duo).

So I compared the two and found that in NetX, in nx_tcp_packet_process.c, the call to _nx_ip_route_find is wrapped in this piece of code, which catches the failure and uses the source IP as the next hop:

                    if (_nx_ip_route_find(ip_ptr, source_ip, &socket_ptr -> nx_tcp_socket_connect_interface,
                                          &socket_ptr -> nx_tcp_socket_next_hop_address) != NX_SUCCESS)
                    {
                        /* Cannot determine how to send packets to this TCP peer.  Since we are able to
                           receive the syn, use the incoming interface, and send the packet out directly. */

                        socket_ptr -> nx_tcp_socket_next_hop_address = source_ip;
                    }

This fallback no longer exists in NetX Duo. The corresponding NetX Duo code is:

                    if (packet_ptr -> nx_packet_ip_version == NX_IP_VERSION_V4)
                    {

                        /* Assume the interface that receives the incoming packet is the best interface
                           for sending responses. */
                        socket_ptr -> nx_tcp_socket_connect_interface = interface_ptr;
                        socket_ptr -> nx_tcp_socket_next_hop_address = NX_NULL;

                        /* Set the next hop address.  */
                        _nx_ip_route_find(ip_ptr, *source_ip, &socket_ptr -> nx_tcp_socket_connect_interface,
                                          &socket_ptr -> nx_tcp_socket_next_hop_address);
                    }

I put it back, and it solved the problem described above. I would like to understand the reasoning behind removing it from NetX Duo. Would it be safe to put it back? What kind of testing should we do to make sure that restoring it does not break other functionality?

Please advise and let me know if you need more information.

Thanks!