eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

High-speed transfer hang on mutex #272

Open leobbditestcom opened 1 month ago

leobbditestcom commented 1 month ago

In an embedded system sending binary data at about 7 Mbit/s over a TCP socket on a 100BASE-T interface, NetX Duo hangs after a few seconds, waiting on the mutex nx_ip_protection. Test code added for diagnosis indicated that nx_ip_protection was acquired in nx_tcp_socket_send_internal(), line 803, during a call to nx_tcp_socket_send() from the user application:

            /* Place protection while we check the sequence number for the new TCP packet.  */
            tx_mutex_get(&(ip_ptr -> nx_ip_protection), TX_WAIT_FOREVER);

            /* Determine if the sequence number is the same.  */
            if (sequence_number != socket_ptr -> nx_tcp_socket_tx_sequence)
            {

After line 912, the code calls either _nx_ip_packet_send() or _nx_ipv6_packet_send(), both of which can apparently request ownership of nx_ip_protection again. The following changes (lines marked with >>>) were made so that nx_ip_protection is available while these routines execute, and is held again afterwards, as the code that follows expects:

            /* Send the TCP packet to the IP component.  */
#ifndef NX_DISABLE_IPV4
            if (socket_ptr -> nx_tcp_socket_connect_ip.nxd_ip_version == NX_IP_VERSION_V4)
            {

>>>             /* Release the protection.  */
>>>             tx_mutex_put(&(ip_ptr -> nx_ip_protection));

                _nx_ip_packet_send(ip_ptr, send_packet,
                                   socket_ptr -> nx_tcp_socket_connect_ip.nxd_ip_address.v4,
                                   socket_ptr -> nx_tcp_socket_type_of_service,
                                   socket_ptr -> nx_tcp_socket_time_to_live,
                                   NX_IP_TCP,
                                   socket_ptr -> nx_tcp_socket_fragment_enable,
                                   socket_ptr -> nx_tcp_socket_next_hop_address);

>>>             /* Reacquire IP structure. */
>>>             tx_mutex_get(&(ip_ptr -> nx_ip_protection), TX_WAIT_FOREVER);
            }
#endif /* !NX_DISABLE_IPV4  */

#ifdef FEATURE_NX_IPV6
            if (socket_ptr -> nx_tcp_socket_connect_ip.nxd_ip_version == NX_IP_VERSION_V6)
            {

>>>             /* Release the protection.  */
>>>             tx_mutex_put(&(ip_ptr -> nx_ip_protection));

                /* Ready to send the packet! */
                _nx_ipv6_packet_send(ip_ptr,
                                     send_packet,
                                     NX_PROTOCOL_TCP,
                                     send_packet -> nx_packet_length,
                                     ip_ptr -> nx_ipv6_hop_limit,
                                     socket_ptr -> nx_tcp_socket_ipv6_addr -> nxd_ipv6_address,
                                     socket_ptr -> nx_tcp_socket_connect_ip.nxd_ip_address.v6);

>>>             /* Reacquire IP structure. */
>>>             tx_mutex_get(&(ip_ptr -> nx_ip_protection), TX_WAIT_FOREVER);
            }
#endif /* FEATURE_NX_IPV6 */

However, these changes do not prevent the hang. If the two added tx_mutex_get() calls are then removed, the hang no longer occurs.
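One thing that may be worth ruling out here: ThreadX mutexes keep an ownership count, so a thread that already owns a mutex can call tx_mutex_get() on it again, and tx_mutex_put() only makes the mutex available to other threads once that count drops to zero. The following is a minimal, single-threaded model of that counting behavior. It is a sketch only, not ThreadX or NetX Duo code, and every `model_*` name and the helper function are hypothetical. It assumes nothing about how many times nx_ip_protection is actually held at line 912; it only shows what a single put does at different nesting depths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an ownership-counted mutex (not ThreadX code). */
typedef struct {
    const void *owner;  /* NULL when the mutex is free */
    unsigned    count;  /* how many times the owner has acquired it */
} model_mutex;

/* Returns 1 if the caller now owns the mutex, 0 if it would have to wait. */
int model_get(model_mutex *m, const void *thread)
{
    if (m->owner == NULL) {
        m->owner = thread;
        m->count = 1;
        return 1;
    }
    if (m->owner == thread) {
        m->count++;         /* nested acquire by the owner */
        return 1;
    }
    return 0;               /* owned by another thread */
}

/* Returns 1 on success, 0 if the caller does not own the mutex. */
int model_put(model_mutex *m, const void *thread)
{
    if (m->owner != thread || m->count == 0) {
        return 0;
    }
    if (--m->count == 0) {
        m->owner = NULL;    /* only now is the mutex really released */
    }
    return 1;
}

/* Thread A acquires the mutex nesting_depth times (at least once), then
   puts it once. Returns 1 if thread B could then acquire it, else 0. */
int second_thread_can_get_after_one_put(unsigned nesting_depth)
{
    model_mutex m = { NULL, 0 };
    int thread_a = 0, thread_b = 0;   /* addresses stand in for thread IDs */

    for (unsigned i = 0; i < nesting_depth; i++) {
        int got = model_get(&m, &thread_a);
        assert(got);
        (void)got;
    }
    (void)model_put(&m, &thread_a);

    return model_get(&m, &thread_b);
}
```

In this model, second_thread_can_get_after_one_put(1) returns 1 and second_thread_can_get_after_one_put(2) returns 0: a single put around the send call frees the mutex only if it was held exactly once at that point. On the target, tx_mutex_info_get() should be able to report the actual ownership count and owning thread of nx_ip_protection at the time of the hang.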

The changes are patterned after a similar tx_mutex_put()/tx_mutex_get() sequence already present in the code at line 566, around the call to _nx_packet_allocate():

                /* Release the protection.  */
                tx_mutex_put(&(ip_ptr -> nx_ip_protection));

                /* Obtain a new segmentation. */
                ret = _nx_packet_allocate(pool_ptr, &send_packet,
                                          data_offset, wait_option);

                if (ret != NX_SUCCESS)
                {

                    /* Restore preemption? */
                    if (preempted == NX_TRUE)
                    {

                        /*lint -e{644} -e{530} suppress variable might not be initialized, since "old_threshold" was initialized when preempted was set to NX_TRUE. */
                        tx_thread_preemption_change(_tx_thread_current_ptr, old_threshold, &old_threshold);
                    }

                    /* Packet allocate failure. Return.*/
                    return(ret);
                }

                /* Regain exclusive access to IP instance. */
                tx_mutex_get(&(ip_ptr -> nx_ip_protection), TX_WAIT_FOREVER);