Closed gregtwice closed 1 year ago
There is a race condition fix, as described in #43. Could you upgrade to the latest version and see whether it solves this issue?
Thanks, the issue is solved
I have a similar problem, and upgrading to the latest version (v6.2.1) did not solve it. Is there anything I should try?
Target board: Renesas EK-RA6M5
Azure RTOS NetX Duo version: v6.2.1
Toolchain: GCC ARM Embedded
Toolchain Version: 10.3.1.20210824
Best regards.
After further testing, it appears the issue is still there, still in _nx_tcp_socket_receive_queue_flush, but the address accessed is now 0xEEEEEEEE + 0x20.
Could you share a packet trace captured with Wireshark or tcpdump?
I don't have an environment to capture one right now. I can provide it when ready.
Can this issue be reproduced easily? If so, please describe it.
At first I experienced this issue when transmitting data to a server over FTP. It happened when the server was under heavy load (congestion). After that, I found that quickly disconnecting and reconnecting the Ethernet cable while transmitting TCP data reproduces the issue. With this method, it can be reproduced in a few minutes.
Did you encounter this issue from the beginning of using NetX Duo, or after an upgrade?
From the beginning. I started with v6.1.11.
Did you port the network driver, or is it provided by NXP/Renesas/MSFT?
I use the network driver provided by Renesas.
I'm not sure if we are looking at the same version of the Renesas network driver. Could you search for nx_packet_release in rm_netxduo_ether.c? Where the TX BD is released, replace the function call with nx_packet_transmit_release. I suspect the issue is caused by multiple releases of the same packet.
I replaced nx_packet_release with nx_packet_transmit_release inside the file /ra/fsp/src/rm_netxduo_ether.c. 6 occurrences were replaced.
The result was the same. The issue can still be reproduced.
Could you compile your project with NX_ENABLE_PACKET_DEBUG_INFO defined? When you hit the hard fault, add socket_ptr -> nx_tcp_socket_transmit_sent_head to the watch list. Follow the link of nx_packet_union_next.nx_packet_tcp_queue_next until you reach 0xaaaaaaaa. For the last packet, please share the values of nx_packet_debug_file, nx_packet_debug_line and nx_packet_debug_thread.
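The walk described above can be sketched on a simplified model. This is only an illustration: model_packet is a hypothetical stand-in for NX_PACKET that keeps just the fields named in this thread, and last_packet_before_marker and max_hops are made-up names. In a real debug session you would follow the same links in the debugger's watch window.

```c
#include <stddef.h>
#include <stdint.h>

/* Link value at which the walk stops, as requested in the thread. */
#define STOP_MARKER ((uintptr_t)0xAAAAAAAAu)

/* Hypothetical, simplified stand-in for NX_PACKET (not the real layout). */
typedef struct model_packet {
    struct model_packet *tcp_queue_next; /* models nx_packet_union_next.nx_packet_tcp_queue_next */
    const char          *debug_file;     /* models nx_packet_debug_file */
    unsigned long        debug_line;     /* models nx_packet_debug_line */
    const char          *debug_thread;   /* models nx_packet_debug_thread */
} model_packet;

/*
 * Follow tcp_queue_next from head and return the packet whose link field
 * holds STOP_MARKER. Returns NULL if the chain ends in a null link or if
 * more than max_hops packets are visited (guards against a corrupted,
 * cyclic list).
 */
const model_packet *last_packet_before_marker(const model_packet *head,
                                              size_t max_hops)
{
    const model_packet *pkt = head;

    if (pkt == NULL || (uintptr_t)pkt == STOP_MARKER) {
        return NULL;
    }

    while (max_hops-- > 0) {
        uintptr_t next = (uintptr_t)pkt->tcp_queue_next;

        if (next == STOP_MARKER) {
            return pkt; /* its debug_file/line/thread are the values to report */
        }
        if (next == 0) {
            return NULL; /* chain ended without reaching the marker */
        }
        pkt = pkt->tcp_queue_next;
    }
    return NULL;
}
```

The debug fields of the returned packet correspond to the three values requested above.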
Conclusion first: the issue was my fault.
Two threads were each using a different FTP client while sharing the same packet pool. Sharing the pool is not a problem in itself. However, when packet loss occurred and nx_ftp_client_xxx() returned an error, I was recreating (deleting and creating) the packet pool as a precaution.
So when packet loss happened on both threads and each thread disconnected and recreated the packet pool, one thread could access a deleted (or re-initialized) packet pool that still had packets in use.
I removed the packet pool recreation and the issue can no longer be reproduced.
NX_ENABLE_PACKET_DEBUG_INFO really helped. Thank you.
@gregtwice I'm not sure this is the same situation. I hope it will help you.
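The hazard described above (one thread deleting a pool while another still holds packets from it) can be illustrated with a small model. This is a sketch under stated assumptions, not NetX Duo code: shared_pool and its functions are hypothetical names, and the model is single-threaded, so real code would also need a mutex around these calls. The fix reported in this thread was simpler still: create the pool once and never recreate it.

```c
#include <stdbool.h>

/* Hypothetical model of a packet pool shared by two FTP client threads. */
typedef struct {
    bool     created;     /* pool control block is valid */
    unsigned outstanding; /* packets handed out and not yet released */
} shared_pool;

/* Mark the pool as valid with no packets handed out. */
void shared_pool_create(shared_pool *pool)
{
    pool->created = true;
    pool->outstanding = 0u;
}

/* Hand out one packet; fails if the pool has been deleted. */
bool shared_pool_take(shared_pool *pool)
{
    if (!pool->created) {
        return false;
    }
    pool->outstanding++;
    return true;
}

/* Return one packet to the pool. */
void shared_pool_give(shared_pool *pool)
{
    if (pool->outstanding > 0u) {
        pool->outstanding--;
    }
}

/* Refuse to delete while any user still holds packets from the pool. */
bool shared_pool_try_delete(shared_pool *pool)
{
    if (!pool->created || pool->outstanding > 0u) {
        return false;
    }
    pool->created = false;
    return true;
}
```

In this model, an error path that calls shared_pool_try_delete while the other client still holds packets gets a refusal instead of leaving dangling packets behind.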
Glad to know your issue is resolved, @TEcwbg! I will keep this issue open for a while in case @gregtwice still has questions.
Closing.
To reproduce:
When looking at the disassembly of my executable, I see that the instruction at 0x22710 is LDR r6, [r0, #32]. Looking at my stack frame, I see that r0 equals 0xAAAAAAAA, which is the value for an allocated packet. It would seem that the instruction is attempting to load the value at 0xAAAAAACA, which is not 4-byte aligned, resulting in a crash.
Have you ever seen this bug? If so, are there steps to fix it? Or should I update NetX?
Best regards,
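As a quick check of the address arithmetic in this thread, here is a minimal sketch that computes the address an LDR with an immediate offset would access and tests whether it is word aligned. The function names are made up for illustration, and whether an unaligned or invalid access actually faults depends on the core and its configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address a 32-bit core computes for "LDR rX, [base, #offset]". */
uint32_t ldr_effective_address(uint32_t base, uint32_t offset)
{
    return base + offset; /* unsigned arithmetic wraps modulo 2^32 */
}

/* True when the address is a multiple of 4 (word aligned). */
bool is_word_aligned(uint32_t address)
{
    return (address & 0x3u) == 0u;
}
```

With r0 = 0xAAAAAAAA and an offset of 32, this gives 0xAAAAAACA. For the later report, 0xEEEEEEEE + 0x20 gives 0xEEEEEF0E. Neither address is a multiple of 4.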