Closed fabian18 closed 2 years ago
Hm this seems to be one of those problems that go away when setting ENABLE_DEBUG
to 1 :weary:
With ENABLE_DEBUG=1
in nanocoap/sock.c
, I get a failed assertion:
> client put fe80::3478:9157:9fd5:81c3%5 5683 /somepath somedata
2022-06-22 11:22:45,749 # client put fe80::3478:9157:9fd5:81c3%5 5683 /somepath somedata
2022-06-22 11:22:45,752 # nanocli: sending msg ID 1, 25 bytes
2022-06-22 11:22:45,755 # nanocoap: send 25 bytes (5 tries left)
2022-06-22 11:22:45,760 # nanocoap: waiting for response (timeout: 2092058 µs)
2022-06-22 11:22:45,762 # nanocoap: response code=204
2022-06-22 11:22:45,767 # nanocoap: waiting for response (timeout: 2085058 µs)
2022-06-22 11:22:45,773 # sys/net/gnrc/pktbuf/gnrc_pktbuf.c:93 => *** RIOT kernel panic:
2022-06-22 11:22:45,774 # FAILED ASSERTION.
2022-06-22 11:22:45,775 #
2022-06-22 11:22:45,776 # *** halted.
2022-06-22 11:22:45,776 #
2022-06-22 11:23:22,860 # Exiting Pyterm
assert(gnrc_pktbuf_contains(pkt));
Wait what :fearful:
Is that with the current master
?
When I try
% coap-server-notls -A fe80::90a7:a6ff:fe4b:2e32%tapbr0 -d 1
and
> client put fe80::90a7:a6ff:fe4b:2e32 5683 /somepath somedata
I get
client put fe80::90a7:a6ff:fe4b:2e32 5683 /somepath somedata
nanocli: sending msg ID 1, 25 bytes
nanocoap: send 25 bytes (5 tries left)
nanocoap: waiting for response (timeout: 2448721 µs)
nanocoap: response code=201
nanocoap: waiting for response (timeout: 2447721 µs)
nanocli: response Success, code 2.01, empty payload
Is that with the current master?
master 1638606c5c01b24d967e5630fa43efad5437add9
You have to send the request multiple times. Your last comment shows only 2.01
Created.
and with current master
branch, yes
I don't see this on native
, is this on stm32?
yes on stm32
Could also be an out of bounds write on the Ethernet driver, corrupting the pktbuf free list.
can you try with https://github.com/RIOT-OS/RIOT/pull/18227
can you try with https://github.com/RIOT-OS/RIOT/pull/18227
CFLAGS+=-DCONFIG_GNRC_PKTBUF_CHECK_USE_AFTER_FREE=1
makes no difference, the same assertion is triggered.
What 'works' surprisingly well is
--- a/sys/net/application_layer/nanocoap/sock.c
+++ b/sys/net/application_layer/nanocoap/sock.c
@@ -174,7 +174,7 @@ ssize_t nanocoap_sock_request_cb(nanocoap_sock_t *sock, coap_pkt_t *pkt,
case STATE_RESPONSE_RCVD:
case STATE_RESPONSE_OK:
if (ctx == NULL) {
- DEBUG("nanocoap: waiting for response (timeout: %"PRIu32" µs)\n",
+ printf("nanocoap: waiting for response (timeout: %"PRIu32" µs)\n",
_deadline_left_us(deadline));
}
const void *old_ctx = ctx;
However I don't see an immediate re-transmission in the error case, instead the node appears to wait for the timeout and never gets the response. I do see the response in Wireshark however when sniffing directly on the node's interface. (DOSE, with #18163)
So my issue was caused by a different thread being scheduled every 10 ms (and calling ztimer_set
each time), which in turn caused the UART RX interrupt to be delayed enough that it would only be executed after another byte was received already. The lost byte caused a CRC mismatch and the frame to be dropped, even though it was complete on the wire.
Adding the printf (or a busy loop, really) before calling sock_udp_recv_buf()
meant that the other thread did not get scheduled and the RX interrupts would occur in time. But the proper fix is of course to pause all other threads when downloading firmware updates (since improving RIOTs real-time grantees is certainly no easy task).
I assume the behavior you saw was then caused by issue described in #18416 - so can we close this now?
Good news: With #18416 no assertion fires anymore, and it is functioning very well.
But still, on the second attempt to upload a file, some blocks are transmitted twice immediately. With ENABLE_DEBUG=1
it is Block 1 and without it is Block 0.
So there still is something odd, but CoAP makes it work nevertheless.
so can we close this now?
I would keep it open, because it is only partially fixed.
fixed with #18969
Description
During the testing of #17962, I discovered that something in the transmission behaviour of
nanocoap
is not working correctly. Using e.g.tests/nanocoap_cli
:client put fe80::3478:9157:9fd5:81c3%5 5683 /somepath somedata
, to create a resource on a server. The first transmission succeeds, but if you repeat the command, a retransmission is also triggered immediately. Even though the first transmission succeeds, the first call tosock_udp_sendv()
innanocoap_sock_request_cb()
returns the expected value, but the message is not sent out (?). At least it does not appear in wireshark. Then the first call tosock_udp_recv_buf()
always returns-ETIMEDOUT
. But then in the second call tosock_udp_sendv()
, the message is sent out and appears in wireshark.Steps to reproduce the issue
coap-server
:coap-server -A fe80::3478:9157:9fd5:81c3%eth0 -d 1
.nucleo-f767zi
over Ethernet with my notebook, where the server is running.tests/nanocoap_cli
:CFLAGS+=-DCONFIG_GNRC_IPV6_NIB_ARSM=1 BOARD=miot-nucleo-f767zi make flash term
.client put fe80::3478:9157:9fd5:81c3%5 5683 /somepath somedata
and see the attached result.Expected results
The nanocoap client should not immediately trigger a retransmission, but wait for [
ACK_TIMEOUT
,ACK_TIMEOUT * RANDOM_FACTOR
], before it triggers the first retransmission.Actual results
Most of the time, the retransmission happens immediately, if the server is not very fast with the ACK. nanocoap_cli_put.pcapng.gz
Versions
Linux 5.18.3-arch1-1 (but that´s not relevant)