RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.97k stars 1.99k forks source link

benchmark_udp: hammering with low interval causes issues #16808

Open kaspar030 opened 3 years ago

kaspar030 commented 3 years ago

Description

I'm trying benchmark_udp on an nrf52840dk over usbus_cdc_ecm. I'm seeing some reliability issues with high load.

With down to -i 30, everything is fine. From -i 29, the shell becomes unresponsive, but the node still pings and I can run more benchmark_udp (restart host tool). From -i 14, the node seems to freeze. Fixed with #16831.

Steps to reproduce the issue

compile tests/usbus_cdc_ecm with benchmark_udp:

BOARD=nrf52840dk USEMODULE="benchmark_udp" make -Ctests/usbus_cdc_ecm clean all flash -j8

in riots shell:

bench_start <host-link-local-addr> 12345,

then on host:

./benchmark_server -i 14 :: 12345

Expected results

Shell and node stay responsive.

Actual results

Shell freeze or (seemingly) full node freeze.

Versions

Compiled on arch with gcc version 11.2.0.

kaspar030 commented 3 years ago

I'm seeing similar issues on samr21-xpro. I think bench_udp sets too small delays at some point, causing the r/x thread to always return from sock_recv() with ETIMEOUT. But I'm not sure.

kfessel commented 3 years ago

maybe this is a misbehavior of xtimer_set in gnrc_sock_recv (or it is holding it wrong), if the timeout is short it might be smaller than XTIMER_BACKOFF this will lead to xtimer_spin which in turn will not give way for the recv to proccede and will therfor fire before any further processing could be done in gnrc_sock_recv

kfessel commented 3 years ago

@kaspar030: maybe you can retest with #16831

kaspar030 commented 3 years ago

@kaspar030: maybe you can retest with #16831

That PR fixes the second issue (seemingly freezing node)! Thanks.

The shell stays unresponsive, though. @benpicco probably the _listen_thread is still calling sock_udp_recv() too fast. Is the timeout actually necessary for anything else but evaluating the "running" variable? Maybe something fixed (like, 100ms) would do fine?

kfessel commented 3 years ago

i think there is still part of this open, but it might be that the rest is just not enough priority for the shell thread

kfessel commented 3 years ago

the rest seems to be a priority + xtimer_usleep(delay_us) problem in _send_thread in sys/test_utils/benchmark_udp/benchmark_udp.c:137

kfessel commented 3 years ago

If the delay is less than XTIMER_BACKOFF the _send_thread through xtimer_delay -> will spin and take every cpu cycle from the lower priority shell

Decreasing the priority of the send_thread lower than shell will (at least on native) seem to let the shell take all the cycles as getchar does not switch the thread and is blocking RIOT #16834

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.