adjacentlink / emane

Distributed wireless network emulation framework
Other
129 stars 37 forks source link

Unexpected ping times with RF Pipe model #210

Open mszulcz-mitre opened 2 years ago

mszulcz-mitre commented 2 years ago

Dear AdjacentLink,

I'm running an emulation with 2 LXCs, each of which has 1 NEM using an RF Pipe model. I set the delay in each RF Pipe MAC layer to 1 second. During the emulation, I ssh into the 1st LXC (node-1) and ping the radio interface in the 2nd LXC (node-2). At first, the ping times are about 2 seconds, which is what I expect. After about 10 pings, though, the ping times steadily increase until the radio in the 2nd LXC becomes unreachable. Could you help me understand this behavior? I tried looking through the EMANE wiki and manuals, but maybe I'm not seeing the relevant parts.

The configuration files for the run are attached. 2-rfpipe.zip

Here's an example of the output from ping: root@node-1:~# ping radio-2 PING radio-2 (10.100.0.2) 56(84) bytes of data. 64 bytes from radio-2 (10.100.0.2): icmp_seq=1 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=2 ttl=64 time=2003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=3 ttl=64 time=2004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=4 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=5 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=6 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=7 ttl=64 time=3002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=8 ttl=64 time=4003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=9 ttl=64 time=5003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=10 ttl=64 time=5004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=11 ttl=64 time=5003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=12 ttl=64 time=5004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=13 ttl=64 time=5005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=14 ttl=64 time=5005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=15 ttl=64 time=6005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=16 ttl=64 time=7005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=17 ttl=64 time=8006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=18 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=19 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=20 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=21 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=22 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=23 ttl=64 time=11008 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=24 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=25 ttl=64 time=11008 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=26 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=27 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=28 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=29 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=30 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=31 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=32 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=33 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=34 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=35 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=36 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=37 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=38 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=39 ttl=64 time=11009 ms From radio-1 (10.100.0.1) icmp_seq=47 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=48 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=49 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=50 Destination Host Unreachable

jgiovatto commented 2 years ago

Hi

I'm pretty sure the linux ping timeout is 1 second, at least it in on my FC34 box.

Your RTT is 2 seconds, so pings are starting to pile up at the transmitter along with ARP's.

to verify try:   strace -f --trace=network ping 10.100.0.2  2<&1 | grep  "sendto"

see: man -a ping

 -W timeout            Time to wait for a response, in seconds. The option affects only            timeout in absence of any responses, otherwise ping waits for two            RTTs. Real number allowed with dot as a decimal separator            (regardless locale setup). 0 means infinite timeout. Joe

On 3/23/22 19:25, mszulcz-mitre wrote:

Dear AdjacentLink,

I'm running an emulation with 2 LXCs, each of which has 1 NEM using an RF Pipe model. I set the delay in each RF Pipe MAC layer to 1 second. During the emulation, I ssh into the 1st LXC (node-1) and ping the radio interface in the 2nd LXC (node-2). At first, the ping times are about 2 seconds, which is what I expect. After about 10 pings, though, the ping times steadily increase until the radio in the 2nd LXC becomes unreachable. Could you help me understand this behavior? I tried looking through the EMANE wiki and manuals, but maybe I'm not seeing the relevant parts.

The configuration files for the run are attached. 2-rfpipe.zip https://github.com/adjacentlink/emane/files/8337234/2-rfpipe.zip

Here's an example of the output from ping: @.***:~# ping radio-2 PING radio-2 (10.100.0.2) 56(84) bytes of data. 64 bytes from radio-2 (10.100.0.2): icmp_seq=1 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=2 ttl=64 time=2003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=3 ttl=64 time=2004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=4 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=5 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=6 ttl=64 time=2002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=7 ttl=64 time=3002 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=8 ttl=64 time=4003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=9 ttl=64 time=5003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=10 ttl=64 time=5004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=11 ttl=64 time=5003 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=12 ttl=64 time=5004 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=13 ttl=64 time=5005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=14 ttl=64 time=5005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=15 ttl=64 time=6005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=16 ttl=64 time=7005 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=17 ttl=64 time=8006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=18 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=19 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=20 ttl=64 time=11006 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=21 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=22 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=23 ttl=64 time=11008 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=24 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=25 ttl=64 time=11008 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=26 ttl=64 time=11007 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=27 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=28 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=29 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=30 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=31 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=32 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=33 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=34 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=35 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=36 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=37 ttl=64 time=11010 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=38 ttl=64 time=11009 ms 64 bytes from radio-2 (10.100.0.2): icmp_seq=39 ttl=64 time=11009 ms From radio-1 (10.100.0.1) icmp_seq=47 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=48 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=49 Destination Host Unreachable From radio-1 (10.100.0.1) icmp_seq=50 Destination Host Unreachable

— Reply to this email directly, view it on GitHub https://github.com/adjacentlink/emane/issues/210, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUAX5TDY5PKY2QBJ434DCTVBOR5PANCNFSM5RPOYHLQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>

mszulcz-mitre commented 2 years ago

Hi jgiovatto,

Thank you for your quick reply! I've got a few follow-up comments and questions:

jgiovatto commented 2 years ago

Hmm, the -W option does not seem to delay things as I would have thought by throttling the sender.

ping -W 0 172.16.1.99 PING 172.16.1.99 (172.16.1.99) 56(84) bytes of data. 64 bytes from 172.16.1.99: icmp_seq=1 ttl=64 time=2000 ms 64 bytes from 172.16.1.99: icmp_seq=2 ttl=64 time=2952 ms 64 bytes from 172.16.1.99: icmp_seq=3 ttl=64 time=3952 ms ^C --- 172.16.1.99 ping statistics --- 7 packets transmitted, 3 received, 57.1429% packet loss, time 6144ms rtt min/avg/max/mdev = 2000.261/2968.182/3952.468/797.069 ms, pipe 4

Maybe ping is not the best tool for this other than detecting that there is is some latency there.

Have you tried iperf, mgen or even scp with say a 1M file from host to host, Im assuming your going to try pushing some "real" data thru this ?

Im not sure what you goals are but static ARP entry might smooth things out a little.

Joe

On 3/24/22 12:37, mszulcz-mitre wrote:

Hi jgiovatto,

Thank you for your quick reply! I've got a few follow-up comments:

*

Thanks for the suggestion about strace. I ran it with radio
delay=1 (when the ping RTTs grow) and with delay=0 (when they
don't). From the output piped to |grep "sendto"|, I didn't see any
clear differences between the 2 cases but maybe I missed
something. From the full output, I see that the following lines
for the case with delay=1 appear frequently:
/recvmsg(3, {msg_namelen=128}, MSG_DONTWAIT) = -1 EAGAIN (Resource
temporarily unavailable)
recvmsg(3, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily
unavailable)/
I read a few webpages about these messages, and it seems like they
just indicate that a ping reply wasn't received.

*

To test the impact of ping timeouts, I ran a case in which the
timeout was much larger than the expected RTT. However, the RTTs
still grew:
***@***.***:~# ping -W 5 10.100.0.2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
[…]
64 bytes from 10.100.0.2: icmp_seq=21 ttl=64 time=2005 ms
64 bytes from 10.100.0.2: icmp_seq=22 ttl=64 time=2002 ms
64 bytes from 10.100.0.2: icmp_seq=23 ttl=64 time=1998 ms
64 bytes from 10.100.0.2: icmp_seq=24 ttl=64 time=1991 ms
64 bytes from 10.100.0.2: icmp_seq=25 ttl=64 time=2992 ms
64 bytes from 10.100.0.2: icmp_seq=26 ttl=64 time=3994 ms
64 bytes from 10.100.0.2: icmp_seq=27 ttl=64 time=6929 ms
64 bytes from 10.100.0.2: icmp_seq=28 ttl=64 time=7929 ms
64 bytes from 10.100.0.2: icmp_seq=29 ttl=64 time=10851 ms
64 bytes from 10.100.0.2: icmp_seq=30 ttl=64 time=13853 ms/

*

In addition to timeouts, I tested the effect of changing the ping
interval time. For 1 set of tests, I set the interval time to 0.5
seconds. I saw that initially, the RTTs all grow by 0.5 sec:
***@***.***:~# ping -i 0.5 10.100.0.2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=2002 ms
64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=2503 ms
64 bytes from 10.100.0.2: icmp_seq=3 ttl=64 time=3003 ms
64 bytes from 10.100.0.2: icmp_seq=4 ttl=64 time=3504 ms
64 bytes from 10.100.0.2: icmp_seq=5 ttl=64 time=4005 ms
64 bytes from 10.100.0.2: icmp_seq=6 ttl=64 time=4505 ms
64 bytes from 10.100.0.2: icmp_seq=7 ttl=64 time=5005 ms
64 bytes from 10.100.0.2: icmp_seq=8 ttl=64 time=5506 ms
64 bytes from 10.100.0.2: icmp_seq=9 ttl=64 time=6007 ms
64 bytes from 10.100.0.2: icmp_seq=10 ttl=64 time=6506 ms
64 bytes from 10.100.0.2: icmp_seq=11 ttl=64 time=7008 ms
64 bytes from 10.100.0.2: icmp_seq=12 ttl=64 time=7507 ms
From 10.100.0.1 icmp_seq=22 Destination Host Unreachable
From 10.100.0.1 icmp_seq=23 Destination Host Unreachable
From 10.100.0.1 icmp_seq=24 Destination Host Unreachable
From 10.100.0.1 icmp_seq=25 Destination Host Unreachable
From 10.100.0.1 icmp_seq=26 Destination Host Unreachable
From 10.100.0.1 icmp_seq=27 Destination Host Unreachable
From 10.100.0.1 icmp_seq=28 Destination Host Unreachable
64 bytes from 10.100.0.2: icmp_seq=13 ttl=64 time=8008 ms
64 bytes from 10.100.0.2: icmp_seq=14 ttl=64 time=8508 ms
64 bytes from 10.100.0.2: icmp_seq=15 ttl=64 time=9009 ms/
This suggests that RF Pipe only starts counting for the 1 second
delay after the packet reaches the front of a FIFO queue. In this
example, at time 0, the 1st packet is added to the queue. It’s
held for 1 sec. and then sent. At time 0.5, the 2nd packet is
added to the queue. It doesn’t reach the front until the 1st
packet is sent 0.5 sec. later. Once at the front, it’s held in the
queue for 1 sec and then sent. As a result of waiting to reach the
front, it’s RTT grows by 0.5 sec. Is this the expected behavior
for RF Pipe? This pattern repeats a few times, but some point, it
stops and the RTTs increase unpredictably. I'm not sure why.

*

Similar to the timeout test, I ran a test in which the ping
interval was much larger than the delay. For this test, I set
delay=2 in LXC1 and delay=0 in LXC2, and set the ping interval to
4 seconds. However, ping times still grew unpredictably:
***@***.***:# emanesh node-1 set config 1 mac delay=2
nem 1 mac configuration updated
***@***.***:# emanesh node-2 set config 2 mac delay=0
nem 2 mac configuration updated
***@***.***:~# ping -i 4 10.100.0.2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=2002 ms
64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=2002 ms
64 bytes from 10.100.0.2: icmp_seq=3 ttl=64 time=6001 ms
64 bytes from 10.100.0.2: icmp_seq=4 ttl=64 time=4002 ms
64 bytes from 10.100.0.2: icmp_seq=5 ttl=64 time=2002 ms
64 bytes from 10.100.0.2: icmp_seq=6 ttl=64 time=3004 ms
64 bytes from 10.100.0.2: icmp_seq=7 ttl=64 time=3004 ms/

*

Finally, I'm not sure if it's a red herring, but running |tcpdump
-i emane0| on LXC2 shows that every time the ping RTT increases
significantly, an ARP request appears on the emane0 interface:
***@***.***:~# tcpdump -i emane0
16:00:28.685719 IP radio-1 > radio-2: ICMP echo request, id 766,
seq 6, length 64
16:00:28.685749 IP radio-2 > radio-1: ICMP echo reply, id 766, seq
6, length 64
16:00:28.690745 ARP, Request who-has radio-1 tell radio-2, length 28
16:00:29.686456 ARP, Request who-has radio-2 tell radio-1, length 28
16:00:29.686500 ARP, Reply radio-2 is-at 02:02:00:00:00:02 (oui
Unknown), length 28
16:00:30.686737 IP radio-1 > radio-2: ICMP echo request, id 766,
seq 7, length 64
16:00:30.686775 IP radio-2 > radio-1: ICMP echo reply, id 766, seq
7, length 64
16:00:31.687316 ARP, Reply radio-1 is-at 02:02:00:00:00:01 (oui
Unknown), length 28
16:00:32.688047 IP radio-1 > radio-2: ICMP echo request, id 766,
seq 8, length 64
16:00:32.688089 IP radio-2 > radio-1: ICMP echo reply, id 766, seq
8, length 64/

— Reply to this email directly, view it on GitHub https://github.com/adjacentlink/emane/issues/210#issuecomment-1077819491, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUAX5QZLPCDXEWT6OLGLNTVBSK5DANCNFSM5RPOYHLQ. You are receiving this because you commented.Message ID: @.***>