RIPE-NCC / ripe-atlas-software-probe

GNU General Public License v3.0

Interface for traffic statistics was not the same as interface for connectivity #32

Closed MikeV7896 closed 3 years ago

MikeV7896 commented 4 years ago

I set up a clean install of CentOS 8 last week and installed the package, then created the config.txt file to send the RX/TX data. The graph started showing data, but everything was 0.

I found out today that statistics for an unused virtual bridge (virbr0) were being sent rather than the interface that the network connection was on (enp0s20f0). While the virtual bridge did have an RFC1918 IP address in a different subnet, it certainly wasn't being used for any connectivity (I'm not certain, but I don't think it even had a gateway set).

My resolution was to remove gnome-boxes, a virtualization package that was installed as part of the "Server with GUI" setup. But maybe some kind of check on the interface being used for connectivity and the interface the RX/TX statistics are gathered from could be done to make sure they're the same.

PhilipHomburg commented 4 years ago

We have some plans to make it possible to select the interface manually. However we don't have a specific plan how or when to do that.

However, I'm curious why the automatic system failed. If you are willing to share the probe ID then I can take a look in our logs to see what is going on.

MikeV7896 commented 4 years ago

Probe ID is 1000495. I actually did another clean CentOS install Sunday night just after posting that, this time console only... so hopefully the logs go back far enough.

PhilipHomburg commented 4 years ago

I checked the log for your probe on May 6. The odd thing is that all traffic counters are zero except for the loopback interface (lo). That includes enp0s20f3, which I assume is the 'real' interface. I don't know what could cause that. I can check whether this is unique to CentOS 8.

MikeV7896 commented 4 years ago

Actually, enp0s20f0 should be the active interface. enp0s20f1 through 3 are other physical interfaces that aren’t being used. The only other interface that was “UP” was virbr0, so I assumed that is what was being used, since TX/RX data started appearing after removing that interface.

PhilipHomburg commented 4 years ago

Ah, I see what is going on. The default route is on enp0s20f0 but the traffic stats are for enp0s20f3 and some virtual interfaces. I'll try to find out why enp0s20f0 is not reported.
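
The mismatch described here (default route on one interface, counters reported for others) can be checked by hand on a Linux host. The following is a minimal sketch, assuming the usual column layouts of /proc/net/route and /proc/net/dev. The function names are made up for illustration and are not part of the probe software.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Atlas probe code.

# Print the interface that carries the IPv4 default route.
# In /proc/net/route the default route has Destination 00000000 (column 2)
# and Mask 00000000 (column 8). Only the first match is printed.
default_route_iface() {
    route_file=${1:-/proc/net/route}
    awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; exit }' "$route_file"
}

# Print "rx_bytes tx_bytes" for one interface from /proc/net/dev.
# After the colon, field 1 is bytes received and field 9 is bytes sent.
# Prints nothing if the interface is not listed.
iface_rx_tx() {
    dev_file=${2:-/proc/net/dev}
    awk -v want="$1" '
        NR > 2 {                              # skip the two header lines
            line = $0
            sub(/^[ \t]+/, "", line)          # strip left padding
            colon = index(line, ":")
            name = substr(line, 1, colon - 1)
            if (name == want) {
                split(substr(line, colon + 1), f, " ")
                print f[1], f[9]
                exit
            }
        }' "$dev_file"
}
```

Running iface_rx_tx "$(default_route_iface)" prints the counters for the interface that actually carries the default route, which can then be compared with what the traffic graph shows.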

PhilipHomburg commented 4 years ago

I see the problem. The code was never changed from the version used on hardware probes, and it only reports the first 4 interfaces (to avoid accidents). How many interfaces do you have in total?

MikeV7896 commented 4 years ago

Well, now I have lo and the four physical interfaces... before I also had the virtual bridge that was part of gnome-boxes.

chriselsen commented 4 years ago

Running into the same issue: The probe doesn't show traffic stats on the "general" tab, but connects without issues to the controller and also has a couple of UDMs running. But in my case I only have eth0 and lo. Therefore I wouldn't expect any issue. Probe ID is: 1000566

PhilipHomburg commented 4 years ago

Probe 1000566 doesn't seem to submit traffic statistics. See the README on how to enable them.

chriselsen commented 4 years ago

I'm still experiencing weird behavior here: I enabled traffic stats according to the README on another probe (ID: 1000597). It worked for a few days, but then started showing 0 constantly.

PhilipHomburg commented 4 years ago

> I'm still experiencing weird behavior here: I enabled traffic stats according to the README on another probe (ID: 1000597). It worked for a few days, but then started showing 0 constantly.

The current code submits statistics for at most 4 interfaces. I have a patch to change that, but I don't know yet when it will be released.

chriselsen commented 4 years ago

> The current code submits statistics for at most 4 interfaces.

Keep in mind that the probe (ID: 1000597) mentioned above only has 2 interfaces, one of them being lo. Another probe (ID: 1000612) that is set up pretty much the same way has no issues whatsoever.

PhilipHomburg commented 4 years ago

Probe 1000597 reports statistics for gretap0, gre0, sit0, and lo.

chriselsen commented 4 years ago

> Probe 1000597 reports statistics for gretap0, gre0, sit0, and lo.

That is interesting and weird and is probably worth another look. Probe 1000597 runs inside a docker container and should therefore only see eth0 and lo. Can I see within the probe's filesystem which interfaces it sees?

PhilipHomburg commented 4 years ago

The probe tries to report the contents of /proc/net/dev.

chriselsen commented 4 years ago

> The probe tries to report the contents of /proc/net/dev.

That helps a lot and brings me a step closer. As mentioned, probes 1000597 and 1000612 were set up the same way and both run inside a Docker container. For some reason, /proc/net/dev on probe 1000597 lists the interfaces in the order gretap0, gre0, sit0, lo, erspan0, eth0. On probe 1000612 the order is eth0, lo, erspan0, gretap0, sit0, gre0. For probe 1000597 I once moved the container to a different Docker bridge; I guess that recreated eth0. Let me see how to disable gretap0, gre0, sit0, and erspan0 in that image.
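
To see whether a given interface would be cut off by a cap on the number of reported interfaces, the order in /proc/net/dev can be inspected directly. This is a rough sketch: the function names are hypothetical, and the default limit of 4 is taken from the comments in this thread, not read from the probe's source.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the Atlas probe code.

# Print interface names in the order they appear in /proc/net/dev,
# skipping the two header lines.
list_ifaces() {
    awk 'NR > 2 { sub(/^[ \t]+/, ""); sub(/:.*/, ""); print }' "${1:-/proc/net/dev}"
}

# Succeed (exit status 0) if the named interface is among the first
# $limit entries. The default of 4 is an assumption based on this thread.
iface_within_limit() {
    iface=$1
    limit=${2:-4}
    dev_file=${3:-/proc/net/dev}
    list_ifaces "$dev_file" | head -n "$limit" | grep -Fqx -- "$iface"
}
```

With the order seen on probe 1000597 (gretap0, gre0, sit0, lo, erspan0, eth0), iface_within_limit eth0 fails, while with the order seen on probe 1000612 it succeeds.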

FlohEinstein commented 4 years ago

Found somewhere that you can influence the order of /proc/net/dev by changing the order of how the modules are loaded in /etc/modules. I tried that, and it really had an influence, but the interface I need the probe to report about (pppoe) is still really far down in the file, below all veth, lan, br and so on. So please @PhilipHomburg , bring on your patch, so I don't spend days to patch the OS to reorder the interfaces :-D

PhilipHomburg commented 4 years ago

I just pushed 5020-2 to the devel branch. It links to a newer version of the measurement code which should fix the traffic statistics issue

FlohEinstein commented 4 years ago

Great, thank you! I just installed it, let's see if I get any stats. No change in /etc/config/atlas needed I presume? According to syslog, it's taking a closer look at a veth-interface :-/ My probe is 1000287

FlohEinstein commented 4 years ago

OK, I checked @PhilipHomburg's patch. It can't fix the problem, since rxtxrpt.c is part of busybox and still only returns the values for the first four lines of /proc/net/dev. So I did something nasty: /usr/libexec/atlas-probe/bin/rxtxrpt is a soft link to ./busybox, and I replaced it with the following shell script:

    #!/bin/ash
    # Run the original rxtxrpt and cut its output off right after "interfaces":
    # so that the interface list can be rebuilt below.
    busyboxrxtxrpt=$(/usr/libexec/atlas-probe/bin/busybox rxtxrpt -A 9002 | sed 's/interfaces": \[ .*/interfaces": /')
    file="/proc/net/dev"

    result=$busyboxrxtxrpt
    # Each line of /proc/net/dev is the interface name followed by 8 receive and 8 transmit counters.
    while read -r infname bytes_recv pkt_recv errors_recv dropped_recv fifo_recv framing_recv compressed_recv multicast_recv bytes_sent pkt_sent errors_sent dropped_sent fifo_sent collisions_sent carr_lost_sent compressed_sent
    do
        # Strip the trailing colon from the interface name.
        infname=$(echo "$infname" | sed "s/://g")
        if [ "$infname" = "pppoe-wan" ]; then
            interfaces=" { \"name\": \"$infname\", \"bytes_recv\": $bytes_recv, \"pkt_recv\": $pkt_recv, \"errors_recv\": $errors_recv, \"dropped_recv\": $dropped_recv, \"fifo_recv\": $fifo_recv, \"framing_recv\": $framing_recv, \"compressed_recv\": $compressed_recv, \"multicast_recv\": $multicast_recv, \"bytes_sent\": $bytes_sent, \"pkt_sent\": $pkt_sent, \"errors_sent\": $errors_sent, \"dropped_sent\": $dropped_sent, \"fifo_sent\": $fifo_sent, \"collisions_sent\": $collisions_sent, \"carr_lost_sent\": $carr_lost_sent, \"compressed_sent\": $compressed_sent }"
        fi
        # if [ "$infname" = "anotherinterface" ]; then
        #     interfaces="$interfaces, { \"name\": \"$infname\", \"bytes_recv\": $bytes_recv, \"pkt_recv\": $pkt_recv, \"errors_recv\": $errors_recv, \"dropped_recv\": $dropped_recv, \"fifo_recv\": $fifo_recv, \"framing_recv\": $framing_recv, \"compressed_recv\": $compressed_recv, \"multicast_recv\": $multicast_recv, \"bytes_sent\": $bytes_sent, \"pkt_sent\": $pkt_sent, \"errors_sent\": $errors_sent, \"dropped_sent\": $dropped_sent, \"fifo_sent\": $fifo_sent, \"collisions_sent\": $collisions_sent, \"carr_lost_sent\": $carr_lost_sent, \"compressed_sent\": $compressed_sent }"
        # fi
        # if [ "$infname" = "thethirdinterface" ]; then
        #     interfaces="$interfaces, { \"name\": \"$infname\", \"bytes_recv\": $bytes_recv, \"pkt_recv\": $pkt_recv, \"errors_recv\": $errors_recv, \"dropped_recv\": $dropped_recv, \"fifo_recv\": $fifo_recv, \"framing_recv\": $framing_recv, \"compressed_recv\": $compressed_recv, \"multicast_recv\": $multicast_recv, \"bytes_sent\": $bytes_sent, \"pkt_sent\": $pkt_sent, \"errors_sent\": $errors_sent, \"dropped_sent\": $dropped_sent, \"fifo_sent\": $fifo_sent, \"collisions_sent\": $collisions_sent, \"carr_lost_sent\": $carr_lost_sent, \"compressed_sent\": $compressed_sent }"
        # fi
        # if [ "$infname" = "thefourthinterface" ]; then
        #     interfaces="$interfaces, { \"name\": \"$infname\", \"bytes_recv\": $bytes_recv, \"pkt_recv\": $pkt_recv, \"errors_recv\": $errors_recv, \"dropped_recv\": $dropped_recv, \"fifo_recv\": $fifo_recv, \"framing_recv\": $framing_recv, \"compressed_recv\": $compressed_recv, \"multicast_recv\": $multicast_recv, \"bytes_sent\": $bytes_sent, \"pkt_sent\": $pkt_sent, \"errors_sent\": $errors_sent, \"dropped_sent\": $dropped_sent, \"fifo_sent\": $fifo_sent, \"collisions_sent\": $collisions_sent, \"carr_lost_sent\": $carr_lost_sent, \"compressed_sent\": $compressed_sent }"
        # fi
        # echo $infname
    done < "$file"
    # Append the rebuilt interface list and close the JSON object.
    result="$result [ $interfaces ] }"
    echo $result

And guess what: It works. Yes, I know, it's a thereifixedit-trustmeimanengineer-solution, but my C is too rusty to make the changes on busybox.

PhilipHomburg commented 4 years ago

Probe 1000287 runs version 2.2.0 of the 'busybox' code. The fixed version in the devel branch is 2.2.2.

fiergna commented 3 years ago

I have the same issue.

I run the probe inside a Docker container on Unraid, using this container: https://github.com/Jamesits/docker-ripe-atlas (RXTXRPT is set to yes).

My /proc/net/dev shows: lo, tun10, gre0, gretap0, erspan0, ip_vti0, sit0, eth0. eth0 is at the bottom and there is traffic going on.

probe id: 1001653

PhilipHomburg commented 3 years ago

The version in master now references a busybox version that has the increased number of interfaces.