RIPE-NCC / ripe-atlas-software-probe

GNU General Public License v3.0

CPU #81

Open katzeprior opened 1 year ago

katzeprior commented 1 year ago

I installed the ripe-atlas-software-probe, but something uses 100% of the CPU in bursts. Is this a known issue, and is there a fix for it? It seems to be a BusyBox rptaddrs thing.


michel-stam commented 1 year ago

Hello @katzeprior ,

This is not typical behaviour. Can you share what your installation is like? Did you compile it yourself, did you use one of the binary packages, any modifications/configuration?

Regards,

Michel

katzeprior commented 1 year ago

Hey @michel-stam ,

I followed https://github.com/RIPE-NCC/ripe-atlas-software-probe/blob/master/INSTALL.rst#to-create-a-deb-for-debian-or-debian-based-distros and installed it on a VPS with 1 core and 1 GB of RAM. I've also installed it on a Raspberry Pi 3B, and that worked as intended.

michel-stam commented 1 year ago

Hey @katzeprior ,

Can you run ps axuw and tell me which process is using 100% CPU time? Maybe also check /var/log/messages, /var/log/syslog, etc.

Regards,

Michel
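
A minimal way to do what Michel asks is to sort the `ps` output by CPU usage. This sketch assumes the procps `ps` shipped with Debian/Ubuntu (BusyBox `ps` has no `--sort` option); `top_cpu` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Show the header plus the N processes using the most CPU (default: 5).
# Assumes procps ps (as on Debian/Ubuntu); BusyBox ps has no --sort option.
top_cpu() {
    local n="${1:-5}"
    ps axuw --sort=-%cpu | head -n "$((n + 1))"
}

# Usage: top_cpu 3
```

Note that `ps` reports %CPU as CPU time divided by the time the process has been running, so for short bursts like the ones described here, watching `top` while a burst is happening may be more telling.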

averzicco commented 1 year ago

I'm also experiencing the same issue.

The process using 100% CPU is rptaddrs:

 /usr/local/atlas/bb-13.3/bin/rptaddrs -A 9104 -c /var/atlas-probe/data/new/v6addr.vol -O /var/atlas-probe/data/new/v6addr.txt

The probe is running in a VM with 1 core and 1 GB of RAM.

michel-stam commented 1 year ago

Hi Alessandro,

Would you be able to attach strace output?

strace -ff -p <pid> -s 1500 >& out.log

Maybe I can derive where the system is stuck based on this.

Regards,

Michel
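
`strace -p` needs the ID of the process to attach to. A sketch that looks up the running `rptaddrs` with `pgrep -x` (exact process-name match) and writes one trace file per traced process (`<file>.<pid>`, which is what `-ff` does together with `-o`) could look like this. The helper name `trace_rptaddrs` is hypothetical, and the process name is taken from the command line posted in this thread.

```shell
#!/usr/bin/env bash
# Attach strace to the running rptaddrs process (needs root or ptrace rights,
# plus the strace package). Hypothetical helper; stop with Ctrl-C.
trace_rptaddrs() {
    local out="${1:-out.log}" pid
    # -o: oldest match only; -x: process name must be exactly "rptaddrs"
    pid=$(pgrep -o -x rptaddrs) || {
        echo "rptaddrs is not running" >&2
        return 1
    }
    # -ff with -o: one output file per traced process, named "$out.<pid>"
    # -s 1500: print up to 1500 characters of each string argument
    strace -ff -s 1500 -o "$out" -p "$pid"
}

# Usage (as root): trace_rptaddrs /tmp/rptaddrs.strace
```

Writing separate per-process files avoids the interleaved output that a single redirected stream produces.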


michel-stam commented 1 year ago

Also, @averzicco

Can you tell me what platform you're running the probe on? (OS/version, etc).

Regards,

Michel

averzicco commented 1 year ago

Sure. Platform: Linux 213327-f4 5.10.0-22-amd64 #1 SMP Debian 5.10.178-3 (2023-04-22) x86_64 GNU/Linux, Debian 11.7

I think I've figured out what is causing it: the v6addr.vol file is about 26 MB, and rptaddrs takes a while to process it; it's not stuck. I suspect that the size of v6addr.vol is proportional to the number of routes in the routing table, and since this VPS is connected via BGP to a transit provider with the full IPv6 routing table, the number of routes is quite high:

ip -6 route | wc -l
182916

Is there a workaround to reduce how often the rptaddrs process runs, or maybe to reduce the amount of data it needs to process?
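
One way to check the suspicion that v6addr.vol grows with the routing table is to compare the file size with the route count. The sketch below computes the average number of bytes per route; with the figures above (about 26 MB, taken here as 26,000,000 bytes, and 182916 routes) that is roughly 142 bytes per route. The helper names are hypothetical, and a similar ratio on other machines would be consistent with, not proof of, the proportionality.

```shell
#!/usr/bin/env bash
# Average number of bytes in v6addr.vol per IPv6 route (integer division).
bytes_per_route() {
    local size="$1" routes="$2"
    (( routes > 0 )) || return 1   # avoid division by zero
    echo $(( size / routes ))
}

# Same figure measured on a live system: file size via GNU stat,
# route count via iproute2 (approximately one line per route).
vol_bytes_per_route() {
    local vol="${1:-/var/atlas-probe/data/new/v6addr.vol}" size routes
    size=$(stat -c %s "$vol") || return 1
    routes=$(ip -6 route show | wc -l)
    bytes_per_route "$size" "$routes"
}

# Figures from this thread: ~26 MB (taken as 26000000 bytes), 182916 routes.
bytes_per_route 26000000 182916   # prints 142
```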

katzeprior commented 1 year ago

@averzicco You are probably right. My software probe on a Pi at home had no issue, but on my BGP VPS (full IPv6 table) it also hogged the system.

michel-stam commented 1 year ago

Hello Alessandro,

Interesting. I will have to discuss it internally. I'm not sure you'd want a BGP router to mirror its routes into the Atlas backend; we have RIS for that kind of feed :)

What is the use case for having a BGP router double as probe?

To answer your question: there's no workaround for this, as we did not expect BGP routers to double as probes.

I’ll get back when I have an update.

Cheers,

Michel


katzeprior commented 1 year ago

What is the use case for having a BGP router double as probe?

Some people run BGP on a small VPS, so it isn't just a router; it is a router and a server in one.

averzicco commented 1 year ago

Not sure that you’d be wanting a BGP router to mirror its routes into the Atlas backend, we have RIS for that kind of feeds :)

My goal is not to mirror my BGP router's routes to the Atlas backend, and I wasn't aware that the probe sends its routes to the Atlas backend.

My use case is to host the Atlas probe in my network, and the BGP router is part of it.

I understand that the software probe wasn't designed to run on BGP routers. I'll try to find a workaround otherwise I'll host it somewhere else.

michel-stam commented 1 year ago

Hi Alessandro,

This would be a workaround, but at least bird is able to put routes into different routing tables, which correspond to different kernel routing tables. This may be an option.

The intention of this behaviour is to collect debugging information in case the probe has problems connecting to the backend. However, I don't know whether the system actually uses it. That is why I need to talk to some of my colleagues; the history of this change predates my joining the RIPE NCC.

Based on that, we could decide to amend or remove this behaviour.

Bear with me :)

Regards,

Michel


averzicco commented 1 year ago

For now, as a workaround, I've created a network namespace and configured the systemd service unit to use it. This seems to be good enough to expose only a single route to the Atlas probe.


michel-stam commented 1 year ago

Perfect!

Good work! Can you share the change you made?

I'll discuss it internally, because if more people do this, it would generate data that I'm not sure has any benefit.

Cheers,

Michel


averzicco commented 1 year ago

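Sure, basically:

- add a network namespace
- add a veth interface peered to an interface in the network namespace
- add IPs and a default route to the interfaces created
- enable IPv6 forwarding
- configure the systemd service to use that namespace

For example, if the IPv6 prefix in the BGP announcement is 11:22:33::/48: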

ip netns add atlas
ip netns exec atlas ip link set dev lo up
ip link add veth0 type veth peer name v-atlas
ip link set v-atlas netns atlas
ip netns exec atlas ip addr show
ip addr add 11:22:33:100::1/56 dev veth0
ip link set veth0 up
ip netns exec atlas ip addr add 11:22:33:100::2/56 dev v-atlas
ip netns exec atlas ip link set v-atlas up
ip netns exec atlas ip -6 route add default via 11:22:33:100::1

sysctl -w net.ipv6.conf.all.forwarding=1
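
The steps above can be collected into a small script. A minimal sketch, assuming the example names and addresses from this comment (namespace `atlas`, veth pair `veth0`/`v-atlas`, 11:22:33:100::/56 carved out of the announced /48); the function names and the `DRY_RUN` switch are hypothetical. Both ends of the veth pair have to be up for traffic to flow, and since network namespaces do not survive a reboot, the script would need to run at every boot, before the probe service starts.

```shell
#!/usr/bin/env bash
# Sketch of the network-namespace workaround from this thread.
# Names and addresses are the example values used above; adjust to your prefix.
NS="${NS:-atlas}"
HOST_IF="${HOST_IF:-veth0}"
NS_IF="${NS_IF:-v-atlas}"
HOST_ADDR="${HOST_ADDR:-11:22:33:100::1}"
NS_ADDR="${NS_ADDR:-11:22:33:100::2}"
PLEN="${PLEN:-56}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_atlas_netns() {
    # Namespace with its own loopback
    run ip netns add "$NS"
    run ip netns exec "$NS" ip link set dev lo up
    # veth pair: HOST_IF stays in the default namespace, NS_IF moves into NS
    run ip link add "$HOST_IF" type veth peer name "$NS_IF"
    run ip link set "$NS_IF" netns "$NS"
    # Addresses on both ends; both links must be up
    run ip addr add "$HOST_ADDR/$PLEN" dev "$HOST_IF"
    run ip link set "$HOST_IF" up
    run ip netns exec "$NS" ip addr add "$NS_ADDR/$PLEN" dev "$NS_IF"
    run ip netns exec "$NS" ip link set "$NS_IF" up
    # Only route inside the namespace: default via the host end of the pair
    run ip netns exec "$NS" ip -6 route add default via "$HOST_ADDR"
    # The host forwards between the veth pair and the BGP uplink
    run sysctl -w net.ipv6.conf.all.forwarding=1
}

# Usage: source this file, then
#   DRY_RUN=1 setup_atlas_netns   # preview the commands
#   setup_atlas_netns             # apply them (as root)
```

Running it with `DRY_RUN=1` first prints the ten commands without touching the system, which makes it easy to compare them with the list above before applying anything.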

In /etc/systemd/system/atlas.service, add:

[Service]
NetworkNamespacePath=/run/netns/atlas
BindReadOnlyPaths=/etc/resolv.conf:/etc/resolv.conf:norbind
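
Instead of editing the unit file directly, the same two settings can go into a systemd drop-in, which systemd merges with the unit and which keeps the change in a separate file. A sketch, assuming the unit is named `atlas.service` as in the comment above; `write_atlas_dropin` and the file name `netns.conf` are hypothetical. The namespace should already exist when the service starts, since `NetworkNamespacePath=` points at `/run/netns/atlas`.

```shell
#!/usr/bin/env bash
# Write the two [Service] settings from this thread as a systemd drop-in.
# Unit name "atlas.service" as in the thread; prints the path it wrote.
write_atlas_dropin() {
    local unit_dir="${1:-/etc/systemd/system}"
    local unit="${2:-atlas.service}"
    local dir="$unit_dir/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/netns.conf" <<'EOF'
[Service]
NetworkNamespacePath=/run/netns/atlas
BindReadOnlyPaths=/etc/resolv.conf:/etc/resolv.conf:norbind
EOF
    echo "$dir/netns.conf"
}

# Usage (as root), then reload systemd and restart the probe:
#   write_atlas_dropin
#   systemctl daemon-reload && systemctl restart atlas.service
```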

michel-stam commented 1 year ago

Thanks Alessandro,

Good find :)

Cheers,

Michel


michel-stam commented 1 year ago

Hi @averzicco

I've made an internal ticket out of it. In it, I'll look at the default route, which is practically the only route the backend looks at, along with possibly the interface routes.

Until then your workaround is probably the best approach.

Will keep you posted.

Cheers,

Michel
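
Michel mentions that the backend practically only looks at the default route and possibly the interface routes. For comparison with the full table, that subset can be listed with iproute2 selectors. This is an illustration under that assumption, not what `rptaddrs` currently does, and `relevant_routes` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the routes the backend reportedly cares about:
# the default route(s) and the kernel-created interface (connected) routes.
relevant_routes() {
    local fam="${1:-6}"   # address family: 4 or 6
    ip -"$fam" route show default
    ip -"$fam" route show proto kernel
}

# Compare against the full table, e.g.:
#   relevant_routes 6 | wc -l
#   ip -6 route show | wc -l
```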

CreeperFace00 commented 1 month ago

I'm currently experiencing exactly the same problem. However, my v6addr.vol is 134 MB and seems to contain the entire global IPv4 routing table as well as the IPv6 one.
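
With a namespace in place, as in the workaround above, a quick check that covers both address families is to compare route counts in the default namespace with those inside the probe's namespace. A sketch assuming iproute2, root, and the namespace name `atlas`; `compare_route_counts` is a hypothetical helper. The example setup above only configures IPv6 inside the namespace, so the IPv4 count there is expected to be zero unless IPv4 is set up as well.

```shell
#!/usr/bin/env bash
# Compare route counts per address family: default namespace vs. a named one.
# Needs iproute2 and root (ip netns exec requires it).
compare_route_counts() {
    local ns="${1:-atlas}" fam host inner
    for fam in 4 6; do
        host=$(ip -"$fam" route show | wc -l)
        inner=$(ip netns exec "$ns" ip -"$fam" route show | wc -l)
        printf 'ipv%s host=%s %s=%s\n' "$fam" "$host" "$ns" "$inner"
    done
}

# Usage (as root): compare_route_counts atlas
```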