Open katzeprior opened 1 year ago
Hello @katzeprior ,
This is not typical behaviour. Can you share what your installation is like? Did you compile it yourself, did you use one of the binary packages, any modifications/configuration?
Regards,
Michel
Hey @michel-stam ,
I built the .deb by following https://github.com/RIPE-NCC/ripe-atlas-software-probe/blob/master/INSTALL.rst#to-create-a-deb-for-debian-or-debian-based-distros and installed it on a VPS with 1 core and 1 GB of RAM. I've also installed it on a Pi 3B, and that worked as intended.
Hey @katzeprior ,
Can you run a ps axuw and tell me which process is pulling 100% CPU time? Maybe also look at /var/log/messages, /var/log/syslog etc.
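For anyone following along, here is a minimal sketch of one way to pick the busiest processes out of that listing. It assumes a ps whose third column is %CPU, as in the `ps axuw` output of procps; `top_cpu` is a hypothetical helper name, not part of the probe software.

```shell
#!/bin/sh
# top_cpu N: read `ps axuw` output on stdin and print the N busiest
# processes, sorted by the %CPU column (field 3), highest first.
# Hypothetical helper for illustration only.
top_cpu() {
    # Drop the header line, sort numerically on field 3 in reverse order,
    # then keep the first N lines (default 5).
    tail -n +2 | LC_ALL=C sort -k3,3nr | head -n "${1:-5}"
}

ps axuw | top_cpu 5
```

Because the CPU usage comes in bursts, it may take a few runs to catch the process while it is busy.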
Regards,
Michel
I'm also experiencing the same issue.
The process using 100% CPU is rptaddrs:
/usr/local/atlas/bb-13.3/bin/rptaddrs -A 9104 -c /var/atlas-probe/data/new/v6addr.vol -O /var/atlas-probe/data/new/v6addr.txt
The probe is running in a VM with 1 core/1gb RAM
Hi Alessandro,
Would you be able to attach a strace output?
strace -ff -p
Maybe I can derive where the system is stuck based on this.
Regards,
Michel
Also, @averzicco
Can you tell me what platform you're running the probe on? (OS/version, etc).
Regards,
Michel
sure, platform:
Linux 213327-f4 5.10.0-22-amd64 #1 SMP Debian 5.10.178-3 (2023-04-22) x86_64 GNU/Linux
debian 11.7
I think I figured out what is causing it: the v6addr.vol file is about 26 MB and rptaddrs is taking a while to process it; it's not stuck. I suspect that the size of v6addr.vol is proportional to the number of routes in the routing table, and since this VPS is connected via BGP to a transit provider with the full IPv6 routing table, the number of routes is quite high:
ip -6 route | wc -l
182916
Is there a workaround to reduce how often the rptaddrs process runs, or to reduce the amount of data it needs to process?
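A rough way to sanity-check the proportionality guess is to divide the file size by the route count. The sketch below assumes the file lives at the path shown in the rptaddrs command line earlier in this thread; `bytes_per_route` is a hypothetical helper, and whether the file really holds one record per route is only a guess. With the numbers reported here (about 26 MB and 182916 routes) it comes out at roughly 140 to 150 bytes per route.

```shell
#!/bin/sh
# bytes_per_route SIZE ROUTES: integer average of file bytes per route.
# Hypothetical helper for illustration only.
bytes_per_route() {
    echo $(( $1 / $2 ))
}

# Path taken from the rptaddrs command line quoted in this thread.
vol=/var/atlas-probe/data/new/v6addr.vol

if [ -r "$vol" ] && command -v ip >/dev/null 2>&1; then
    routes=$(ip -6 route | wc -l)
    size=$(wc -c < "$vol")
    if [ "$routes" -gt 0 ]; then
        echo "$routes routes, $size bytes, ~$(bytes_per_route "$size" "$routes") bytes per route"
    fi
fi
```

If the ratio stays roughly constant as the table grows or shrinks, that would support the idea that the file scales with the routing table.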
@averzicco You are probably right. For my software probe on a Pi at home it wasn't an issue, but on my BGP VPS (with a full IPv6 table) it also hogged the system.
Hello Alessandro,
Interesting. I will have to discuss it internally. Not sure that you’d be wanting a BGP router to mirror its routes into the Atlas backend, we have RIS for that kind of feeds :)
What is the use case for having a BGP router double as probe?
To answer your question, there’s no workaround for this as we did not expect BGP routers being probes as well.
I’ll get back when I have an update.
Cheers,
Michel
What is the use case for having a BGP router double as probe?
Some people run BGP on a small VPS, so it isn't purely a router; it is a router and a server in one.
Not sure that you’d be wanting a BGP router to mirror its routes into the Atlas backend, we have RIS for that kind of feeds :)
My goal is not to mirror my BGP router routes to the Atlas backend and I wasn't aware that the probe mirrors the routes on the Atlas backend.
My use case is to host the Atlas probe in my network and the BGP router is part of it
I understand that the software probe wasn't designed to run on BGP routers. I'll try to find a workaround otherwise I'll host it somewhere else.
Hi Alessandro,
As a workaround, bird is at least able to place routes into different routing tables, which correspond to different kernel tables. This may be an option.
The intention of this behaviour is to collect debugging information in case there are problems with the probe connecting to the backend. However, I have no idea if the system actually uses this. This is why I need to talk to some of my colleagues, the history of this change predates my joining the RIPE NCC.
Based on that we could decide to amend or remove this behaviour.
Bear with me :)
Regards,
Michel
For now, as a workaround, I've created a network namespace and configured the systemd service unit to use that namespace. This seems to be good enough to expose only a single route to the Atlas probe.
Perfect!
Good work, can you share the change you made?
I’ll discuss internally because if more people do this it would generate data which I’m not sure has any benefit.
Cheers,
Michel
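Sure, basically: add a network namespace, add a veth interface peered to an interface in the network namespace, add IPs and a default route to the interfaces created, enable IPv6 forwarding, and configure the systemd service to use that namespace.
For example, if the IPv6 prefix in the BGP announcement is 11:22:33::1/48: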
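# Create the namespace and bring up its loopback interface
ip netns add atlas
ip netns exec atlas ip link set dev lo up
# Create a veth pair and move one end into the namespace
ip link add veth0 type veth peer name v-atlas
ip link set v-atlas netns atlas
ip netns exec atlas ip a
# Address both ends from a /56 inside the announced prefix and bring them up
ip a a dev veth0 11:22:33:100::1/56
ip link set veth0 up
ip netns exec atlas ip a a dev v-atlas 11:22:33:100::2/56
# Not in the original post: the namespace end of a veth pair starts out down
ip netns exec atlas ip link set v-atlas up
# The default route inside the namespace points at the host end
ip netns exec atlas ip route add default via 11:22:33:100::1
# Let the host forward IPv6 traffic for the namespace
sysctl -w net.ipv6.conf.all.forwarding=1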
Then, in /etc/systemd/system/atlas.service:
[Service]
NetworkNamespacePath=/run/netns/atlas
BindReadOnlyPaths=/etc/resolv.conf:/etc/resolv.conf:norbind
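To check that the namespace really hides the full table from the probe, the route counts on the host and inside the namespace can be compared. This is only a sketch: it assumes the namespace is named atlas as in the commands above, it needs root, and `count_routes` is a hypothetical helper name.

```shell
#!/bin/sh
# count_routes: count the non-empty lines of `ip route` output read from stdin.
# Hypothetical helper for illustration only.
count_routes() {
    # grep -c exits non-zero when it counts 0 lines, so mask that status.
    grep -c . || true
}

# Routes visible to the host versus routes visible inside the "atlas" namespace.
# The second command needs root and the namespace created earlier.
echo "host:  $(ip -6 route | count_routes)"
echo "atlas: $(ip netns exec atlas ip -6 route | count_routes)"
```

The namespace should report only a handful of routes (the default route plus the interface routes). After editing the unit file, systemd needs a `systemctl daemon-reload` before the service is restarted for the change to take effect.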
Thanks Alessandro,
Good find :)
Cheers,
Michel
Hi @averzicco
I've made an internal ticket out of it, where I'll look at the default route, which is practically the only route being looked at by the backend. That and possibly interface routes.
Until then your workaround is probably the best approach.
Will keep you posted.
Cheers,
Michel
I'm currently experiencing the exact same problem. However, my file is 134 MB and seems to contain the entire global IPv4 routing table as well as the IPv6 one.
I installed the ripe-atlas-software-probe, but something uses 100% of the CPU in bursts. Is this a known issue, and is there a fix for it? It seems to be a busybox rptaddrs thing.