NLnetLabs / routinator

An RPKI Validator and RTR server written in Rust
https://nlnetlabs.nl/projects/routing/routinator/
BSD 3-Clause "New" or "Revised" License

Routinator resource usage #333

Open mxsasha opened 4 years ago

mxsasha commented 4 years ago

Based on this tweet, I was asked to open a GitHub issue about my routinator resource usage. This isn't an operational issue for me, since the server can handle it, and I don't know whether there is an actual bug, but since I was asked to open an issue, here it is.

Here you can see when I started running routinator (graphs). After running for about 38 hours, the routinator server process has used 1h18m of CPU time, which seems to match those graphs. It's running on a single-core virtual machine.
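For scale, the two figures quoted above can be turned into an average CPU share. The following is only a back-of-the-envelope sketch: the helper name `average_cpu_share` is made up for illustration, and it assumes the 38 hours are wall-clock time on the single core mentioned in the report.

```python
# Figures from the report: 1h18m of CPU time after about 38 hours of wall-clock time.
cpu_seconds = 1 * 3600 + 18 * 60
wall_seconds = 38 * 3600


def average_cpu_share(cpu_s, wall_s, cores=1):
    """Return the fraction of the available CPU time that was actually used."""
    return cpu_s / (wall_s * cores)


share = average_cpu_share(cpu_seconds, wall_seconds)
print(f"average CPU utilisation: {share:.1%}")  # prints "average CPU utilisation: 3.4%"
```

An average of this size is compatible with short bursts of high usage separated by long idle periods; the average alone does not show how the load is distributed over time.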

I installed it following the quick start exactly, and I am running routinator server with those parameters. Three BIRD instances are using it for RPKI validation, IPv6 only.

Here are my full logs from the last start; nothing in there seems unusual.

partim commented 4 years ago

When you zoom in a bit, do you see short bursts of CPU usage? Typically, things should be a bit hectic for a minute or two and then calm for ten minutes. Given that each validation run includes validating signatures on around 100,000 objects, quite a bit of CPU usage is to be expected.

Similarly, these 100,000 objects, each stored in its own file, are read every ten minutes, which explains the disk usage. I'd expect them to be mostly cached if the machine has enough memory, but if it is a VM on a busy host, ten minutes may be too long for them to stay in the cache.

mxsasha commented 4 years ago

Here's the closest zoom I can get, the last 24 hours: [Screenshot 2020-05-14 at 13 22 13]

alarig commented 3 years ago

I have an operational issue with the RAM usage of the latest release (0.9.0): it jumped from some megabytes to more than 1 GB. It gets OOM-killed by the kernel every now and then. As a quick fix, I'm back on 0.8.3. The upgrade was done at the end of week 22 (see graphs).

partim commented 3 years ago

Thanks for the report and the graphs! While we expected higher RAM usage due to the new database in 0.9, it certainly is too much now and consumption also seems to be growing over time. We are investigating both right now and hopefully will have a fix soon.

AlexanderBand commented 3 years ago

Please note that RAM usage in 0.10.0 is now significantly lower than in 0.9.0:

$ sudo systemctl status routinator
● routinator.service - Routinator 3000
   Loaded: loaded (/lib/systemd/system/routinator.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-08-23 12:22:18 UTC; 1 weeks 3 days ago
     Docs: man:routinator(1)
 Main PID: 7389 (routinator)
    Tasks: 5 (limit: 2330)
   Memory: 1.4G
   CGroup: /system.slice/routinator.service
           └─7389 /usr/bin/routinator --config=/etc/routinator/routinator.conf --syslog server

$ cat /proc/7389/status | grep 'VmHWM\|VmRSS'
VmHWM:    428368 kB
VmRSS:    369752 kB
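In `/proc/<pid>/status`, `VmHWM` is the peak resident set size and `VmRSS` the current one, both in kB. The larger `Memory:` figure reported by systemd is cgroup-level accounting, which typically also counts page cache, so the two numbers are not directly comparable. As a hedged sketch, the same two fields can be pulled out programmatically; `parse_vm_fields` is a hypothetical helper, and the sample text simply repeats the output shown above.

```python
def parse_vm_fields(status_text, fields=("VmHWM", "VmRSS")):
    """Extract selected fields (in kB) from the text of /proc/<pid>/status."""
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # Values look like "    428368 kB"; keep only the integer part.
            result[name] = int(rest.split()[0])
    return result


# Sample text copied from the output above; on Linux you could instead read
# the status file of a running process, e.g. open("/proc/self/status").read().
sample = "VmHWM:    428368 kB\nVmRSS:    369752 kB\n"
usage = parse_vm_fields(sample)
for name, kb in usage.items():
    print(f"{name}: {kb / 1024:.0f} MiB")
```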

alarig commented 3 years ago

Yeah, I upgraded last week and can confirm. Thanks a lot for the work done. It even seems to be lower than 0.8.0 ;)

maxadamo commented 1 year ago

I am running the container version 0.12.0 in Nomad, and the CPU is spiking to unthinkable levels: 1500% (one thousand five hundred percent): [image]

or to 30 GHz (thirty GHz):

[image]

partim commented 1 year ago

It is a bit odd that this happens on every third validation run (assuming you've kept the refresh time at ten minutes), but otherwise not entirely unexpected if you have a lot of cores. Routinator uses a thread pool during validation that is by default sized to the number of cores. Each thread processes one repository, including updating that repository. If there is nothing to update and you have most of the files buffered (which requires enough memory), you can end up with all threads basically just validating signatures and using a lot of CPU. This would just be a short spike, since most repositories are quite small and eventually only two or three working threads will be left.

So, while seeing this every third run is certainly strange, I think this is normal. You can, however, limit the number of threads (and thus cores used) via the validation-threads configuration option.
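To illustrate why capping the pool size bounds the spike, here is a minimal model of a fixed-size worker pool handing out one repository per worker. It is not Routinator's code (Routinator is written in Rust): the function `run_validation`, the repository names, and the sleep that stands in for real work are all assumptions made for the sketch, and Python threads only model the concurrency limit, not actual multi-core CPU load.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_validation(repositories, validation_threads):
    """Process each repository on a pool capped at validation_threads workers.

    Returns the highest number of repositories that were in flight at once.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def process(repo):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for updating and validating one repository
        with lock:
            active -= 1
        return repo

    with ThreadPoolExecutor(max_workers=validation_threads) as pool:
        list(pool.map(process, repositories))
    return peak


repos = [f"repo-{i}" for i in range(20)]  # hypothetical repository names
peak = run_validation(repos, validation_threads=4)
print(f"peak concurrent repositories: {peak}")
```

However many repositories are queued, the number processed at the same time never exceeds the pool size, which is the effect the validation-threads option is described as having on the number of cores used.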