FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

Show bgp ipv4 json detail command increases the Linux VM size dramatically #16643

Open pguibert6WIND opened 2 months ago

pguibert6WIND commented 2 months ago

Description

On a Linux device that has received a full routing table of about 900K prefixes, dumping the detailed JSON output to a file causes a dramatic increase in the virtual memory size used by bgpd.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7905  9.2 10.2 1224200 1044208 ?     Ssl  18:20   1:46 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7905 11.7 24.1 2638212 2457428 ?     Ssl  18:20   2:22 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
dut-sureau-nianticvf(config)# debu bgp memory dump-show-bgp-route
number of gc occurence for 'show bgp route': 100u

Virtual memory size went from 1224200 KB to 2638212 KB. Resident memory size went from 1044208 KB to 2457428 KB.
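For anyone who wants to take the same before/after measurement without parsing `ps` output, here is a minimal sketch that reads the figures from `/proc/<pid>/status` on Linux. The helper name `read_status_kb` is hypothetical and not part of FRR. The `VmSize` and `VmRSS` fields correspond to the VSZ and RSS columns that `ps` prints above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an FRR function): read one
 * "<field>: <value> kB" line from /proc/<pid>/status.
 * Returns the value in kB, or -1 if the file or the field is missing.
 */
long read_status_kb(pid_t pid, const char *field)
{
	char path[64];
	char line[256];
	long value = -1;
	size_t len;
	FILE *fp;

	assert(field != NULL);
	len = strlen(field);

	snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
	fp = fopen(path, "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		/* Lines look like "VmRSS:\t 1044208 kB" */
		if (strncmp(line, field, len) == 0 && line[len] == ':') {
			/* %ld skips the whitespace that follows the colon */
			if (sscanf(line + len + 1, "%ld", &value) != 1)
				value = -1;
			break;
		}
	}

	fclose(fp);
	return value;
}
```

Calling `read_status_kb(pid, "VmSize")` and `read_status_kb(pid, "VmRSS")` with the bgpd PID before and after the show command should give the same kind of numbers as the two `ps` runs.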

Version

10.0
I think the problem happens with all routes.

How to reproduce

Get a full route setup and wait for the ZEBRA RIB to stabilise. Then query bgpd with the command above.

Expected behavior

I don't expect the VM size to increase.

Actual behavior

A dramatic increase in VM size.

Additional context

This is a full route extract from a router peering with a single device. In a real ISP scenario, however, there may be multiple peerings, and increasing the number of peers increases the memory used.

mjstapp commented 2 months ago

Will the work in the open PR about the memory footprint of vtysh show commands (#16498) help with this? Have you tried that diff in this scenario?

pguibert6WIND commented 2 months ago

> Will the work in the open PR about the memory footprint of vtysh show commands (#16498) help with this? Have you tried that diff in this scenario?

The result is slightly better, but the increase is far from gone. Virtual memory still goes from 1224212 KB to 2137220 KB, and resident memory from 1044440 KB to 1959064 KB.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       10874 54.9 10.2 1224212 1044440 ?     Ssl  08:25   1:42 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt       

real    0m30.286s
user    0m2.796s
sys     0m5.565s
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7702 75.7 19.2 2137220 1959064 ?     Ssl  08:14   2:23 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

My fear is still memory fragmentation.
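One way to probe the fragmentation hypothesis is a small stand-alone experiment, sketched below under stated assumptions: it assumes glibc (`malloc_trim()` is a glibc extension), the function name `churn_and_trim` is hypothetical, and it says nothing about how bgpd itself allocates memory. It makes many small allocations, frees them all, then asks the allocator to hand free heap memory back to the kernel.

```c
#include <assert.h>
#include <malloc.h> /* malloc_trim() is a glibc extension */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical experiment (not FRR code): allocate `count` blocks of
 * `size` bytes, touch them so they become resident, free them all, then
 * ask glibc to return free heap memory to the kernel.
 * Returns malloc_trim()'s result (1 if memory was released, 0 if not),
 * or -1 if an allocation failed.
 */
int churn_and_trim(size_t count, size_t size)
{
	void **blocks;
	size_t i, allocated = 0;

	assert(count > 0 && size > 0);

	blocks = calloc(count, sizeof(*blocks));
	if (!blocks)
		return -1;

	for (i = 0; i < count; i++) {
		blocks[i] = malloc(size);
		if (!blocks[i])
			break;
		memset(blocks[i], 0xa5, size); /* touch pages so they count in RSS */
		allocated++;
	}

	/* Free everything that was actually allocated */
	for (i = 0; i < allocated; i++)
		free(blocks[i]);
	free(blocks);

	if (allocated < count)
		return -1;

	return malloc_trim(0);
}
```

Comparing the process RSS after the frees and again after `malloc_trim(0)` shows how much freed memory the allocator was still holding. Whether the same effect explains the bgpd numbers above is an open question.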

ton31337 commented 2 months ago

What does the leak sanitizer say when running that command?

hawicz commented 2 months ago

Assuming I'm looking at the right code, I'm guessing that command emits a json that looks a bit like:

{"vrfs": {   "<vrfname>": {
    "protocols": {
        "<zebra_route_string_i>": "<NHT_RM_NAME>",
        ...
    }
   },
   ...
}

If there are 900k of those, each with a 10-byte key and a 10-byte value, you should expect roughly 92MB for in-memory object storage, and definitely no more than that to serialize the object to a string.

If you find that your use of the json-c library uses much more than that, open a new issue over in the json-c project (i.e. please don't just piggy-back on json-c/json-c#552)

pguibert6WIND commented 2 months ago

> Assuming I'm looking at the right code, I'm guessing that command emits a json that looks a bit like:
>
> {"vrfs": {   "<vrfname>": {
>   "protocols": {
>       "<zebra_route_string_i>": "<NHT_RM_NAME>",
>       ...
>   }
>    },
>    ...
> }
>
> * json-c base json_object size: 40 bytes
>
> * json_object_object: 48 (base object + hash table ptr) + 56 (hash table) + #entries * (40 + avg key size + avg entry object size)
>
> * each entry in the `vrfs.<vrfname>.protocols` object will be a json_object_string: 48 bytes + length of string
>
> If there are 900k of those, each with a 10-byte key and a 10-byte value, you should expect roughly 92MB for in-memory object storage, and definitely no more than that to serialize the object to a string.
>
> If you find that your use of the json-c library uses much more than that, open a new issue over in the json-c project (i.e. please don't just piggy-back on json-c/json-c#552)
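The estimate quoted above can be checked with a short calculation. The sketch below only encodes the per-object sizes stated in that comment (they are not measured here), and the function name `estimate_object_bytes` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes as quoted in the thread for json-c; assumed, not measured. */
enum {
	JSON_OBJECT_HEADER = 48,  /* base object + hash table pointer */
	JSON_HASH_TABLE = 56,     /* the hash table itself */
	JSON_ENTRY_OVERHEAD = 40, /* per-entry cost in the hash table */
	JSON_STRING_HEADER = 48,  /* json_object_string before its bytes */
};

/*
 * Estimated bytes for one json object holding `n` string-valued entries,
 * following: 48 + 56 + n * (40 + avg key size + avg entry object size),
 * where each entry object is a string of 48 bytes + its length.
 */
uint64_t estimate_object_bytes(uint64_t n, uint64_t avg_key_len,
			       uint64_t avg_val_len)
{
	uint64_t fixed = JSON_OBJECT_HEADER + JSON_HASH_TABLE;
	uint64_t per_entry = JSON_ENTRY_OVERHEAD + avg_key_len +
			     JSON_STRING_HEADER + avg_val_len;

	/* Guard against overflow of n * per_entry + fixed */
	assert(n == 0 || per_entry <= (UINT64_MAX - fixed) / n);

	return fixed + n * per_entry;
}
```

With 900000 entries and 10-byte keys and values, `estimate_object_bytes(900000, 10, 10)` gives 97200104 bytes, which is about 92.7 MiB and matches the "roughly 92MB" figure.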

Hi Eric, thanks for the quick update.

As an example, here is an extract of what the output looks like. The route entry below is one of the 993276 entries present.

{
 "vrfId": 0,
 "vrfName": "default",
 "tableVersion": 993276,
 "routerId": "165.16.221.64",
 "defaultLocPrf": 100,
 "localAS": 65500,
 "routes": {
        "0.0.0.0/0":{
                "prefix": "0.0.0.0/0",
                "version": "1",
                "advertisedTo":{
                    "165.16.221.65":{
                    "hostname":"dut2-sureau-nianticvf"
                    }
                },
                "paths":[{
                    "aspath":{
                        "string":"37721 3257",
                        "segments":[{
                            "type":"as-sequence",
                            "list":[37721,3257]
                        }],
                        "length":2
                    },
                    "origin":"IGP","valid":true,"version":1,
                    "bestpath":{
                        "overall":true,
                        "selectionReason":"First path received"
                    },
                    "community":{
                        "string":"37721:4000 37721:4006 37721:4200 37721:4230",
                        "list":[
                            "37721:4000","37721:4006","37721:4200","37721:4230"
                        ]},
                    "lastUpdate":{
                        "epoch":1724653537,"string":"Mon Aug 26 08:25:37 2024\n"
                    },
                    "nexthops":[{
                        "ip":"165.16.221.66","hostname":"dut2-sureau-nianticvf","afi":"ipv4","metric":0,
                        "accessible":true,"used":true
                    }],
                    "peer":{
                        "peerId":"165.16.221.65",
                        "routerId":"165.16.221.65","hostname":"dut2-sureau-nianticvf","type":"external"
                    }
                }]
        },

The whole file is at https://drive.google.com/file/d/1NnXSUX_wuKN2Zcu8r1b8jkjbg63kG8Jx/view?usp=sharing. Basically, it is a list of paths, each with many different options.

Thanks also for the numbers provided. The json functionality itself works very well. I was a bit clumsy in posting a comment directly on the json-c repository, and I apologise for that. My guess is that memory management is the problem on Linux, and that limiting memory usage wherever possible can help reduce the memory footprint.

I ran some experiments on memory management:

Based on the last experiment, I need some help with the json APIs available to build such cases.

pguibert6WIND commented 2 months ago

As an additional test, without changing the json model, I could see that the vty_json_no_pretty() function takes a lot of memory.

text = json_object_to_json_string_ext()
json_object_free(json);

If the call is not made, the virtual memory size is far better: it increases from 1663864 KB to 1704664 KB instead of 2043920 KB.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       59828 28.3 14.5 1663864 1485696 ?     Ssl  12:02   1:47 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt

real    0m24.767s
user    0m1.152s
sys     0m1.152s

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       59828 31.6 14.9 1704664 1526640 ?     Ssl  12:02   2:10 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

Finding out how to optimize the display could help resolve this spike in VM size.