pguibert6WIND opened 2 months ago
will the work in the open PR about memory footprint in vtysh show commands ( #16498 ) help with this - have you tried that diff in this scenario?
The result is slightly better, but the increase is not eliminated. Virtual memory still grows from 1224212 KB to 2137220 KB, and resident memory still grows from 1044440 KB to 1959064 KB.
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root 10874 54.9 10.2 1224212 1044440 ? Ssl 08:25 1:42 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt
real 0m30.286s
user 0m2.796s
sys 0m5.565s
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root 7702 75.7 19.2 2137220 1959064 ? Ssl 08:14 2:23 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
My fear is still memory fragmentation; what does the leak sanitizer say when running that command?
Assuming I'm looking at the right code, I'm guessing that command emits a json that looks a bit like:
{"vrfs": { "<vrfname>": {
        "protocols": {
            "<zebra_route_string_i>": "<NHT_RM_NAME>",
            ...
        }
    },
    ...
}}
* json-c base json_object size: 40 bytes
* json_object_object: 48 (base object + hash table ptr) + 56 (hash table) + #entries * (40 + avg key size + avg entry object size)
* each entry in the `vrfs.<vrfname>.protocols` object will be a json_object_string: 48 bytes + length of string
If there are 900k of those, with each having a 10-byte key and value, you should expect roughly 92MB for in-memory object storage, and definitely no more than that to serialize the object to a string.
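Expanding that arithmetic with the sizes above: each entry costs roughly 40 (hash table entry) + 10 (key) + 48 (json_object_string) + 10 (string value) = 108 bytes, and 900,000 * 108 bytes is about 97 MB, i.e. roughly 92 MiB, which is where the estimate comes from.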
If you find that your use of the json-c library uses much more than that, open a new issue over in the json-c project (i.e. please don't just piggy-back on json-c/json-c#552)
Hi Eric, thanks for the quick update.
As an example, please find an extract of what the output looks like. The route entry below represents one of the 993276 entries present.
{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 993276,
"routerId": "165.16.221.64",
"defaultLocPrf": 100,
"localAS": 65500,
"routes": {
"0.0.0.0/0":{
"prefix": "0.0.0.0/0",
"version": "1",
"advertisedTo":{
"165.16.221.65":{
"hostname":"dut2-sureau-nianticvf"
}
},
"paths":[{
"aspath":{
"string":"37721 3257",
"segments":[{
"type":"as-sequence",
"list":[37721,3257]
}],
"length":2
},
"origin":"IGP","valid":true,"version":1,
"bestpath":{
"overall":true,
"selectionReason":"First path received"
},
"community":{
"string":"37721:4000 37721:4006 37721:4200 37721:4230",
"list":[
"37721:4000","37721:4006","37721:4200","37721:4230"
]},
"lastUpdate":{
"epoch":1724653537,"string":"Mon Aug 26 08:25:37 2024\n"
},
"nexthops":[{
"ip":"165.16.221.66","hostname":"dut2-sureau-nianticvf","afi":"ipv4","metric":0,
"accessible":true,"used":true
}],
"peer":{
"peerId":"165.16.221.65",
"routerId":"165.16.221.65","hostname":"dut2-sureau-nianticvf","type":"external"
}
}]
},
The whole file is https://drive.google.com/file/d/1NnXSUX_wuKN2Zcu8r1b8jkjbg63kG8Jx/view?usp=sharing. Basically, it is a list of paths with many different options for each entry.
Thanks also for the numbers provided. The json functionality itself works very well. I was a bit clumsy in addressing a comment directly on the json-c repository, and I apologise for that. My guess is that memory management on Linux is the problem, and that limiting memory usage by any means available can help reduce the footprint.
I am doing some experiments on memory management:
I tried to separate the memory used for the show command from the rest (https://github.com/FRRouting/frr/pull/16654).
I tried to avoid the json_object_get() call (json_object_lock() in FRRouting) on aspath, large communities, and communities, and replaced the current nested json object with a json object holding a simple string; a code-level sketch follows the two snippets below. Before:
"community":{
"string":"37721:4000 37721:4006 37721:4200 37721:4230",
"list":[
"37721:4000","37721:4006","37721:4200","37721:4230"
]},
...
"aspath":{
"string":"37721 3257",
"segments":[{
"type":"as-sequence",
"list":[37721,3257]
}],
"length":2
},
after:
"community":{
"string":"37721:4000 37721:4006 37721:4200 37721:4230",
}
...
"aspath":{
"string":"37721 3257",
},
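For illustration, here is a minimal C sketch of what this experiment changes; the attach point and field names (json_path, attr->aspath->json, attr->aspath->str) are my assumptions about the bgpd code, not a verified excerpt:

/* before: take a reference (json_object_get(), wrapped as
 * json_object_lock() in FRR) on the shared, fully expanded aspath
 * object and attach it to the per-path json */
json_object_object_add(json_path, "aspath",
                       json_object_get(attr->aspath->json));

/* after (sketch): attach a small object holding only the string form,
 * so no nested "segments"/"list" objects are kept for every path */
json_object *json_aspath = json_object_new_object();
json_object_object_add(json_aspath, "string",
                       json_object_new_string(attr->aspath->str));
json_object_object_add(json_path, "aspath", json_aspath);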
With this change, the global VM size is 2043920 KB instead of 2137220 KB.
I would also like to reuse the memory blocks used for each prefix: why would I need to malloc/free 993276 times, when I basically dump the same prefix structure each time? For this last experiment I need some help on the json APIs available to build such a case.
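One possible direction, sketched below under the assumption that the output can be streamed: build one small json object per prefix, emit it immediately, and free it before moving to the next prefix, so the peak allocation stays bounded by a single route instead of 993276 of them:

json_object *json_route = json_object_new_object();
/* ... fill json_route for the current prefix ... */
vty_out(vty, "%s,\n",
        json_object_to_json_string_ext(json_route,
                                       JSON_C_TO_STRING_NOSLASHESCAPE));
/* drop the per-prefix object right away instead of keeping ~1M alive */
json_object_put(json_route);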
As an additional test, without changing the json model, I could see that the vty_json_no_pretty() function takes a lot of memory. It comes down to these two calls:
text = json_object_to_json_string_ext(json, flags);
json_object_free(json);
If the json_object_to_json_string_ext() call is not done, the virtual memory size is far better: it goes from 1663864 KB to 1704664 KB, instead of to 2043920 KB.
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root 59828 28.3 14.5 1663864 1485696 ? Ssl 12:02 1:47 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt
real 0m24.767s
user 0m1.152s
sys 0m1.152s
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root 59828 31.6 14.9 1704664 1526640 ? Ssl 12:02 2:10 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
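My understanding of why skipping the serialization helps so much, based on reading json-c and therefore an assumption to verify: json_object_to_json_string_ext() caches the serialized text in a buffer owned by the object, so during the dump both the whole object tree and the complete output string are resident at the same time:

/* 'text' points into a print buffer owned by 'json'; for a
 * ~993276-route detailed dump that cached string alone is huge */
text = json_object_to_json_string_ext(json, flags);
/* only freeing the object releases both the tree and the cached string */
json_object_free(json);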
Finding out how to optimize the display could help resolve this spike in VM size.
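On the fragmentation fear raised earlier, one cheap diagnostic, sketched here under the assumption that bgpd runs on glibc malloc, is to ask the allocator to return freed heap pages to the kernel right after the show command completes; if RSS drops, the growth is allocator retention/fragmentation rather than a leak:

#include <malloc.h>

/* hypothetical hook, called once the json dump has been freed */
malloc_trim(0); /* release free heap pages back to the kernel */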
Description
On a Linux device that received a full route feed of 900K prefixes, if I dump the detailed json output to a file, I can see a dramatic increase in the virtual memory size used.
Virtual Memory size went from 1224200 KB to 2638212 KB. Resident Memory size went from 1044208 KB to 2457428 KB.
Version
How to reproduce
Get a full route setup and wait for stabilisation in the ZEBRA RIB. Then request bgpd with the command above (vtysh -c "show bgp ipv4 json detail").
Expected behavior
I don't expect a memory increase in VM size.
Actual behavior
Dramatic increase in VM size.
Additional context
This test uses a full route extract with the router peering with a single device. However, in a real ISP scenario, multiple peerings may happen, and increasing the number of peers increases the memory used.
Checklist