very slow execution of load_set_file

pheller commented 2 years ago

I have clixon running with a number of openconfig models, namely openconfig-network-instances.

We have implemented a CLI plugin function, load_set_file, that iterates over a given file and invokes cliread_parse for each set or delete statement.

I believe each cliread_parse call results in an internal edit-config RPC and a backend semantic validation, which can be time-consuming depending on the existing configuration and YANG schema.

Looking for input on an optimization here; maybe a mechanism to pass multiple configuration statements to cligen and on to clixon in a single RPC call with a single semantic validation pass.

As an example:

% for i in {1..1000}; do echo "set interfaces interface eth0 config name eth0" >> test104.txt; done
% cat << EOF > test105.txt
configure
load set /tmp/test104.txt
rollback
exit
quit
EOF
% time clixon_cli -F test105.txt
vagrant@ubuntu1804.localdomain> configure
vagrant@ubuntu1804.localdomain / # load set /tmp/test104.txt
load complete
vagrant@ubuntu1804.localdomain / # rollback
vagrant@ubuntu1804.localdomain / # exit
vagrant@ubuntu1804.localdomain> quit

real    3m44.191s
user    2m24.136s
sys     0m0.443s
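
For illustration, the "single RPC" idea raised above could look roughly like the sketch below, which builds one NETCONF edit-config request carrying several interface entries instead of one request per set statement. This is only a sketch under assumptions: the helper name build_edit_config is hypothetical, the mapping from "set interfaces interface eth0 config name eth0" to openconfig-interfaces XML is assumed, and nothing here is a clixon or cligen API.

```python
import xml.etree.ElementTree as ET

# Namespaces: NETCONF base (RFC 6241) and openconfig-interfaces.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
OC_IF_NS = "http://openconfig.net/yang/interfaces"


def build_edit_config(interface_names):
    """Hypothetical helper: one edit-config RPC for many interfaces."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": "1"})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}candidate")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    interfaces = ET.SubElement(config, f"{{{OC_IF_NS}}}interfaces")
    for name in interface_names:
        # Assumed equivalent of: set interfaces interface <name> config name <name>
        iface = ET.SubElement(interfaces, f"{{{OC_IF_NS}}}interface")
        ET.SubElement(iface, f"{{{OC_IF_NS}}}name").text = name
        cfg = ET.SubElement(iface, f"{{{OC_IF_NS}}}config")
        ET.SubElement(cfg, f"{{{OC_IF_NS}}}name").text = name
    # ElementTree picks its own namespace prefixes (ns0, ns1) when serializing.
    return ET.tostring(rpc, encoding="unicode")


payload = build_edit_config(["eth0", "eth1"])
print(payload)
```

With a payload like this, the backend would see one edit and could validate once, rather than once per statement. Whether that removes the bottleneck depends on where the time is actually spent.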

pheller commented 2 years ago

Seems the constraint may actually be in the frontend CLI:

29118 vagrant   20   0  507156 422360   5176 R  97.7 20.7   0:07.57 clixon_cli
26391 root      20   0  136224  47276   5108 S   1.7  2.3   0:03.43 clixon_backend

olofhagsand commented 2 years ago

Some comments after analyzing. The CLI allocates and frees large amounts of memory for each CLI command. This is due to the large number of YANG files or, more precisely, the size of the auto-cli tree generated from the YANG syntax. In one openconfig example there are approximately 100 YANG files, each generating auto-cli trees. The CLI expands trees dynamically using the "@tree" syntax, and in this case that means making a copy of the complete model (the auto-cli tree generated from YANG). This is a problem that needs to be addressed. Workarounds include loading the configuration as XML or JSON rather than as individual CLI commands.
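
To make the cost described above concrete, here is a toy model of what happens when a full syntax tree is copied for every command. It is not cligen code: the tree shape, the function names, and the use of copy.deepcopy are all assumptions chosen only to show how the work scales with tree size times number of commands.

```python
import copy


def make_tree(depth, fanout):
    """Build a nested dict standing in for a generated CLI syntax tree."""
    if depth == 0:
        return {}
    return {f"n{i}": make_tree(depth - 1, fanout) for i in range(fanout)}


def count_nodes(tree):
    """Count every node below the root."""
    return sum(1 + count_nodes(child) for child in tree.values())


def nodes_copied(tree, commands, copy_per_command):
    """Return how many nodes get duplicated while handling `commands` commands."""
    copied = 0
    for _ in range(commands):
        if copy_per_command:
            # Model of expanding a tree reference by copying the whole tree.
            expanded = copy.deepcopy(tree)
            copied += count_nodes(expanded)
        # Otherwise the tree is shared and nothing is duplicated.
    return copied


tree = make_tree(depth=3, fanout=4)
print("tree size:", count_nodes(tree))
print("copy per command:", nodes_copied(tree, 1000, copy_per_command=True))
print("shared tree:", nodes_copied(tree, 1000, copy_per_command=False))
```

In this model the copying cost grows with both the number of commands and the size of the tree, which matches the observation that many large YANG models make each CLI command expensive.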

olofhagsand commented 2 years ago

Made a rather large set of changes to address performance problems with the auto-cli for large YANG configs; see the commit messages above for a detailed list. In a test case using openconfig-network-instances.yang, a 10x speedup (in elapsed time) was measured. This was achieved partly by reducing memory use (approximately 50%) but mainly by several algorithmic optimizations. The positive side is that this was a new area for optimization with plenty of "low-hanging fruit". The negative side is that several of the algorithmic changes are deep in core cligen code. All primary tests have passed, but since many changes have been made, more testing is needed, and verification input is welcome.

pheller commented 2 years ago

Ok, with the 5.4.0 changes, I've done some benchmarking.

This load merge function essentially turns /tmp/openconfig.conf from a junos-object-style notation into a series of set statements, the same set each time. Prior to 5.4.0, the first execution took over 12 minutes, so this is a vast improvement.

However, repeating the same configuration a few times reveals a linear growth in time.

During these loads, clixon_cli CPU utilization is near 0%, while the backend holds steady near 100%.

% time clixon_cli -F load-object-script.txt
clixon> configure
clixon / # load merge /tmp/openconfig.conf
load complete
clixon / # commit
clixon / # exit
clixon> quit

real    1m30.103s
user    0m4.771s
sys     0m0.633s
% time clixon_cli -F load-object-script.txt
clixon> configure
clixon / # load merge /tmp/openconfig.conf
load complete
clixon / # commit
clixon / # exit
clixon> quit

real    5m59.499s
user    0m6.623s
sys     0m0.496s
% time clixon_cli -F load-object-script.txt
clixon> configure
clixon / # load merge /tmp/openconfig.conf
load complete
clixon / # commit
clixon / # exit
clixon> quit

real    7m50.876s
user    0m7.162s
sys     0m0.597s
% time clixon_cli -F load-object-script.txt
clixon> configure
clixon / # load merge /tmp/openconfig.conf
load complete
clixon / # commit
clixon / # exit
clixon> quit

real    8m24.685s
user    0m7.264s
sys     0m0.527s
%
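
The four "real" times above are easier to compare once converted to seconds. The short sketch below does only that arithmetic on the figures reported in the transcript; the helper name to_seconds is made up for this example.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '5m59.499s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" values copied from the four runs above.
real_times = ["1m30.103s", "5m59.499s", "7m50.876s", "8m24.685s"]
seconds = [to_seconds(t) for t in real_times]

# Increase from each run to the next.
deltas = [later - earlier for earlier, later in zip(seconds, seconds[1:])]

for run, value in enumerate(seconds, start=1):
    print(f"run {run}: {value:.1f} s")
for run, value in enumerate(deltas, start=2):
    print(f"run {run} took {value:.1f} s longer than run {run - 1}")
```

In seconds, the runs take roughly 90.1, 359.5, 470.9 and 504.7, so the run-to-run increases are about 269.4, 111.4 and 33.8 seconds.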

olofhagsand commented 2 years ago

OK, thanks. Strange. I could see the time stepping up once to a higher level when you load an identical file again, since the first load goes into an empty db and later loads go into a populated db. But after that, it should not continue to increase.

pheller commented 2 years ago

OK, frontend processing performance is improved as described in the related commits.

clicon/clixon issue #288