brettviren / worch

Let the orchestration waf through the suite.

Bad scaling performance in configuration data. #46

Closed brettviren closed 9 years ago

brettviren commented 10 years ago

There is a scaling performance problem: as more packages are added to the configuration, a waf configure takes much longer. This is likely the N^2 effect of every package knowing about every other package's configuration items. It's probably exacerbated by the multiple passes over the config data made during the various resolutions. In general, the interpretation done by deconf.py needs some cleaning up.

Most of the cross-package variables are left unused so the plan is to write "smart" dictionary-like classes instead of using plain dict to store the parsed config files so that variable resolution can be done on an as-needed basis.
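To illustrate the as-needed resolution idea, here is a minimal sketch of such a dictionary-like class. It is not the code in deconf.py: the class name LazyConfig, the resolved_count helper and the use of "{name}" style references expanded with str.format_map are all assumptions made for the example. Circular references are not handled.

```python
from collections.abc import Mapping


class LazyConfig(Mapping):
    """Hypothetical dict-like store that expands "{name}" references on access."""

    def __init__(self, raw):
        self._raw = dict(raw)   # unexpanded strings, as parsed from a config file
        self._cache = {}        # values that have already been resolved

    def __getitem__(self, key):
        if key not in self._cache:
            # format_map calls back into __getitem__ for every "{name}" it
            # meets, so only the variables actually referenced get resolved.
            self._cache[key] = self._raw[key].format_map(self)
        return self._cache[key]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)

    def resolved_count(self):
        """Number of variables resolved so far (for illustration only)."""
        return len(self._cache)


cfg = LazyConfig({
    "prefix": "/opt",
    "package": "root",
    "version": "5.34",
    "install_dir": "{prefix}/{package}-{version}",
    "unused": "{prefix}/tmp",
})
install_dir = cfg["install_dir"]   # resolves install_dir and what it references
```

After the single lookup above, the "unused" entry has still not been touched, which is the point: the cost follows what is referenced rather than what is defined.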

brettviren commented 10 years ago

Commit 3c7fd0d8c0a76a7b05c41f79f891537fa726e6b6 starts to fix this.

As expected the scaling problem is in the code which expands each dictionary of package configuration items to hold a copy of every other package's configuration items. Using the larsoft configuration this gets timed as:

Created in 0.027
Fold_in in 35.375
Iterated in 0.000 (#g=146, #p=3023)

The numbers in the last line are the number of group variables (#g) and the number of package variables (#p).
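For reference, the quadratic growth of the fold-in step can be reproduced with a toy model. This is only a sketch of the effect, not the actual worch code: the functions fold_in and total_items, the pkgN/varN names and the "<package>_<variable>" key prefix are assumptions made for the example.

```python
def fold_in(packages):
    """Eagerly copy every package's items into every package's dictionary."""
    folded = {}
    for name, items in packages.items():
        merged = dict(items)
        for other, other_items in packages.items():
            for key, value in other_items.items():
                # Each foreign item is stored under a "<package>_<variable>" key.
                merged[f"{other}_{key}"] = value
        folded[name] = merged
    return folded


def total_items(n_packages, items_per_package=3):
    """Total number of stored entries after folding in n_packages packages."""
    packages = {
        f"pkg{i}": {f"var{j}": str(j) for j in range(items_per_package)}
        for i in range(n_packages)
    }
    return sum(len(d) for d in fold_in(packages).values())


small = total_items(10)
large = total_items(20)
```

With k items per package, each of the n dictionaries ends up holding k + n*k entries, so the total is n*k*(n+1). Doubling the number of packages roughly quadruples the work.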

By dynamically resolving the inheritance of the parent node's variables and the access to other "sister" nodes' variables, one gets:

Created in 0.039
Iterated in 4.335 (#g=10058, #p=554386)
Iterated again in 0.323 (#g=10058, #p=554386)

There is no "fold in" time, but the dynamic lookups make the first iteration slower; caching helps on subsequent lookups. The total number of variables is vastly higher because each level of the hierarchy may have its own variables referenced as "sister" values. So, if one has group "all" and package "pkg", then one can separately have all_install_dir and pkg_install_dir. The new language is thus more expressive and is still about 10x faster, even in this worst case of iterating through all of the new variables (100x more). Actual use is guided by referencing select variables, so this worst case will not typically be reached.
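The lookup scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the sdeconf branch: the Node class, its lookup method, the shared registry and the resolution order (own items, then "sister" prefix, then parent) are all invented for the example.

```python
class Node:
    """Hypothetical config node with inherited and "sister" variable lookup."""

    def __init__(self, name, parent=None, **items):
        self.name = name
        self.parent = parent
        self.items = dict(items)
        # All nodes in one hierarchy share a single name -> node registry.
        self.registry = parent.registry if parent is not None else {}
        self.registry[name] = self
        self._cache = {}

    def lookup(self, key):
        # Cached results make repeated lookups cheap.
        if key not in self._cache:
            self._cache[key] = self._resolve(key)
        return self._cache[key]

    def _resolve(self, key):
        if key in self.items:
            return self.items[key]
        # "Sister" access: "<node>_<variable>" refers to another node's variable.
        for name, node in self.registry.items():
            prefix = name + "_"
            if key.startswith(prefix):
                try:
                    return node.lookup(key[len(prefix):])
                except KeyError:
                    pass
        # Inheritance: fall back to the parent node's variables.
        if self.parent is not None:
            return self.parent.lookup(key)
        raise KeyError(key)


root = Node("all", install_dir="/opt/all")
pkg = Node("pkg", parent=root, install_dir="/opt/pkg", version="1.0")
other = Node("other", parent=root)

inherited = other.lookup("install_dir")       # found on the parent group "all"
sister = other.lookup("pkg_install_dir")      # found on the sister package "pkg"
group = other.lookup("all_install_dir")       # explicit reference to the group
```

Nothing is copied between nodes up front, so there is no fold-in step. The price is paid per lookup, and only for the names that are actually requested.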

At least I think so. The next step is to make the changes required by this new way of populating the configuration data, run a real job and see.

Edit: updated the second set of timings after a bug fix (4b0a17). They are a little bit slower.

brettviren commented 10 years ago

This work is currently in branch sdeconf.

brettviren commented 9 years ago

Fixed a while ago.