Closed jonas-kaufmann closed 1 year ago
This is the import time profile after all optimizations. As you can see, most of the time on the benchbuild side is spent on importing packages related to the database. I am not too familiar with all the code but to me it seems that this can't really be reduced since the database is a core feature of benchbuild.
Thanks for working on this!
I never thought about it for long. But I think the sqlalchemy support can be removed "fairly" easily. The only real deep connection to the rest of benchbuild should be in benchbuild.utils.run.RunInfo
and benchbuild.utils.actions
.
For a bit of perspective: sqlalchemy.migrate
will go as soon as I finally get to unbreaking the github CI, because it blocks me from upgading to a newer sqlalchemy version.
The rest should work as you implemented it. Earlier versions of benchbuild relied on the standard pickle module. This relied on being able to have all necessary imports present in the unpickling python interpreter.
Oh, I just saw https://github.com/PolyJIT/benchbuild/pull/549. I tested again with sqlalchemy.migrate
removed. This yields another nice speed-up and also means that https://github.com/PolyJIT/benchbuild/commit/1ae6fdbbfd660fc4fd17765987aebf9da90dc834 actually has the intended effect.
I never thought about it for long. But I think the sqlalchemy support can be removed "fairly" easily. The only real deep connection to the rest of benchbuild should be in
benchbuild.utils.run.RunInfo
andbenchbuild.utils.actions
. For a bit of perspective:sqlalchemy.migrate
will go as soon as I finally get to unbreaking the github CI, because it blocks me from upgading to a newer sqlalchemy version.
I had a look again and it's definitely pretty easy to disable sqlalchemy support as well, thanks for your input. My idea is to add a switch for disabling the database in the benchbuild config. I am going to clean up my code so far and then push that into this PR as well.
Force pushing really wasn't the call, anyway, now the damage is done.
Adding the configuration option and the respective if guards for db/transaction code was pretty straightforward. Below is also the import time profile after all these optimizations. For the VaRA-Tool-Suite, we are now at a ~4-5x faster invocation time for the wrapped compiler.
Ignore CI errors with doc steps. This is the ancient package 'pheasant' being out-of-date for python 3.10.
I will replace it with sphinx in a future PR.
Ok. There was one last thing that was looking weird in the profile and that was the amount of time spent in benchbuild.settings to parse the config. The reason for that is that yaml.load() is slow because yaml seems to be a complex language to parse. I was able to reduce the time necessary though by switching to the C-based yaml loader and by removing unecessary loads in init_from_env()
.
This is the current state now. I don't see anything else weirdly taking a lot of time in the profile anymore. If you don't have any more ideas either, we can merge this from my side.
Patch coverage: 20.87
% and project coverage change: -2.70
:warning:
Comparison is base (
480223c
) 52.71% compared to head (5f21147
) 50.01%.:exclamation: Current head 5f21147 differs from pull request most recent head eabd76a. Consider uploading reports for the commit eabd76a to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I have been investigating https://github.com/se-sic/VaRA/issues/1003. Concretely, I profiled the import time of the script
benchbuild/res/wrapping/run_compiler.py.inc
. I found that, at least for the VaRA-Tool-Suite, a substantial amount of time is being spent on importing plugins in the form of other projects and experiments which are not actually being used. So the main idea is to prevent that by overriding the benchbuild configuration through environment variables. Together with some other changes on VaRA's side, I managed to achieve about a 2x performance improvement in my tests.The other thing that might give a speed improvement in the future is to use
importlib.metadata
to query the installed benchbuild version instead ofpkg_resources
.pkg_resources
takes almost 100 ms on my machine to import, whileimportlib.metadata
is negligible. https://github.com/PolyJIT/benchbuild/commit/1ae6fdbbfd660fc4fd17765987aebf9da90dc834 won't do anything at the moment though because there are at least two third-party packages that also usepkg_resources
to set__version__
.In my opinion, the change to
benchbuild/res/wrapping/run_compiler.py.inc
should be completely transparent to the user and the necessary imports are still made when unpickling the project/compiler. @vulder @boehmseb @simbuerg What do you think about this? Am I missing some detail where this might not work in certain cases?