conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[question/bug] conan install takes more than 90 seconds #15579

Closed Ariox41 closed 7 months ago

Ariox41 commented 7 months ago

What is your question?

We've been using conan 2 for a while and it has worked quite well. Our modules depend on boost and qt, and conan install ran in about 20 seconds. However, today I added a module that depends on libsndfile and openal-soft, and after that the conan install time for the final application increased to 90 seconds. This clearly exceeds the time that developers are willing to put up with.

This does not affect most of our modules, so we are still ready to put up with it. But I did a little research anyway:

Thus, at least part of the problem is on the conan side.

Has anyone faced the same problem? It seems like 200 dependencies shouldn't be uncommon, considering that most of them are external dependencies.

Have you read the CONTRIBUTING guide?

memsharded commented 7 months ago

Hi @Ariox41

Thanks for your report.

It seems there are a few different things in this issue, but overall it is related to performance. The installation time of dependencies is mostly driven by download + unzip time. If dependencies are already in the local cache, Conan commands should be very fast. If you are downloading hundreds of dependencies from the servers into a blank cache, that means downloading tons of binaries, sometimes big ones, so a couple of minutes is definitely expected to download and unzip everything.

This clearly exceeds the time that developers are willing to put up with.

I have used maven and npm in the past, and installs of projects taking a few minutes were very common. When installing dependencies from the internet, things take some time. Building many C++ projects takes orders of magnitude more time than that, so being able to download, unzip and install 200+ packages in 1.5 minutes doesn't seem bad at all, but rather excellent compared with the alternatives.

So it would be great to clarify the timings. If we are talking about a conan install with the dependencies already in the local cache, please let us know, because that should be very fast, and there could indeed be something failing.

Regarding pypy, Conan is not tested or validated to run with it; only cpython is supported at the moment. In general, the Conan execution speed is not an issue, because downloading, unzipping and building from source take far more time than the execution of Conan itself.

Ariox41 commented 7 months ago

I forgot to clarify, all dependencies are in the conan cache. Otherwise, it really wouldn't be a problem.

Ariox41 commented 7 months ago

Removing libsndfile and openal-soft from the dependency tree did not reduce the installation time. It seems that something changed in the package registry when adding libsndfile and openal-soft (I used `conan download openal/1.22.2 -r conancenter` and a subsequent `conan upload openal/1.22.2 -r my_repo`).

I will try to recreate the repository of external dependencies, filling it with up-to-date and compatible recipes. However, it is still not entirely clear why updating dependencies led to an increase in installation time, given that all packages end up in the local cache anyway. Apparently, the problem is still related to self.requires("...", force=True).

memsharded commented 7 months ago

I forgot to clarify, all dependencies are in the conan cache. Otherwise, it really wouldn't be a problem.

Ok, then there is definitely something to investigate there. Even with 200 deps, the installation should be very fast; if not using --update, it should be a matter of seconds. Can you please double check that you are using the latest 2.0.17? There was an issue in previous versions with the compatibility.py checks, but that was solved in recent releases.

For a reference, installing deps for the tensorflow-lite demo in the examples2 repo, which depends on around 65 packages, is resolved locally in around 5 seconds (around 2-3 seconds with -nr/--no-remote). Time should be linear, so 200 packages should take at most 15-20 seconds, probably <10 seconds with -nr.

Then, there would be some things to double check:

Ariox41 commented 7 months ago

I have tried various solutions. Among other things, I recreated the repository of external dependencies by cloning conan-center-index, after which my environment fully complies with the recommendations of the conan documentation. After that, the installation step began to take 2 minutes, as there were more dependencies after updating the packages. On a more powerful computer, the installation takes 80 seconds.

I've tried conan 2.0.15 and 2.0.17; it doesn't matter. The -nr argument does not affect the result. I can't test on Windows, but the problem manifests itself on various debian-like systems.

The problem does not manifest itself in a simple artificial test: installing more than 200 dependencies takes less than 10 seconds regardless of the complexity of the graph, and the time is clearly linear in the number of dependencies. In the real graph, the behavior is nonlinear: if a module depends on two libraries whose installation takes 20 seconds each, the installation of this module takes 40 seconds, despite the common dependencies they share. The problem probably manifests itself when using more specific conan features, for example test_requires or python_requires. Perhaps semantic versioning also matters, but replacing all [^x.y.z] with [^x.y.0] changed nothing.
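
For reference, here is a rough sketch of the kind of artificial test I mean. The package names, the count and the conan export / conan graph info invocations are illustrative placeholders rather than our real modules, and the USE_TEST_REQUIRES toggle is only there to compare resolution with and without per-module test requirements:

# repro_sketch.py - a rough sketch: a chain of N trivial packages, where each
# module can optionally test_requires the same shared package. All names and
# counts are placeholders for illustration only.
import os
import subprocess

N = 30                    # length of the requires chain
USE_TEST_REQUIRES = True  # flip to False to compare graph resolution times

PKG_TEMPLATE = """from conan import ConanFile

class Pkg(ConanFile):
    name = "{name}"
    version = "1.0"

    def requirements(self):
        {requires}
{build_requirements}
"""

BUILD_REQS = """
    def build_requirements(self):
        # every module gets its own private copy of this subgraph
        self.test_requires("testfw/1.0")
"""


def export(folder, content):
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "conanfile.py"), "w") as f:
        f.write(content)
    subprocess.run(["conan", "export", folder], check=True)


# a dummy "test framework" package that every module may test_requires
export("testfw", 'from conan import ConanFile\n\n'
                 'class TestFw(ConanFile):\n    name = "testfw"\n    version = "1.0"\n')

for i in range(N):
    requires = f'self.requires("mod{i - 1}/1.0")' if i > 0 else "pass"
    export(f"mod{i}", PKG_TEMPLATE.format(
        name=f"mod{i}",
        requires=requires,
        build_requirements=BUILD_REQS if USE_TEST_REQUIRES else ""))

# resolve the full dependency graph of the last module in the chain
subprocess.run(["conan", "graph", "info", f"--requires=mod{N - 1}/1.0"], check=True)

Timing the final conan graph info (e.g. with time) in both modes should show whether the extra private subgraphs are what makes the difference.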

memsharded commented 7 months ago

Thanks very much for your feedback, this is an interesting problem indeed. If an artificial test with the same number of dependencies is relatively fast, I think we need to investigate where this time is going.

We have a `with conanfile_exception_formatter(conanfile, "build"):` context around most of the recipe methods. Maybe it would be good to do some tracing and try to understand whether some specific methods are responsible for the time spent.

Using this patch on the Conan code:

diff --git a/conans/errors.py b/conans/errors.py
index a1609f386..352c5f1e2 100644
--- a/conans/errors.py
+++ b/conans/errors.py
@@ -8,6 +8,7 @@
     see return_plugin.py

 """
+import time
 from contextlib import contextmanager

@@ -48,6 +49,7 @@ def conanfile_exception_formatter(conanfile_name, func_name):
                              scope="conanfile.py")
         raise ConanException(m)

+    t = time.time()
     try:
         yield
     # TODO: Move ConanInvalidConfiguration from here?
@@ -65,6 +67,8 @@ def conanfile_exception_formatter(conanfile_name, func_name):
             _raise_conanfile_exc(exc)
     except Exception as exc:
         _raise_conanfile_exc(exc)
+    from conan.api.output import ConanOutput
+    ConanOutput().debug(f"------- {conanfile_name}:{func_name}() is {time.time()-t:.3f} s -------")

This will print (with the -vvv argument) the time spent in a few recipe methods, for all dependencies. Maybe it is possible to use this, run your install and extract the logs, to see if something noticeable shows up there.

If this doesn't give any hint, we might need to instrument the Conan codebase more thoroughly for profiling/timing, to be able to detect possible bottlenecks that could be causing those slow times.

Ariox41 commented 7 months ago

I applied the patch and generated a log. A simple search for the regular expression \)\sis\s(?!0.00) revealed no problems except for a single entry: ------- xorg/system:package_info() is 2.627 s -------

I am attaching the log of one of the submodules. It takes 17 seconds to install, and it is already noticeable that the same functions for the same packages are called many times. install.log
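
For completeness, a small sketch of how the timing lines from the patched output could be aggregated per package and method (the install.log name and the line format follow the patch above; the script itself is just an illustration):

# aggregate_timings.py - sum the "------- pkg:method() is X.XXX s -------" lines
# emitted by the tracing patch above. Illustrative sketch only.
import re
from collections import Counter

PATTERN = re.compile(r"------- (?P<ref>\S+):(?P<method>\w+)\(\) is (?P<secs>[\d.]+) s -------")

totals = Counter()  # accumulated seconds per "ref:method()"
calls = Counter()   # number of calls per "ref:method()"

with open("install.log") as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            key = f"{m.group('ref')}:{m.group('method')}()"
            totals[key] += float(m.group("secs"))
            calls[key] += 1

# show the biggest accumulated offenders, even if every single call is fast
for key, secs in totals.most_common(20):
    print(f"{secs:8.3f} s  {calls[key]:5d} calls  {key}")

Sorting by accumulated time rather than per-call time makes the repeated calls for the same packages stand out.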

memsharded commented 7 months ago

That is good feedback, thanks for sharing the log.

xorg/system:package_info() is 2.627 s

This would be expected: this recipe checks the system-installed xorg, and calling the system package manager to check things takes a while.

Having a look at the logs, I think the topology of the graph could be related. Capturing the graph with conan graph info --format=json would probably show this; if you can also share that output, it would be useful.

It seems that while there are around 200 dependencies, the graph could have orders of magnitude more nodes, depending on how test_requires and tool_requires are defined. Every test and tool requires is "private" to the node that declares it. I am surprised to see things like libjpeg as both a test_requires and a tool_requires; this is a bit unexpected, because test_requires should mostly be things like gtest or catch, and tool_requires should be executable tools like cmake and ninja.

This would be the reason why we are seeing repeated calls in the log: every independent "private" subgraph of every regular library needs to be expanded and evaluated, as package recipes can, for example, inject custom options into their own test_requires and tool_requires.

For example, I am seeing m4/1.4.19:package_id() executed 1000+ times! This means there are around 1000 nodes for this m4 package in the dependency graph.
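
If it helps, something along these lines could count how many graph nodes each reference contributes (a sketch assuming the Conan 2.x conan graph info --format=json layout, with the nodes under graph/nodes):

# count_nodes.py - count how many nodes each package reference occupies in the
# JSON produced by: conan graph info . --format=json > graph.json
# Sketch only; the JSON layout is assumed from Conan 2.x output.
import json
from collections import Counter

with open("graph.json") as f:
    data = json.load(f)

nodes = data["graph"]["nodes"]  # dict: node id -> node description
counts = Counter()
for node in nodes.values():
    ref = node.get("ref") or "<conanfile>"  # the root consumer may have no ref
    counts[ref.split("#")[0]] += 1          # drop the revision for readability

print(f"total nodes: {len(nodes)}")
for ref, n in counts.most_common(15):
    print(f"{n:6d}  {ref}")

If something like m4 shows up with hundreds or thousands of nodes, that confirms the duplicated private subgraphs.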

Ariox41 commented 7 months ago

You're right, here's our dependency graph: graph.json

We have many reasons to use different dependencies in test_requires. Our architecture is based on plugins; plugin testing requires plugin launchers and other plugins. Even where there are no plugins, graphics modules still require testing, even if the tests have to be run manually. For this purpose, we use special debugging modules, which should not be in the production environment.

In general, I don't see any reason why this should be a problem, and I definitely wouldn't want to use self.requires() instead of self.test_requires(), as this would inflate RPATH or LD_LIBRARY_PATH and complicate the deployment scripts of the end applications.

Ariox41 commented 7 months ago

Although I also don't see any reason why libjpeg would be both a test_requires and a tool_requires. Our recipes do not contain any weird tool_requires.

memsharded commented 7 months ago

Ok, I count 5600 nodes in the dependency graph, which means those 90 seconds come down to roughly 0.016 seconds, about 16 milliseconds, of processing per node, which is not that bad. So while it makes sense to try to optimize things as much as possible, there is no very evident bottleneck, but mostly a scale issue.

In order to optimize things, some detailed profiling would be necessary. This might be a bit more challenging, but I'll think about it.

Ariox41 commented 7 months ago

The above dependency graph is built in 20 seconds, since it is not the final module. The full dependency graph does not fit within the GitHub file size limit.

But I'm not sure why a subtree is built for test_requires at all. It seems the point of test_requires and tool_requires is that these dependencies are not propagated, so wouldn't it be better to simply filter them out for indirect dependencies? At least, I see no reason why test_requires of indirect dependencies are needed when executing "conan install", although they may be useful for "conan graph info".

memsharded commented 7 months ago

But I'm not sure why a subtree is built for test_requires at all. It seems the point of test_requires and tool_requires is that these dependencies are not propagated, so wouldn't it be better to simply filter them out for indirect dependencies? At least, I see no reason why test_requires of indirect dependencies are needed when executing "conan install", although they may be useful for "conan graph info".

Good question.

This was discussed some time ago: https://github.com/conan-io/tribe/pull/20, in preparation for 2.0. Not having the dependency graph fully expanded was a source of a multitude of problems: incomplete lockfiles, later failures in CI due to undetected conflicts, etc.

memsharded commented 7 months ago

I have started to do some profiling to see if there is some low hanging fruit that could be optimized.

memsharded commented 7 months ago

I did a PR achieving speed-ups of around 15% for the tested graphs (1600 nodes). In case you want to try running from source, this is my branch: https://github.com/conan-io/conan/pull/15623

Still, reducing the size of the graph by reducing the number of test_requires would be something to try if possible.

Ariox41 commented 7 months ago

It seems that replacing test_requires with requires(visible=False) does not reduce the size of the graph, although at the moment I do not have time to test this fully. Using requires(visible=True) instead has obvious negative side effects, including version conflicts.

We will also probably run into a problem when we migrate our own code generation tool (a tool_requires) to conan, since that tool depends on boost, eigen, rapidjson, tinyxml and possibly something else.

I can probably work around these issues somehow, but I still believe that my use case is not a rare extreme case, and the problem needs to be solved at the conan level. This issue probably requires a broader discussion.

memsharded commented 7 months ago

It seems that replacing test_requires with requires(visible=False) does not reduce the size of the graph, although at the moment I do not have time to test this fully. Using requires(visible=True) instead has obvious negative side effects, including version conflicts.

test_requires are already visible=False internally

I can probably work around these issues somehow, but I still believe that my use case is not a rare extreme case, and the problem needs to be solved at the conan level. This issue probably requires a broader discussion.

We are already trying, but optimizing is challenging: as commented above, every node expansion takes just a few milliseconds, and it is hard to reduce that further. Have you seen https://github.com/conan-io/conan/pull/15623? Do you think you could run it on your project and see if it reduces the time?

Ariox41 commented 7 months ago

I'll try to check it out this weekend. But I think that as long as the complexity of the algorithm is exponential, such optimizations will not solve the problem. Therefore, it seems reasonable to me to be able not to build nodes that are irrelevant in specific use cases.

This is not an urgent issue for me, and mostly I would be more interested in learning the prospects. It seems that the initial theoretical assumptions in conan-io/tribe/pull/20 do not correspond to practice, but perhaps the problem still has a relatively simple solution.

memsharded commented 7 months ago

I'll try to check it out this weekend. But I think that as long as the complexity of the algorithm is exponential,

Fortunately no, the algorithm is linear (or n log n); what is polynomial (n x m) is the size of the graph, for n = packages and m = the number of test_requires applied to the packages. With the ~200 packages and ~5600 nodes above, that is roughly 28 graph nodes per package.

I would be more interested in learning the prospects. It seems that the initial theoretical assumptions in https://github.com/conan-io/tribe/pull/20 do not correspond to practice, but perhaps the problem still has a relatively simple solution.

On the contrary, those are not theoretical assumptions. That was a decision based on years of practice and tons of user feedback, and it was confirmed by most of the tribe, who are long-time Conan users, many at large scale. The case of having so many test_requires is the exception here; it is the first time we have seen this issue.

Therefore, it seems reasonable to me to be able not to build nodes that are irrelevant in specific use cases.

While it seems reasonable on the surface, as I commented above, failures caused by not having the full dependency graph expanded are one of the major issues users have encountered.

There are some possible future improvements, like the "re-package" concept, which would allow creating bundles of all test_requires in one single package, for example.

And of course, we will keep working on improving the performance and optimizing as much as possible.

Ariox41 commented 7 months ago

I think I have found a way to reduce the size of the graph in a controlled way. I couldn't immediately figure out whether this idea could be implemented, and therefore assumed that a solution on the conan side was needed. So, I added the following to our shared python_requires_extend recipe:

from conan.tools.cmake import CMakeToolchain


class CommonRecipe(object):
    def config_options(self):
        # add an "enable_tests" option: True for the consumer, False by default
        # for every dependency ("*/*@*:enable_tests")
        self.options.update({"enable_tests": [True, False]},
                            {"enable_tests": True, "*/*@*:enable_tests": False})

    def package_id(self):
        # the option only controls test_requires, so it must not affect the package_id
        self.info.options.rm_safe("enable_tests")

    def generate(self):
        tc = CMakeToolchain(self)
        tc.variables["BUILD_TESTING"] = "ON" if self.options.enable_tests else "OFF"
        tc.generate()

    def nm_test_requires(self, *args, **kwargs):
        # declare a test requirement only when tests are enabled for this node
        if self.options.enable_tests:
            self.test_requires(*args, **kwargs)

And then our recipes use self.nm_test_requires instead of self.test_requires, roughly as in the sketch below. In our CMakeLists.txt, the logic for correctly disabling the tests when BUILD_TESTING is not set had already been implemented for other reasons.
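
For illustration, a consumer recipe then looks roughly like this (the python_requires package name "common" and the concrete dependency versions are placeholders, not our real names):

from conan import ConanFile


class MyModule(ConanFile):
    name = "my_module"
    version = "1.0"
    settings = "os", "compiler", "build_type", "arch"

    # pull in CommonRecipe and inject it as a base class
    python_requires = "common/1.0"
    python_requires_extend = "common.CommonRecipe"

    def requirements(self):
        self.requires("boost/1.83.0")

    def build_requirements(self):
        # only expanded when enable_tests=True, which is the default for the
        # consumer and disabled for dependencies via "*/*@*:enable_tests"
        self.nm_test_requires("gtest/1.14.0")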

It seems to completely solve my problem, and I don't see any negative side effects yet.

memsharded commented 7 months ago

Oh! That is a very good hint. I didn't think of that, as I assumed it wasn't a possibility for you.

If you can actually fully disable the testing part, and you don't care about it, then of course your recipes can conditionally test_requires things. It is great that you came up with this.

Just an implementation detail: as the final package binaries are not affected by tests, you might be better off using a conf than an option. By default, options affect the final binary package_id, so you need to remember to remove the option in the package_id() method, declare the values in the recipe, etc.

Using a conf doesn't require anything else in the recipes, just reading the value with self.conf.get(), and the values can easily be defined in profiles and on the command line.

For the conf approach you could have 2 alternatives:

You just need to be careful and understand the possible downsides, especially if using lockfiles, or if you later get version conflicts in your test_requires subgraphs. But if the approach of skipping the test_requires controlled by a conf works for you, then that would probably be a great solution.
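
For illustration, a minimal sketch of the conf-based variant, using the built-in tools.build:skip_test conf (the helper name is just an example):

# Sketch: skip the test_requires of dependencies based on a conf instead of an option.
class CommonRecipe(object):
    def _maybe_test_requires(self, *args, **kwargs):
        # tools.build:skip_test is a built-in conf and does not affect package_id
        if not self.conf.get("tools.build:skip_test", default=False, check_type=bool):
            self.test_requires(*args, **kwargs)

The value can then be set in profiles or with -c tools.build:skip_test=True on the command line, without any package_id() handling in the recipes.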

Ariox41 commented 7 months ago

The problem is that tests are still needed in the root module. The options-based solution allows us to enable tests in the current recipe but disable them in dependencies. Now, when performing build tasks in CI/CD pipelines, tests for the current module will run, but not for dependencies. If developers are prohibited from running conan upload manually, we can be sure that any recipe uploaded to the repository has passed its tests at some point, and we can safely use --build missing without running tests, at least until we expand the list of supported systems.

So I am closing the issue. However, I want to note that the conan documentation currently does not cover several important things:

memsharded commented 7 months ago

The problem is that tests are still needed in the root module.

I think there is a syntax that can help with that (which, by the way, is the same for options and settings): you can use the & (= consumer) placeholder. Putting something like this in your profile:

[conf]
tools.build:skip_test=True
&:tools.build:skip_test=False

This will skip the tests for dependencies, but not for the current "consumer" project.

So I am closing the issue. However, I want to note that the conan documentation currently does not cover several important things:

Thanks for pointing these out. I think I will extract an issue from your comment to the docs repo, at least to track it and try to improve the docs.

Also, keep an eye on future releases of Conan; we will keep trying to further optimize things. Feedback is always welcome.

Thanks for all the feedback!