conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[feature] System cache #16600

Open andrey-zherikov opened 3 months ago

andrey-zherikov commented 3 months ago

What is your suggestion?

I believe the following feature would be very useful for Conan. The idea is to have a system-wide cache of binary packages that can be used to minimize downloading from remotes. This cache sits somewhere between a remote and the Conan local cache (in $CONAN_HOME): it can be used to retrieve packages but not to accept newly exported or created packages; on the other hand, it can store packages uncompressed, so Conan can use them the same way as from the local cache.

This would be useful in CI workflows where every CI job sets CONAN_HOME to the job workspace (for isolation purposes). In this case Conan downloads all dependencies from remotes every time, even packages that do not change frequently and could easily be cached on the CI agent (for example, build tools and compilers that are packaged as Conan packages). With a system cache, the next CI job can see that a binary package is already cached and take it from the cache instead of the remote.

This would also be useful for developers who have to download large packages from the other side of the world (e.g. the developer is in APAC while the Artifactory hosting is in AMER).

I've checked some existing features but they are not exactly what I'm looking for:


memsharded commented 3 months ago

Hi @andrey-zherikov

Thanks for the suggestion.

This feature has been considered before, but it turns out that it would be quite a complex feature, both from a functional and a UX perspective. So while this is something that could make sense, unfortunately we cannot prioritize it enough to make it into our roadmap, as there are many other higher priorities, for example the CI workflows, the workspaces or the cache concurrency.

At the moment the recommended approach is what you already outlined: using core.download:download_cache will have the intended benefits, except for the extra unzipping time necessary for new caches. For developers it is not a big issue, as they keep their cache populated. For CI using a blank cache each time it will have more impact, but we are still talking mostly about performance, while all the other features above are highly requested core functionality.

For some very large tools, it might be possible to circumvent the cost by using a separate Conan cache: install the packages in that cache, activate the environment defined by those tools, and then switch to the new cache. For the following Conan commands, those tools then behave much like tools already installed in the system.
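
As an illustration of the download-cache approach for CI, here is a minimal sketch of a setup step that gives each job its own CONAN_HOME while pointing it at a download cache shared on the agent. The helper name `prepare_job_home` and the directory layout are assumptions made for this example; only the `core.download:download_cache` conf name comes from the discussion above.

```python
import tempfile
from pathlib import Path


def prepare_job_home(job_home: Path, shared_download_cache: Path) -> Path:
    """Create an isolated CONAN_HOME whose global.conf points at a shared download cache.

    Hypothetical CI helper: the paths are chosen by the caller, nothing here is Conan API.
    """
    job_home.mkdir(parents=True, exist_ok=True)
    shared_download_cache.mkdir(parents=True, exist_ok=True)

    conf = job_home / "global.conf"
    line = f"core.download:download_cache={shared_download_cache.as_posix()}"
    existing = conf.read_text() if conf.exists() else ""

    # Append the setting only once, keeping any configuration already present.
    if line not in existing.splitlines():
        if existing and not existing.endswith("\n"):
            existing += "\n"
        conf.write_text(existing + line + "\n")
    return conf


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    conf = prepare_job_home(root / "job-1" / "conan_home", root / "shared" / "download_cache")
    print(conf.read_text(), end="")
```

The job would then export CONAN_HOME to the prepared directory before running Conan. Downloads can be served from the shared directory, but each fresh CONAN_HOME still pays the unzipping cost mentioned above.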

I can label this as a 2.X future roadmap feature, but I don't think this will be possible any time soon, sorry.

andrey-zherikov commented 3 months ago

> I can label this as a 2.X future roadmap feature, but I don't think this will be possible any time soon, sorry.

Thank you! I know this is not a simple feature.

What do you think about supporting different cache locations based on package name? For example, all buildtool* packages are located in one cache (${SYSTEM_CACHE}) whereas all others are located in the default one (${CONAN_HOME}). I guess this will require fewer changes and could be supported through a dictionary in core.cache:storage_path. Conan already has a similar feature: "allowed packages" in remotes.
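
To make the name-based idea concrete, here is a rough sketch of how a pattern-to-path dictionary could select a storage location. The dictionary form of `core.cache:storage_path` is only the proposal in this comment, not an existing Conan setting, and `STORAGE_BY_PATTERN`, `DEFAULT_STORAGE`, `storage_for` and all paths are hypothetical names for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from package-name pattern to storage folder.
# This mirrors the proposal above; it is not an existing Conan configuration format.
STORAGE_BY_PATTERN = {
    "buildtool*": "/opt/conan/system_cache",
}

# Hypothetical default location, standing in for the cache under ${CONAN_HOME}.
DEFAULT_STORAGE = "/workspace/job/conan_home/p"


def storage_for(ref: str, rules=None, default: str = DEFAULT_STORAGE) -> str:
    """Return the storage folder for a reference such as 'zlib/1.3.1'."""
    rules = STORAGE_BY_PATTERN if rules is None else rules
    name = ref.split("/", 1)[0]  # match on the package name only
    for pattern, path in rules.items():
        if fnmatchcase(name, pattern):
            return path  # first matching pattern wins
    return default


if __name__ == "__main__":
    for ref in ("buildtool-cmake/3.27.0", "zlib/1.3.1"):
        print(ref, "->", storage_for(ref))
```

With these rules, any reference whose name starts with `buildtool` resolves to the system-wide folder and everything else falls back to the default one.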

memsharded commented 3 months ago

Yes, that is definitely doable. I have been able to draft something in https://github.com/memsharded/conan/pull/new/feature/system_cache

The problem is not the location of the packages based on the package name, but the expectations and different behavior that packages would have in those caches.

For example, with the above implementation, a conan remove "*" -c will simply remove all packages from both storage locations. Is this expected? If not, what would be the UI to specify the different flows, removing from one storage or the other? There will be many other similar aspects. For example, is the "build" of such packages expected to be done in the storage defined by those special patterns? Or does that only apply to packages that are downloaded from servers?

andrey-zherikov commented 3 months ago

Right, this makes other things unclear and it might be confusing even if we say that writing to the cache will follow the same rule - according to package name.

Maybe having explicit "extra"/"read-only" storage paths will help? This will solve the writing problem - all writes (build, create, export etc) will go to the regular cache. Package lookup can be done in this order: regular cache, extra/read-only caches (in order), remotes. So if a package is found in a read-only cache but has no compatible binary package, it will be built and written to the regular cache. The next time it's needed, it will be used from the regular cache, not the read-only one, due to the lookup order.

The way I think this could be used in our use case is, for example, that we put compiler/buildtools packages into profiles and have a CI that ensures that all those packages are installed into system-wide caches on all CI nodes. Then all other CI workflows can reuse this cache and avoid downloading rarely changed packages.
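
The lookup order described above can be sketched as follows. This is a toy model under stated assumptions, not Conan code: `Store` and `resolve` are hypothetical names, binaries are represented as plain package-id strings, and compatibility checks are reduced to an exact id match.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for a cache or remote: maps a reference to known package ids."""
    name: str
    binaries: dict = field(default_factory=dict)

    def has(self, ref: str, package_id: str) -> bool:
        return package_id in self.binaries.get(ref, set())

    def add(self, ref: str, package_id: str) -> None:
        self.binaries.setdefault(ref, set()).add(package_id)


def resolve(ref, package_id, regular, read_only, remotes):
    """Return where the binary comes from: a store name, or 'build' if none has it."""
    # 1. Regular (writable) cache, then 2. read-only caches in order.
    for cache in [regular, *read_only]:
        if cache.has(ref, package_id):
            return cache.name
    # 3. Remotes: a download lands in the regular cache, never in a read-only one.
    for remote in remotes:
        if remote.has(ref, package_id):
            regular.add(ref, package_id)
            return remote.name
    # 4. No binary anywhere: build it and write the result to the regular cache.
    regular.add(ref, package_id)
    return "build"


if __name__ == "__main__":
    regular = Store("regular")
    system = Store("system", {"gcc/13.2": {"id-linux"}})
    remote = Store("remote", {"zlib/1.3.1": {"id-linux"}})
    for ref, pid in [("gcc/13.2", "id-linux"), ("gcc/13.2", "id-other"), ("gcc/13.2", "id-other")]:
        print(ref, pid, "->", resolve(ref, pid, regular, [system], [remote]))
```

In this model the second request for a missing binary is served from the regular cache, which matches the "next time" behavior described in the comment.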

memsharded commented 2 months ago

> Maybe having explicit "extra"/"read-only" storage paths will help? This will solve the writing problem - all writes (build, create, export etc) will go to the regular cache. Package lookup can be done in this order: regular cache, extra/read-only caches (in order), remotes. So if a package is found in a read-only cache but has no compatible binary package, it will be built and written to the regular cache. The next time it's needed, it will be used from the regular cache, not the read-only one, due to the lookup order.

This multi-level cache would make things way more complicated than the name-filter approach, and some of the previous questions still remain. What would be the command to remove things from the secondary cache? And if I want to create a package and put it in the secondary cache, do I need to upload it and download it so it goes to the secondary cache? What is the rule then for selecting the cache, just read/write?

The read-write classification also conflicts with what I understand to be the most demanded use case for the secondary cache: storing the long-term, stable, very heavy packages, typically build tools. Users still want to be able to store the regular library packages in the main cache, because those are the most frequently updated, they might want another kind of stability, etc. I still think that the choice of having a package in one cache or the other is more a per-package user decision than a matter of read-write criteria. Based on your use case description I still think this is the case.

Not to mention the possible confusion that could arise from things not being updated. If conan remove * -c is not expected to clear the secondary cache, then some users will expect a further conan install ... to install fresh, updated packages, but that will not be the case. I can see this increasing the support load on the maintainers team.

For this feature to be considered we would need a very clear UX/UI, but so far this looks pretty complicated. I still think the filter based on names is more viable, but we would need to think more carefully about the lifetime of the packages in the secondary cache.

In the meantime, I'd say that some of the recent features can help to slightly reduce the need for this feature: