conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[feature request] add support for simultaneous multiple conan caches #5513

Open DoDoENT opened 5 years ago

DoDoENT commented 5 years ago

This idea was first discussed in issue #5498 in this comment.

The idea is to enable conan to support multiple conan data folders simultaneously. The packages that are dependencies of the package being installed would be installed into the default cache folder and the current package could be exported into the ephemeral local cache required just for testing.

The motivation for this comes from the following example: let's say that my repository contains two packages A and B. B depends on A and they always need to be developed together (think of an example with OcrEngine and OcrEngineModelBuilder from the description of issue #4753). So, in order to ensure that package B can be tested with conan test, it also needs the same version of package A, which is still not exported. If multiple conan cache folders were supported, the Jenkins executor could export both A and B into its temporary folder to perform the conan test, while still using the default conan cache for other dependencies of A and B.

If the same package is found in multiple cache folders, an error could be raised which could be corrected by specifying from which cache the package should be consumed.
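A minimal sketch of the lookup rule described above, assuming a hypothetical layout in which each cache keeps packages under `<cache>/data/<name>/<version>`. The names `find_package`, `AmbiguousPackageError` and the `prefer` argument are illustrative only and are not part of Conan.

```python
from pathlib import Path


class AmbiguousPackageError(Exception):
    """Raised when a reference exists in more than one cache folder."""


def find_package(reference, cache_dirs, prefer=None):
    """Return the folder holding `reference`, searching several cache folders.

    `reference` is a relative path such as "liba/1.0" (assumed layout:
    <cache>/data/<name>/<version>). `prefer` names the cache to consume from
    when the package is present in more than one of them.
    """
    hits = [Path(c) for c in cache_dirs if (Path(c) / "data" / reference).is_dir()]
    if not hits:
        raise LookupError(f"{reference} not found in any cache")
    if len(hits) > 1:
        if prefer is None:
            # Same package in several caches: refuse to guess.
            raise AmbiguousPackageError(
                f"{reference} found in {len(hits)} caches; specify which one to use"
            )
        if Path(prefer) not in hits:
            raise LookupError(f"{reference} is not in the preferred cache {prefer}")
        return Path(prefer) / "data" / reference
    return hits[0] / "data" / reference
```

With a default cache and a temporary one, dependencies that exist only in the default cache resolve silently, while a package exported to both requires the caller to pass `prefer`.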

jgsogo commented 5 years ago

This feature request is very interesting and it is the path we are following when we refactor the codebase trying to make more use of the PackageLayouts https://github.com/conan-io/conan/pull/5105 (currently we have the PackageCacheLayout and PackageEditableLayout). Once we manage to detach the packages from the cache it would be easy to have packages stored or coming from several caches 😉

This will require a huge effort, at some Conan stages it is not easy to decouple a package from the cache, but we are walking in that direction step by step.

KerstinKeller commented 5 years ago

While setting up our CI Workflow we stumbled upon the same issue. It would be very useful to have one "global" cache where a package build could consume all its dependencies from, and a local cache where all the building takes place.

A huge benefit would be that each build job can clean up after itself, if the build cache is located in the current CI build folder, while the "official" cache for consuming packages is shared between all build jobs.

It might not sound like a big deal, but if you use CI and Conan to create packages as part of quality checks on every branch / commit for large projects, you do produce a lot of artifacts, which are not even supposed to be released / promoted.

jgsogo commented 5 years ago

Hi, @KerstinKeller. This is something we want to take into account but there are other core-features of Conan that require our attention too. Meanwhile I've opened this issue (https://github.com/conan-io/conan/issues/5553) to gather use-cases related to CI and Conan. If you want to share what you are trying to implement maybe we can find a way to do it without the multi-cache (maybe it is easier to develop something else for the Artifactory-plugin than to get this feature ready).

Thanks!

petermbauer commented 4 years ago

While setting up our CI Workflow we stumbled upon the same issue. It would be very useful to have one "global" cache where a package build could consume all its dependencies from, and a local cache where all the building takes place.

Same here. For Linux I was thinking about using Docker and preparing images containing a "default" Conan local cache with some third-party dependencies like Qt, compilers, ... to achieve acceptable build times while still having an isolated Conan local cache per build/Jenkins workspace.

Under Windows I want to avoid Docker, so I was thinking of creating symlinks for all the folders in the data folder of the "read-only"/"shared"/"global" cache in the data folder of the "per-build"/"writable" cache, while ensuring via file/folder permissions that the shared cache cannot be modified. Building binary packages from source would of course not be possible with this workaround, but this is not an issue for our use-case.
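The symlink workaround could be sketched roughly as follows. This is only an illustration: `mirror_shared_cache` is a hypothetical helper, and the two arguments are assumed to be the data folders of the shared and per-build caches. Protecting the shared cache through permissions is left out.

```python
import os
from pathlib import Path


def mirror_shared_cache(shared_data, build_data):
    """Symlink every top-level folder of `shared_data` into `build_data`.

    Entries that already exist in the per-build folder are left untouched,
    so packages created by the build itself are never replaced by a link.
    Returns the names that were linked.
    """
    shared_data = Path(shared_data).resolve()  # absolute link targets
    build_data = Path(build_data)
    build_data.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(shared_data.iterdir()):
        target = build_data / entry.name
        if not entry.is_dir() or target.exists() or target.is_symlink():
            continue
        # On Windows, creating symlinks needs Developer Mode or elevation.
        os.symlink(entry, target, target_is_directory=True)
        linked.append(entry.name)
    return linked
```

Cleaning up after a build then only means deleting the per-build folder; the links disappear with it and the shared data stays in place.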

For sure the best solution would be to have built-in support for at least one additional "secondary" read-only Conan local cache.

kenfred commented 3 years ago

I agree that we need the primary cache to be left alone during package development.

The idea of multi-layer cache is definitely a step in the right direction, but I think has some drawbacks. Please see this comment and read the discussion in that thread.

To sum up my opinion, package co-development is closest to the regular conan consumer workflow, and it makes more sense to me to run conan install + build + package on a workspace file than to introduce new commands around a new concept of a local-cache. There is a lot of detail in the linked discussion.

kenfred commented 3 years ago

In order to steer this discussion toward common ground, I should clarify that my main goal is to fix workspaces and keep them alive because our team and our clients rely on them. Although I don't prefer the semantics of "local-cache", conceptually it's exactly what we need: a local folder in which to build and package co-developed packages that is separate from the main cache.

My only requirement is that I be able to create this folder/local-cache directly from a workspace file. This is essential to the function of workspaces and if we move away from it, workspaces will die. This is equivalent to me saying please don't design out workspaces! If we keep that in mind, I think it will lead some of our choices in the command interface to accomplish this local-cache.

jgsogo commented 3 years ago

The new cache we are designing with support for multiple revisions (https://github.com/conan-io/conan/pull/8510) is almost a read-only cache: any change in recipe files will be a different recipe-revision and any new build will be different package-revision. It is read-only in the sense that you won't override anything with different content, although you can remove, of course.
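To illustrate the "almost read-only" property, here is a toy sketch of a store in which new content never overwrites old content. It is not how Conan computes revisions: the SHA-256 of the raw bytes is only a stand-in, and `RevisionStore` is a hypothetical name.

```python
import hashlib


class RevisionStore:
    """Append-only store: content is keyed by reference and by content hash."""

    def __init__(self):
        self._revisions = {}  # reference -> {revision: content}

    def add(self, reference, content: bytes) -> str:
        """Store `content`; identical content maps to the same revision."""
        revision = hashlib.sha256(content).hexdigest()[:12]
        # setdefault never replaces an existing entry, so nothing is overridden.
        self._revisions.setdefault(reference, {}).setdefault(revision, content)
        return revision

    def get(self, reference, revision) -> bytes:
        return self._revisions[reference][revision]

    def remove(self, reference, revision):
        """Removal is still allowed, as noted above."""
        del self._revisions[reference][revision]

    def revisions(self, reference):
        return list(self._revisions.get(reference, {}))
```

Adding a changed recipe yields a second revision next to the first one, so a consumer pinned to the earlier revision keeps resolving to the same content.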

If usage of lockfiles were transparent, it wouldn't be a problem to keep adding revisions to the cache, because existing projects would keep using what they were using before; they wouldn't use newer revisions. This is the key question: how to isolate draft-revisions from production-ready ones.

It looks like these draft-revisions, which are generated while developing a package (sometimes together with draft-revisions of other references) should be closer to the developer. A workflow like conan install + build + export-package would do the job for a single package, but the problem persists when we are co-developing another reference and want to use it here without polluting the production-ready cache.

Workspaces are a different answer, how are they different? A workspace will let you open all the libraries in the same IDE (it is a single CMake project), while here we are opening each library in a different IDE instance. This is my POV, and this is why I think workspaces need to use the same build-system. And why I'm looking for a different approach.

Let's use a different name, co-development (instead of workspaces).

I really like your approach in https://github.com/conan-io/tribe/pull/19#issuecomment-778382753 where you run a local conan create to a local folder (it was my first approach to this problem), but then you need to link dependent projects to this local folder... and I am trying to find a solution without using an extra file as we will do for workspaces. That is where the concept of this intermediate cache comes in...

I imagine something like this:


@kenfred IMHO, workspaces (as I think about them) are a bit different, and they have their use-case. They need some love, and they will get the attention they deserve in Conan v2.

kenfred commented 3 years ago

@jgsogo Thanks for taking the time to explain your ideas to me.

Yes, creating an alternate cache for development isn't a good solution for the reasons you stated. We still need access to the primary cache for the thousands of dependencies that aren't in the co-development group. Completely agree.

Your proposed solution is quite similar to @DoDoENT's. Like his example, I'd prefer creating that cache in a folder near where I'm developing, not in ~/.conan, but the command you suggested would allow that. How do you propose I tell packages to use this secondary cache and not go directly to the global cache? Do all packages use it while it exists? Can I create any number of them in a stack? Can I create any number of them as peers? Do I go to individual packages and tell them to register with that secondary cache?

IMO, we're still thinking of the cache (whether it be the global cache or the local-cache) in an overloaded way. We're trying to make co-development fit this overloaded cache model. This new cache you create has all the same drawbacks of the global cache. We haven't solved the cache pollution, we've just isolated the pollution to a secondary cache.

I know the difference between the local-cache and the out-of-cache folder is mostly semantics and you could say that the out-of-cache folder is just as "polluted" as the local-cache. However, I think making the distinction is important because it affects the conan commands we'll use or invent, which will go to user experience and understanding. Maybe it's just me, but creating a cache for temporary development doesn't fit my mental model, because:

Maybe we need to come to an agreement on nomenclature. I think of workspaces as two things:

  1. Quick setup of co-development. Via basic commands on the workspace file, set up co-development.
  2. Generate a master build script to perform build steps on all of the packages in the group.

Where we differ is the expectation that the master build script can be opened in an IDE with all of the folders of all of the co-developed packages neatly present. Your concept of workspaces not only assumes a common build system, but that the build system can generate the IDE project (i.e. cmake). Generally speaking, conan doesn't control the build system and there is no guarantee the build system is cmake, autotools, visual studio or some homegrown thing. I would say your definition of workspaces runs contrary to the ethos of conan and perhaps only cmake add_subdirectory can live up to your expectation. The only other option would be build system conversion, which would be a nightmare.

Are you willing to relax your definition of workspaces? If not, what name should we give the thing I'm talking about?

I agree that workspaces (as I see them) and co-development aren't necessarily the same. However, workspaces and co-development need to share the model of how the co-developed packages are associated with one another and developed together. I argue that workspaces are a superset of co-development. Or better put, co-development is a subset of workspaces. Therefore, if you solve workspaces in an elegant way, co-development comes along as a byproduct and you don't have one model and command set for co-development and another model and command set for workspaces.

DoDoENT commented 3 years ago

Therefore, if you solve workspaces in an elegant way, co-development comes along as a byproduct and you don't have one model and command set for co-development and another model and command set for workspaces.

I agree that workspaces are a superset of the co-development, but I would prefer if we first solve the co-development problem and then build workspaces on top of that solution. Otherwise, we risk having co-development solution being designed in the "too-workspace-ish" manner, just like workspaces from Conan 1.12 and before, which were not usable for many workflows (including mine), as they expected the workspace file to always be in the parent folder of all Conan packages that were part of the workspace, as well as supported only a CMake-based build system for all packages participating in the workspace.

So, if we go back to my idea of local project-level caches where it's possible to create different package development groups and where a single package can simultaneously participate in different groups, it's relatively simple to build your workspace idea on top of that.

Simply introduce the workspace manifest file that will only contain paths (either relative or absolute) to conanfiles that participate in the workspace, something like:

/Users/dodo/Desktop/liba/conanfile.py
/Users/dodo/Work/projects/libb/conanfile.py
/Volumes/USBDISK/projects/libc/conanfile.py

Then, when initializing the workspace with a manifest like the one above, Conan (or some third-party script) would automatically create a new local cache and export all packages from the manifest into it. It would just be a way of automating what I described earlier as manual work.
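A rough sketch of reading such a manifest, under a few assumptions that go beyond what is stated above: one path per line, blank lines and `#` comments ignored, and relative paths resolved against the manifest's own folder. `read_manifest` and `plan_exports` are hypothetical helpers; the plan only describes the export steps and does not invoke Conan.

```python
from pathlib import Path


def read_manifest(manifest_path):
    """Return the conanfile paths listed in a workspace manifest.

    Assumed format: one path per line; blank lines and lines starting with
    '#' are skipped; relative paths are resolved against the manifest folder.
    """
    manifest_path = Path(manifest_path)
    base = manifest_path.parent
    recipes = []
    for line in manifest_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path = Path(line)
        recipes.append(path if path.is_absolute() else base / path)
    return recipes


def plan_exports(manifest_path, cache_dir):
    """Describe, without running anything, one export step per recipe."""
    return [
        {"recipe": str(recipe), "cache_dir": str(cache_dir)}
        for recipe in read_manifest(manifest_path)
    ]
```

Resolving relative paths against the manifest's folder lets the same file work both as a temporary manifest with absolute paths and as one checked into a repository with relative paths.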

However, if you want to create a super-project based on the workspace, I'm afraid that this won't be possible in the true manner of the project, as it would not support incremental builds. Namely, let's say that we create a CMake super-project based on the workspace manifest above. The generated CMake project would essentially have 3 targets:

However, this has one major issue, which I think is very important for co-development - it lacks proper support for an incremental build. Let's say that liba has 100 source files and you change a single CPP file and then build the workspace super-project - the CMake super-project does not have information about how sources influence the final package and doesn't know that it only needs to build that single file - it will instead build the entire liba and package it into the project-level cache. With stateless conan build, as proposed in this PR, that would mean complete rebuilding of the entire liba project (all its 100 source files). This will take a long time and is not very development-friendly.

However, if the conan build would (in that case only, IMHO) only delegate to the native build system without conan install and source copying, then the native build system could infer that only a single file has changed and perform the incremental build, but the packaging part would still need to be performed fully. Not ideal, but could work.

Also, I'm not sure if such a super-project could track which files belong to which package to know which targets it should invoke during the incremental build; however, it could just dumbly call build on every target (I believe this is also the default behaviour for custom targets in CMake) and rely on conan build + conan package to be as incremental as possible.

So, in the example above, after changing the single cpp file in liba and then building the CMake super-project, the super-project would build all targets (liba, libb and libc), and for each target CMake would invoke a custom command that first invokes the package's native build system and then invokes the package function in its conanfile.

Thus, if we assume that liba is scons-based, libb makefile-based and libc cmake-based and super-project cmake-based, the build of the super-project would go like this:

So, there is a lot of "empty-work" here as it's not possible to correctly know at a super-project level that libb and libc don't have to be rebuilt after a source file in liba has changed (in the above case, assuming all libs are static libraries).

It's not ideal, but I think it could work for use cases like @kenfred has - in his case he could put all conanfiles into the single repository and also check-in the workspace manifest with relative paths.

@kenfred , @jgsogo, how do you like this idea?

jgsogo commented 3 years ago

I totally agree we need some common ground and nomenclature. I'll try to write down how I see those different terms:

When it comes to development in C++ with Conan I see different scenarios:

And now scenarios where you want, indeed, to modify one recipe/library and take into account those changes in other packages you are developing at the same time. So far Conan offered two alternatives:

I would like to keep referring to these scenarios as:


Based on the above:


DoDoENT commented 3 years ago

...but for sure, any package built because of this development should go into the local-cache instead of the main one.

🤔 , probably yes, although it would look strange if a package from the main cache needs rebuilding because of the override from the local cache - but it's probably the correct behaviour. We need to discuss this further.

No idea if this local-cache should be multi-revisions or it should have the capacity to store only one revision (probably easier to manage, implement and understand).

I'd say that local-cache shouldn't need revisions - it's the temporary local cache after all. You are working on it, editing it, iterating on package creation, etc. Tracking revisions here seems like bloat - you would have a new revision for every development iteration - that doesn't look good.

I see no advantage in using CMake to call other build-systems while we can use Conan itself...

Maybe the only advantage would be to be able to perform the orchestration from the IDE project generated from CMake. Thus, you could edit sources of all packages from the same IDE and then build the entire super-project.

Challenge will be how to optimize that empty-work: the only answer is to trust each project's build-system optimizations and use always the same build folder (without removing any file).

I personally wouldn't bother with that because, as you said, attempting to do something more than trusting the project's build-system would make conan more than a package manager - it would start becoming a new build tool. I'm not sure this is something we want...

I think that it's OK to trust the project's build system here. Furthermore, if you look at the current package co-development flow, it's essentially the same, but not automated. You manually invoke conan create on packages in a specific order and then conan install from the consumer project that tests the packages. This is even worse than the empty-work mentioned above - it copies the source and builds it anew every time. Even reducing that to the empty-work of trusting the project's build system as discussed above would be a huge win, IMHO.

jgsogo commented 3 years ago

...but for sure, any package built because of this development should go into the local-cache instead of the main one.

🤔 , probably yes, although it would look strange if a package from the main cache needs rebuilding because of the override from the local cache - but it's probably the correct behaviour. We need to discuss this further.

This is totally needed, imagine you are developing zlib in the OpenCV graph. For sure you need to build again packages in the middle (some inline, some static linked into shared,...). And these packages built with the development version of zlib cannot go into the main cache.
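The set of "packages in the middle" can be computed from the dependency graph. The sketch below assumes a plain mapping from each package to the packages it requires; `packages_to_rebuild` is a hypothetical helper, and the graph in the usage example is made up for illustration rather than taken from the real OpenCV recipe.

```python
from collections import deque


def packages_to_rebuild(requires, overridden):
    """Return every package that depends, directly or not, on `overridden`.

    `requires` maps a package name to the names it requires.
    """
    # Invert the graph: dependency -> packages that require it.
    dependents = {}
    for pkg, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk upwards from the package under development.
    to_rebuild, queue = set(), deque([overridden])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in to_rebuild:
                to_rebuild.add(parent)
                queue.append(parent)
    return to_rebuild


# Illustrative graph only.
graph = {
    "opencv": ["libpng", "libtiff", "eigen"],
    "libpng": ["zlib"],
    "libtiff": ["zlib"],
    "eigen": [],
    "zlib": [],
}
print(sorted(packages_to_rebuild(graph, "zlib")))  # ['libpng', 'libtiff', 'opencv']
```

Everything in the returned set would have to be built into the local cache, while packages outside it (here `eigen`) could still be consumed from the main cache.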

I see no advantage in using CMake to call other build-systems while we can use Conan itself...

Maybe the only advantage would be to be able to perform the orchestration from the IDE project generated from CMake. Thus, you could edit sources of all packages from the same IDE and then build the entire super-project.

That is possible in workspaces (where packages use CMake), but not in the general scenario of co-development (according to my terminology). The only thing you will get in the IDE would be projects that call custom-commands, no source files.


I think we are pretty much on the same line 🎉

kenfred commented 3 years ago

@DoDoENT

Otherwise, we risk having co-development solution being designed in the "too-workspace-ish" manner, just like workspaces from Conan 1.12 and before, which were not usable for many workflows (including mine), as they expected the workspace file to always be in the parent folder of all Conan packages that were part of the workspace, as well as supported only a CMake-based build system for all packages participating in the workspace.

I agree this is a failing of the chosen implementation of workspaces. In my proposal of the package-level super project, this is no longer an issue. Perhaps workspaces can be rehabilitated for you? Would it work with your use case if it wasn't so cmake-centric?

On incremental builds, you're spot on. With heterogeneous build systems you can not be incremental at the source-level across packages. However, the individual builds are internally incremental, provided their build system is inherently incremental and conan build does not ruin the ability to do the incremental build.

But we are also incremental at the package-level. That is, the super project can track whether there are changes to liba's package folder and only run conan build + package on libc if liba's package changed. Although these are not perfect, source-level incremental builds, it is as good as you can get with heterogeneous builds, short of performing some sort of build system conversion.
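One way to picture package-level incrementality is to fingerprint each package folder and re-run a consumer only when the fingerprint of something it requires has changed. This is a sketch under assumptions: hashing file contents is just one possible change detector, and `folder_fingerprint` and `stale_consumers` are hypothetical names.

```python
import hashlib
from pathlib import Path


def folder_fingerprint(folder):
    """Hash the relative paths and contents of every file under `folder`."""
    folder = Path(folder)
    digest = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest.update(path.relative_to(folder).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def stale_consumers(package_folders, requires, previous):
    """Return consumers whose direct requirements changed since `previous`.

    `package_folders` maps name -> package folder, `requires` maps name ->
    direct requirements, `previous` maps name -> fingerprint from the last
    run. Also returns the fresh fingerprints to store for the next run.
    """
    current = {name: folder_fingerprint(f) for name, f in package_folders.items()}
    changed = {name for name, fp in current.items() if previous.get(name) != fp}
    stale = {pkg for pkg, deps in requires.items() if changed.intersection(deps)}
    return stale, current
```

The check covers one level of the graph at a time. Running it package by package in dependency order lets a rebuilt consumer, whose own package folder may then differ, trigger the packages that require it in turn.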

I like your manifest idea, in that it sounds like you're creating commands to help me build a workspace file. I would require that I can put that workspace file anywhere. I would put it among the source and expect that the local-cache I create from it could be out-of-source.

I considered making a third-party tool, but that would require we expose more hooks from conan. In order to make the super project script, you'd need to get at the package_info.

@jgsogo

It is clear that the existing names carry all the baggage of Conan 1.X. We need to strip away the baggage and talk about these from first principles. That means we either need to redefine these terms or invent new terms.

To critique the Conan 1.x notion of these labels:

Do you think it is possible for us to back up and imagine a world without a cache?

The orchestration can be delegated in another tool like CMake with custom-commands or it can be Conan itself running commands in order (always the same build-folder, always the same package-folder, without removing files). I see no advantage in using CMake to call other build-systems while we can use Conan itself...

This is a good comment. The reason to generate a super project with CMake or similar is because you can do package-level dependencies and incremental builds. We don't want to make Conan a build system, like you said. It is a build system orchestrator.

DoDoENT commented 3 years ago

Perhaps workspaces can be rehabilitated for you? Would it work with your use case if it wasn't so cmake-centric?

Our build system is fully cmake-centric, so cmake-centric workspaces would work for me. But I think making them cmake-centric is not a great idea because some people may want to use them without using CMake.

My main issue with your proposal (as far as I understood) is that it expects that all packages from the co-development group are in the same code repository. Correct me if I'm wrong here. This is why I proposed the manifest idea, which would let me create a workspace out of different packages located in completely unrelated folders on my system.

Then, for my use case, I would simply create a temporary workspace manifest grouping packages that I want to co-develop and after I'm done I'd ditch the manifest. In my case, the manifest would contain absolute paths to conanfiles of my packages in the co-development group.

For your use case, as far as I understand, you would create a permanent manifest with relative paths to your conanfiles in the same code repository and commit the manifest file into the SCM. I might do the same if I ever have a package group that always needs to be developed together (most of my packages are not like that).

I would require that I can put that workspace file anywhere. I would put it among the source and expect that the local-cache I create from it could be out-of-source.

Yeah, that's basically what I'm suggesting. For example, conan install /path/to/workspace/manifest --cache-dir=/path/to/project-cache would create a new local-cache for the package group defined in the workspace manifest file.

kenfred commented 3 years ago

Our build system is fully cmake-centric, so cmake-centric workspaces would work for me. But I think making them cmake-centric is not a great idea because some people may want to use them without using CMake.

Agree completely. I'm sorry that I didn't realize that "workspaces" had that cmake/single-build-system assumption baked in. You and @jgsogo seem to be aligned on what "workspaces" means, but I was thinking of them differently. I want a co-development orchestration file. Bonus points if it can be used to generate a package-level super build.

My main issue with your proposal (as far as I understood) is that it expects that all packages from the co-development group are in the same code repository. Correct me if I'm wrong here.

Not quite. If we can, I'd like to support my pet use-case of being able to check in the manifest (or in some way enable a consumer tree). But remove that feature and I would still advocate for a local folder, where I create a workspace file/manifest/orchestration recipe/whatever, in which to build and package a group of co-developed packages. Your local-cache satisfies this, I believe. My issue with it is it further complicates conan's cache model and doubles-down on the overuse of the cache concept.

Then, for my use case, I would simply create a temporary workspace manifest grouping packages that I want to co-develop and after I'm done I'd ditch the manifest. In my case, the manifest would contain absolute paths to conanfiles of my packages in the co-development group.

Yes. I agree. This would be the primary case. The manifest is temporary, only needed while you're actively co-developing. It's a secondary (less important) case to be able to keep that manifest around, even check it in, to enable longer-term co-development or this "consumer tree." I think we can support the secondary use case without compromising the primary.

For your use case, as far as I understand, you would create a permanent manifest with relative paths to your conanfiles in the same code repository and commit the manifest file into the SCM. I might do the same if I ever have a package group that always needs to be developed together (most of my packages are not like that).

I would require that I can put that workspace file anywhere. I would put it among the source and expect that the local-cache I create from it could be out-of-source.

Yeah, that's basically what I'm suggesting. For example, conan install /path/to/workspace/manifest --cache-dir=/path/to/project-cache would create a new local-cache for the package group defined in the workspace manifest file.

Yes. I think we're understanding each other.

jgsogo commented 3 years ago
  • die a swift and painful death and burn in hell for all eternity

Hey! I was involved in their conceptualization and implementation. Respect my creature! 🤣. 🤣 🤣


The Conan cache is not just a cache, I agree. But maybe we are wrong with the name and not with the usage. It is like the local Git repository, it is needed for Git to work. Conan needs that cache (let's call it staging directory) to work. Probably we are abusing the name of that folder, but not the folder itself. Conan packages go first to the staging area and then they are pushed to a remote. We can imagine Conan and Git using only the remote server, but it is very unlikely to happen.

I like this from @DoDoENT : conan install /path/to/workspace/manifest --cache-dir=/path/to/project-cache

@kenfred , workspaces AND co-development (or package-level super project), both are things we need to keep in mind when designing the changes for Conan v2.0. We will be ambitious, the new foundations need to support the development of both scenarios, both of them are really valuable for different groups of users. All these ideas and contributions are very valuable. Thanks!

kenfred commented 3 years ago

:) Sorry @jgsogo. I'm endlessly grateful for your efforts. Your baby is beautiful. Now kill it, please!!!!

Conan needs that cache (let's call it staging directory) to work

I'd like to probe this further because I'm not sure it's true. The conan cache is unique from a conan remote in that it will build from source packages that it doesn't have. Other than that, they are conceptually the same. That is one responsibility.

The separate responsibility you mention is the staging area for packages you are developing. We agree we need a staging area. I think me and @DoDoENT are saying that staging area should be a temporary, local folder. Not a global thing. This should not be married with the cache.

I believe the conan install /path/to/workspace/manifest --cache-dir=/path/to/project-cache idea would satisfy my needs.

jgsogo commented 3 years ago

Yes, that's the main purpose of the cache: to build the packages in a controlled environment so we can guarantee that builds are reproducible. Probably not the best name, but as a Conan developer we use the term 100s times every day and we cannot use a different name 🤷

I think me and @DoDoENT are saying that staging area should be a temporary, local folder. Not a global thing. This should not be married with the cache.

Add me to the team! When I say local cache, this is precisely what I mean (I wrote /project/.conan/<cache>). 🎉 It is a local directory inside your project folder or working directory... but from the implementation point of view it would be great if it behaves exactly as the global cache: a place that Conan will use to build the binaries and find local packages (after the calls to package() in that orchestrated process).

kenfred commented 3 years ago

Yes, that's the main purpose of the cache: to build the packages in a controlled environment so we can guarantee that builds are reproducible.

Exactly! The cache is the place for released, versioned and revisioned, immutable packages that we rely on to make our builds reproducible.

Now I ask you, if that is its purpose, does a temporary, in-development/mutating thing belong there? What about something you're staging to consider for publishing, but is not yet released and could still change? I'd say no! On both counts, this violates the "main purpose" of the cache.

Probably not the best name, but as a Conan developer we use the term 100s times every day and we cannot use a different name

The issue is not the name, it's the overloaded responsibilities and the concept that it is the center of the conan universe. "Cache" is appropriate for the main purpose you expressed.

When I say local cache, this is precisely what I mean (I wrote /project/.conan/).

Oh, I missed that you put that in local project scope. Sorry. Then I think we are in agreement about that! The only sticking point is the nomenclature of "cache" and how it implies a strong relation to the real cache, when its purpose is actually quite different and it has the same interaction with the real cache that a consumer would have.

kenfred commented 3 years ago

Maybe this will help:

If you agree with these definitions, then you can see that Package Development has nothing to do with the cache. It needs a remote or a cache from which to get dependencies, but the fact that it is evolving means it needs to stay out of the cache until it's published. You can also see that Package Development is almost identical to Consumer Development. They are the same right up until the point you decide to publish. They belong to one phase and the cache/remote belong in the published phase.

memsharded commented 11 months ago

I think https://github.com/conan-io/conan/pull/14923 can be of great help in this. It allows saving packages from one cache and restoring them in another cache.