[question] How to use export-pkg with --ignore-dirty and the new scm mode in Conan v2.0

conan-io / conan

[question] How to use export-pkg with --ignore-dirty and the new scm mode in Conan v2.0 #12125

Open Bearwolves opened 1 year ago

Bearwolves commented 1 year ago

With the new scm attribute (https://docs.conan.io/en/latest/migrating_to_2.0/recipes.html#the-scm-attribute), how do we export a package in our CONAN_USER_HOME which is created from a dirty git repository?

~/wp/conan-example-component$ conan export-pkg --force --ignore-dirty /home/xxx6xxx/wp/conan-example-component yyy/stable
Exporting package recipe
conan-example-component/0.5.0@yyy/stable: Calling export()
ERROR: conan-example-component/0.5.0@yyy/stable: Error in export() method, line 35
        scm_url, scm_commit = git.get_url_and_commit()
        ConanException: Repo is dirty, cannot capture url and commit: /home/xxx6xxx/wp/conan-example-component

While looking at the Conan 1.52.0 code, especially the following lines (https://github.com/conan-io/conan/blob/1.52.0/conan/tools/scm/git.py#L49-L57), we didn't see any way to replicate the old behavior. Is that intended? If so, can you point us to some documentation about it (a PR in conan or in the Conan tribe; I saw nothing in the tribe documentation)?

Conan version used 1.52.0

Thanks a lot! :)

[X] I've read the CONTRIBUTING guide.

memsharded commented 1 year ago

Hi @Bearwolves

It seems that in your case, if that package is designed to be created with export-pkg and not a regular create, you don't want the "full reproducibility" guarantee that get_url_and_commit() provides. That helper is designed to make it possible to build from sources later with exactly the same source, which is why it refuses dirty repos.

As this is not the case, and it seems you only want to capture an "approximate" commit without caring whether the repo is dirty, because you will not build from sources later, you can use the other primitives of the new Git helper. They give you full control, including getting the commit even if the repo is dirty. Is this your use case?
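A minimal sketch of that idea, assuming the recipe only wants to record roughly where a binary came from. The method names get_commit(), get_remote_url() and is_dirty() are those of the conan.tools.scm.Git helper; the function name capture_scm_info and the "dirty" key are hypothetical, chosen only for illustration.

```python
def capture_scm_info(git):
    """Record commit and URL without refusing a dirty worktree.

    `git` is assumed to behave like conan.tools.scm.Git, i.e. to offer
    get_commit(), get_remote_url() and is_dirty().
    """
    return {
        "commit": git.get_commit(),   # HEAD commit, even if there are local changes
        "url": git.get_remote_url(),  # URL of the "origin" remote, may be None
        "dirty": git.is_dirty(),      # hypothetical key: remember that it was dirty
    }
```

Inside a recipe's export() this could be used as update_conandata(self, {"scm": capture_scm_info(Git(self, self.recipe_folder))}), with update_conandata imported from conan.tools.files. A package exported this way is not guaranteed to be rebuildable from sources.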

Bearwolves commented 1 year ago

Hey @memsharded,

Our current workflow follows the legacy developer flow (install, build, package, export-pkg). As an interface for our developers, we provide two very small script wrappers:

build.sh => install, build, package
export.sh => export-pkg

We got used to the developer flow rather than the CI (i.e. create) workflow because it gives easy access to the build artifacts during development. We work in an embedded system context, where we have to copy the applications to an emulated or real system to test them. We also need access to CMake artifacts such as the compile_commands.json file. I read the new tribe design about the changes to the developer flow again, https://github.com/conan-io/tribe/blob/main/design/019-local_development_flow.md, and I have the feeling that we fall under the following statement: https://github.com/conan-io/tribe/blob/main/design/019-local_development_flow.md?plain=1#L54-L56. In the first case, you don't want to commit each change before testing the package method, for example.

During development, we would like to allow playing around with dirty packages, but we do not want to allow any dirty package to be uploaded to our repositories. This use case is covered by the conan upload check. To run conan test on the "final" packages locally, we exported them to our CONAN_USER_HOME anyway.

Do you advise any other Conan features to cover this use case, i.e. developing packages and making them available early for our own testing, or testing a consumer package against an early version of them? We tried two years back the editable layout, but never brought it to life in our flow.

memsharded commented 1 year ago

Thanks, I understand a bit better.

The get_url_and_commit() is actually designed to allow developer flows doing "local" commits, that can be squashed or removed later with git commands. In those cases, it will simulate how it will work in the server, by setting up the "url" origin to the local repo. So it is not necessary to actually push the commit to the server.
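The decision described above can be sketched as a plain function. This is a simplified illustration, not the helper's source: the function and parameter names are hypothetical, and in a recipe the inputs would come from the Git helper's is_dirty(), get_commit(), get_remote_url(), commit_in_remote() and get_repo_root().

```python
def resolve_url_and_commit(dirty, commit, remote_url, commit_in_remote, repo_root):
    """Pick the (url, commit) pair to store for a later build from sources."""
    if dirty:
        # Uncommitted changes cannot be reproduced later, so refuse.
        raise RuntimeError("Repo is dirty, cannot capture url and commit")
    if commit_in_remote:
        # The commit was pushed: anyone can fetch it from the remote.
        return remote_url, commit
    # Local-only commit: point the "url" at the local repository instead.
    return repo_root, commit
```

With a local, unpushed commit the stored URL is the local checkout, which is enough to test the flow on the developer machine but not usable by anyone else.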

But yes, I understand that if you need to do this very often, it is not super convenient. But it is also difficult to have both "relaxed" behavior allowing dirty repos and guaranteeing at the same time that this will not happen when running elsewhere (like in CI servers). Maybe you can add a user conf switch, but that sounds like a bit too much.

Maybe the issue is what you mention below about editables, and needing to do a lot of export-pkg to make it available for consumers, so regarding it:

> We tried two years back the editable layout, but never brought it to life in our flow.

The editable now works one order of magnitude better, with the built-in layout() method, and the helpers like cmake_layout(). I'd recommend giving it a try.

puetzk commented 1 year ago

> The editable now works one order of magnitude better, with the built-in layout() method, and the helpers like cmake_layout(). I'd recommend giving it a try.

Yeah, layout is the beginnings of something useful. It does work well when I'm doing simpler stuff, like header-only libs or sometimes single-platform/build==host stuff, but is still not actually much use in a multi-platform/cross-compiling kind of workflow:

1. One cannot specify where the build_folder is (probably not in my source checkout relative to the recipe), and conan editable add does not have a way to specify --build-folder. When I have a bunch of different build folders tied to different versions of a toolchain or sysroot, these are probably off in a folder structure related to that, not anywhere near the source checkout that the different builds are all sharing.
2. As a corollary to 1, since I usually have multiple build folders - when there's no way to specify it, there's also no way to specify different build_folders for different package_ids.

Thus, I mostly end up back in the non-local workflow, where conan can find the build/package folders because they're in the cache, and repeatedly use export-pkg to refresh what's actually in said cache. That's at least a quick incremental build (unlike conan create), probably done in my IDE. And export-pkg does let you specify --build-folder, and it will reuse the conaninfo.txt/conanbuildinfo.txt to determine which package_id to update.

layout() has the right basic API to handle this (it can see conanfile.settings and conanfile.options, and set conanfile.folders). So it could pick different build_folders based on settings. But each recipe will be hard-coded to a specific implementation of layout, usually the cmake_layout helper, and that helper is pretty simplistic. It's hardcoded to do it by just build_type, which is not enough if what you're actually doing is testing different architectures or different compilers, not just Release/Debug. So in general this workflow won't work with the recipes as-is. One could write a layout() method that did work, and patch each recipe to that before doing conan editable add; this is certainly possible (it's a dirty, editable folder), but pasting in a hack to the recipe and remembering to revert it is not very convenient.
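As a rough sketch of such a hand-written layout, assuming a recipe that declares the os, arch and build_type settings: the helper name settings_layout and the folder scheme are hypothetical, while conanfile.settings and conanfile.folders are the attributes layout() already has access to.

```python
def settings_layout(conanfile):
    """Hypothetical alternative to cmake_layout(): one build folder
    per os/arch/build_type combination instead of per build_type only."""
    s = conanfile.settings
    subfolder = "/".join(str(v).lower() for v in (s.os, s.arch, s.build_type))
    conanfile.folders.source = "."
    conanfile.folders.build = "build/" + subfolder
    conanfile.folders.generators = conanfile.folders.build + "/generators"
```

A recipe would call it as def layout(self): settings_layout(self). This still hard-codes the scheme in the recipe, which is exactly the inconvenience described above; it only shows that the API itself is sufficient.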

What I usually wish for is for the recipe to define conanfile.cpp.* in layout(), but have conanfile.folders come from some external database (where things were recorded back when I did the editable add). But that's not a thing that currently exists.

puetzk commented 1 year ago

> But it is also difficult to have both "relaxed" behavior allowing dirty repos and guaranteeing at the same time that this will not happen when running elsewhere (like in CI servers).

Yeah, the main hassle caused by having to make a junk "wip" commit is when you have several changes that are eventually going to get factored into multiple commits. Having to immediately turn around and git reset --soft to get rid of it will mess up the index if you were starting to draft which lines to stage, which is a hassle when iterating quickly. Plus the "wip" commit is sitting there to be accidentally pushed. Forcing a throwaway commit doesn't make the package any less dirty, it just adds manual steps.

I don't actually mind there being a commit object, though. And in fact it would be kind of cool to be certain that anything in my conan cache has a corresponding hash. I just don't like this polluting the branch, the worktree's index and the reflogs. So... maybe get_url_and_commit() could get even fancier, using the plumbing commands git write-tree/git commit-tree to write a commit hash for its own use, but never touching the worktree state or any symbolic refs?
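A sketch of what that plumbing could look like, assuming git is on the PATH, the repository already has at least one commit, and a committer identity is configured. The function name snapshot_commit and the injectable run parameter are hypothetical; nothing like this exists in Conan today.

```python
import os
import subprocess
import tempfile


def _git(args, cwd, env):
    """Run one git command and return its stripped stdout."""
    return subprocess.check_output(["git"] + args, cwd=cwd, env=env, text=True).strip()


def snapshot_commit(repo_dir, run=_git):
    """Create a commit object for the current worktree contents.

    A throwaway index file is used, so the real index, HEAD, the branch
    and the reflogs are left untouched.
    """
    with tempfile.TemporaryDirectory() as tmp:
        env = dict(os.environ, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        run(["add", "--all"], repo_dir, env)       # stage everything into the temp index
        tree = run(["write-tree"], repo_dir, env)  # write that index as a tree object
        # Wrap the tree in a commit whose parent is HEAD; no ref is updated.
        return run(["commit-tree", tree, "-p", "HEAD", "-m", "conan snapshot"], repo_dir, env)
```

The resulting commit is not reachable from any ref, so git may eventually garbage-collect it unless something keeps a reference to it.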

That still wouldn't recover the (desirable) behavior of blocking conan upload of a dirty package, though. Conceptually it's not hard - if the remote is pointed at a local filesystem path, rather than a remote URL, that's not something you should ever upload since it won't work for anyone else. It doesn't really matter if this is because it was this new case for a dirty worktree, because there was a manual "wip" commit, or because it's the real commit but hasn't been git pushed to the remote yet. All 3 still made a package with a URL that shouldn't be uploaded since it won't work for anybody else. But since there's nothing to standardize the scm keys used in the conandata.yml (export() and source() have to agree, but can use anything they agree on) I don't know how conan upload could know what to verify.

So in that sense, it would be better to just keep dirty stuff out of the cache in the first place (by figuring out how to make editable work)
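For what it's worth, the "is this URL something others could clone" test itself is easy to sketch. The function name is_shareable_url and the list of accepted schemes are assumptions; as noted above, the harder part is that a generic check would not know which conandata.yml key holds the URL.

```python
from urllib.parse import urlparse


def is_shareable_url(url):
    """Return True if `url` looks like a network remote rather than a local path."""
    if not url:
        return False
    if "://" in url:
        # Explicit scheme: accept common network transports, reject file://
        return urlparse(url).scheme in ("http", "https", "ssh", "git")
    # scp-like syntax (user@host:path); plain or Windows paths fall through to False
    head, sep, _ = url.partition(":")
    return bool(sep) and "@" in head and "/" not in head
```

The check is deliberately conservative: an scp-like URL without a user part (host:path) is treated as not shareable.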

puetzk commented 1 year ago

> As a corollary to 1, since I usually have multiple build folders - when there's no way to specify it, there's also no way to specify different build_folders for different package_ids.

Hmm, thinking even further on this, I had a new-to-me idea. The other problem I often struggle with is that editable is too much a global solution to a local problem - usually I only want one (or a few) things that I am actively testing to use the editable package, and I don't want this leaking if I have to context-switch to some other project for a bit. Which usually leads to me inventing a bogus ref/version like package/editable@username/editable, and using --require-override to force things to use it (or making a change to the conanfile.py I never intend to commit, thus introducing a dirty working copy that get_url_and_commit will no longer allow to be tested...)

But I just had another thought; maybe something that would work well is coupling this with lockfiles? i.e. some kind of conan lock editable, analogous to conan lock add, where one could take a lockfile specifying only the base refs and, rather than filling them out with a package_id and prev, instead place a ref into editable mode using a locked folders.source and folders.build (which layout() would leave alone).

On the one hand editable is kind of antithetical to the idea of strict reproducibility you usually use lock to achieve. But it otherwise fits right in with the kind of overriding references that lockfiles exist to do, and has a place to store the chosen build/source folders. And evolving a not-yet built lockfile by choosing an editable build folder doesn't seem too much different to evolving it onto some package revision in the cache. It's at least more honest than just using export-pkg --force over and over to rewrite the contents of the cache. And conan could then be aware that there are dirty (editable) nodes in the dependency graph in order to prevent you from exporting/uploading the results, while still being able to do things like run a test_package against it.

puetzk commented 1 year ago

Apologies @Bearwolves for the wall of text. I started because it sounds like you have similar workflows that are impacted by losing the ability to make a dirty conan create, but when I started thinking out loud I had more to write than I originally thought :-)

memsharded commented 1 year ago

Hi @puetzk

I think there are too many things here; we probably need to go step by step. This also deviates a bit from the initial thread, so it might be worth opening new issues.

> As a corollary to 1, since I usually have multiple build folders - when there's no way to specify it, there's also no way to specify different build_folders for different package_ids.

> But each recipe will be hard-coded to a specific implementation of layout, usually the cmake_layout helper, and that helper is pretty simplistic. It's hardcoded to do it by just build_type, which is not enough if what you're actually doing is testing different architectures or different compilers, not just Release/Debug

I am not sure about this. There is actually a new tools.cmake.cmake_layout:build_folder_vars conf for cmake_layout() (though it could be exploited by other build systems as well) that allows defining which folder structure is created from variables like the settings, options, etc. So you can perfectly well have a build folder like linux/armv8/release/shared hosting one build, separate from another native build in linux/x86_64/debug/static. It doesn't map 1:1 to package_ids, but it should be good for developer flows, because it is much easier to understand and read.

puetzk commented 1 year ago

Yeah, sorry about that. I was starting with a "me too" for similar workflows, then started running with your comments about using editable instead.

But only the idea about maybe using write-tree/commit-tree is directly applicable to get_url_and_commit(). This issue isn't the place to try and make editable more powerful.

> there is actually a new tools.cmake.cmake_layout:build_folder_vars

Not sure how I overlooked that, but you're right. That would help... I'll look again.

shreyasr89 commented 11 months ago

@memsharded What about the local use cases a developer goes through on a day to day basis? He would want to change something, run conan and then commit. How is this thought of in Conan v2??

memsharded commented 11 months ago

Conan 2.0 developer experience is intended to be more native. For most use cases conan install + cmake ... + develop in your IDE should be more than enough. Exporting a package to the cache is intended just for the very last mile, and the intention is that the package there is fully reproducible and "final". That means for scm flows that require a clean repo, things should be committed before doing the export-pkg or create. If things fail, it is not an issue: the commit is still local, and the developer can squash it until it works. But still, developers shouldn't need to do export-pkg or create often with uncommitted code.

For cases when modifying simultaneously more than 1 package, the editable flow has been improved a lot with layout() and it is recommended. That doesn't require export-pkg at all.