conan-io / conan

Conan - The open-source C and C++ package manager

https://conan.io

MIT License


[scm][revisions] Thoughts and questions on how to properly handle gitflow and Continuous Integration/Delivery with Conan recipes #3003

Open Adnn opened 6 years ago

Adnn commented 6 years ago

[x] I've read the CONTRIBUTING guide.
[x] I've specified the Conan version, operating system version and any tool that can be relevant.
[x] I've explained the steps to reproduce the error or the motivation/use case of the question/suggestion.

Hello! As we progress with the integration of Conan in our organization, we are getting close to making our lives better thanks to it!

The prologue

Here is a brief description of our status and the ideal of what we plan to achieve:

We develop C++ software through a modular approach. Modularity is achieved by defining many CMake targets, that can live in separate or common git repositories.
We rely on external dependencies, which we may compile ourselves or use as prebuilt binaries.
We adhere to Gitflow, and ideally want to continuously build and test the develop branches of the different repositories, and continuously deliver the master branches [which may mean updating a hosted web service, or building an archive to distribute to customers who will deploy our application/library in their environment].

Continuous Integration of the develop branches

On our CI/CD machine: ideally, each time a new commit appears on develop, the machine pulls it, then builds it with conan install ${path_to_repo_recipe}

With that, we know for sure the recipe of the package is up to date (it is not fetched from any Conan server, but taken directly within the source tree).

A difficulty arises with the dependencies on other internal projects listed in this recipe (those other internal projects also follow Gitflow). Assuming App depends on Lib, we may see something like this in the App recipe:

class AppConan(ConanFile):
    name = "App"
    requires = (("lib/0.0.0@com/develop"),
               )
    ...

Yet, Lib should probably be taken at a specific commit, in order for its interface to match the expectations of App.

We thought of two ways of getting there:

A) Making the commit hash a recipe option

This option would be used in the build() function of the recipe, something like that:

def build(self):
    ...
    self.run("git checkout {commit_id}".format(commit_id=self.options.gitcommit))
    ...

As options are hashed into the package_id, we know a new Lib binary is built each time the gitcommit value is changed for Lib requirement in the App recipe.
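To illustrate why changing the gitcommit value yields a distinct binary, here is a toy model. It is not Conan's actual package_id computation: the function name `toy_package_id` and the hashing scheme are assumptions made purely for illustration, and the sketch only shows that an identifier derived from option values changes whenever one of them changes.

```python
import hashlib


def toy_package_id(options):
    """Toy stand-in for a package ID: hash the sorted option items.

    This is NOT Conan's real algorithm; it only illustrates that any change
    in an option value produces a different identifier.
    """
    canonical = "\n".join(
        "{}={}".format(key, value) for key, value in sorted(options.items())
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Same recipe reference, two different values of the hypothetical option.
id_old = toy_package_id({"shared": "False", "gitcommit": "7ae81b1"})
id_new = toy_package_id({"shared": "False", "gitcommit": "65cda74"})
print(id_old != id_new)  # the two commits map to different identifiers
```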

Drawback

The problem we can foresee here is: how to handle changes in Lib recipe itself?

There is a single recipe, lib/0.0.0@com/develop, built with different commit options. If a new version of the recipe is uploaded, it overwrites the previous one, and one could build a commit that expects "recipe1" with the new "recipe2", potentially breaking the build.

B) Adding the commit hash in the recipe version

This way, we upload a new recipe to the Conan server for each commit of Lib. The recipe name could be something like lib/0.0.0-7ae81b1@com/develop

This solution addresses the problem of solution A), since each commit has its recipe published, the commit is always built with the corresponding recipe.
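A minimal sketch of how a CI job could compose such a reference, assuming the compound form version-commit shown above. The helper names `short_commit` and `make_reference` are hypothetical; only `git rev-parse --short HEAD` is a real command here.

```python
import subprocess


def short_commit(repo_dir="."):
    """Return the abbreviated hash of HEAD in the given git checkout."""
    out = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], cwd=repo_dir
    )
    return out.decode("utf-8").strip()


def make_reference(name, version, commit, user, channel):
    """Build a reference such as lib/0.0.0-7ae81b1@com/develop."""
    return "{}/{}-{}@{}/{}".format(name, version, commit, user, channel)


# With a hardcoded hash, so the example does not need a git checkout:
print(make_reference("lib", "0.0.0", "7ae81b1", "com", "develop"))
# lib/0.0.0-7ae81b1@com/develop
```

On the CI machine, `short_commit()` would supply the hash instead of the hardcoded value.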

Drawback

A potential complication here is polluting the Conan server with many recipes for a single repository, as there are many commits on develop branch of Lib. This could make browsing packages in Artifactory UI complicated. Also, most of the recipes will be identical, except for a hardcoded commit id to checkout, since it is expected that the recipe will change, but not change often compared to the frequency of commits on develop.

The actual question

Are there recommendations, or best practices, to address this aspect of development?

Minimonium commented 6 years ago

Do you really need to depend on (and store) develop packages? I can only think of having one single lib/latest@com/develop recipe, which can be used to deploy review apps.

Adnn commented 6 years ago

@Minimonium Thank you for your interest!

I may be missing your point, though, so I will go into a bit more detail about our workflow. As we develop, we tend to have sparse releases (let's say every few months), with several sprints and features added in between releases. Following gitflow, we consider that develop is the branch meaning "Everything pushed there is deemed usable by all developers collaborating on a project involving this repo". So when a developer is tasked with adding a new feature, they start a topic/xxxx branch and, when satisfied with the job, merge this branch into develop, meaning others can now use it.

Knowing that, we definitely want CI to automatically build/test develop commits for us, so I am not sure how to avoid having develop packages available for CI so it can build dependencies for downstream.

Do you have any suggestion on this point?

niosHD commented 6 years ago

I cannot really comment on what is best practice here. However, as a reference, I opted for option B and use the scm revision (i.e. git sha, see #3052) directly as package version when building conan packages on the CI server. Additionally, to integrate nicely with gitflow, we use conan's package aliases to always have a pointer to the latest successfully built and tested conan package in each branch.
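As a rough sketch of the alias step, assuming the Conan 1.x `conan alias <reference> <target>` command: the helper below only assembles the argument list, and a CI job would run it (for example with `subprocess.check_call`) after the build and tests have passed. The helper name `alias_command` and the alias version `latest` are hypothetical.

```python
def alias_command(name, alias_version, commit, user, channel):
    """Return the argument list for re-pointing a per-branch alias.

    The alias reference comes first, the reference it points to second.
    """
    alias_ref = "{}/{}@{}/{}".format(name, alias_version, user, channel)
    target_ref = "{}/{}@{}/{}".format(name, commit, user, channel)
    return ["conan", "alias", alias_ref, target_ref]


cmd = alias_command("lib", "latest", "7ae81b1", "com", "develop")
print(" ".join(cmd))
# conan alias lib/latest@com/develop lib/7ae81b1@com/develop
```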

I cannot yet say whether this approach is too wasteful in terms of storage overhead on the conan server. However, I think that it should be manageable. In the worst case, a cleanup script has to be used that regularly discards old (binary) packages.

lasote commented 6 years ago

@adnn About:

This could make browsing packages in Artifactory UI complicated.

Is it really an issue? Do you often explore the repositories by hand? What is the goal?

Thanks

Adnn commented 6 years ago

@lasote thank you for your interest!

To be honest, this is the first time that we are setting up an artifact repository in our organisation ( \o/ ), so we don't really know yet what will become a useful feature in our workflows, but since we realized the option existed, we wondered whether it could be an issue to "spam" the artifact repositories with one recipe per commit.

Are you aware of any recommendations to address the general issue of CI for gitflow repositories relying on Conan recipes? Any preference for A) or B)?

lasote commented 6 years ago

Hi, I would like to have a recommendation, but I don't, yet.

We are currently starting to define what will be the package revisions. The general idea is to automatically keep (in server side) different revisions for the same recipe reference (and different binary packages revisions for each recipe revision). And the server will automatically retrieve the latest if no "revision" is specified from the client.

But, with the current state of conan, the @niosHD recommendation sounds good to achieve the needed reproducibility, definitely, the scm feature should help in your flows. (I already have seen the opened #3069 and will try it asap, but probably tomorrow).

In #3052 @niosHD very well explained how they are using the git-flow and using alias per branch to point to the latest commit of a recipe (branch based).

What I'm trying to guess is: when we get the revisions feature, do we really want/need to change the recipe version field with the commit and the alias pointing to the latest or is it not needed anymore?

The revisions feature will allow you to automatically fetch the latest revision from a server.
The revisions feature will save every single pushed change into a conan repository.
BUT the revisions feature (with current proposition) doesn't have an automatic translation between an SCM commit and a revision, nor have really a way to "rollback" to a previous revision (without extending current proposition with kind of 'lock files')

I would like to learn more about all of you about your workflows:

When you create a new version of a recipe (or your code) (in a branch, e.g. develop) and you see that it is buggy, what do you do? Following git-flow, you will probably fix it in the same branch by adding more commits, right? It is probably not that common to revert to a previous commit, correct? Or maybe it is?
So, would it make sense (with the revisions feature) to "rollback" a package revision? Or just to create a new one?

Thanks for your time, I'm sure we can work all together to finally have the needed "recommendation" for your and many other workflows that people are using out there! :)

Adnn commented 6 years ago

Thank you @lasote for investing time and effort to address this issue! I am strongly interested to solve this problem too (and apologize for the lengthy response)

If I take a step back from the details and think about our organization's use case from a higher level: when we develop, we want to be able to reference another development package at a specific revision (or latest, even though latest is not really a use case for us at the moment, and we stick with explicit).

So, ideally, we'd say something like that in our recipe: requires = libraryA/com/develop[revision="commit-ish"] [I don't know if the recipe version number would be useful to add, we currently use 0.0.0 as padding in our develop recipes]

Using a git commit-ish would actually allow both explicit and latest use cases:

explicit: Reference a specific commit sha [what we do]
latest: Reference latest from any branch, by using the branch name (note that in this case, we would rely on the build log to save the exact commit hash, for reproducibility)
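A small sketch of how both cases could be reduced to an exact commit, assuming the build script has a checkout of the upstream repository. `git rev-parse --verify` is a real git command; the function name `resolve_commitish` and the injectable `run` parameter are hypothetical and only there so the example runs without a repository.

```python
import subprocess


def resolve_commitish(commitish, repo_dir=".", run=subprocess.check_output):
    """Resolve a branch name, tag or (short) hash to a full commit hash."""
    out = run(
        ["git", "rev-parse", "--verify", commitish + "^{commit}"],
        cwd=repo_dir,
    )
    return out.decode("utf-8").strip()


def fake_git(cmd, cwd=None):
    # Stand-in for git: pretend every commit-ish resolves to this hash.
    return b"7ae81b1c0ffee0000000000000000000000000aa\n"


# "latest" case: resolve the branch name, then log the exact hash so the
# build can be reproduced later.
print("develop resolved to", resolve_commitish("develop", run=fake_git))
```

Dropping the `run=fake_git` argument makes the function call the real git executable.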

Regarding the questions you raised, I can try to answer to the best of my knowledge, even though I am not sure to know what is the current plan with package revision, or what would be the UI to use them.

What I'm trying to guess is: when we get the revisions feature, do we really want/need to change the recipe version field with the commit and the alias pointing to the latest or is it not needed anymore?

If each revision can check out a different commit, and there is a way in downstream's conanfile to request a specific recipe revision, then we would not need to bump the recipe version field. On the contrary, in our case we want to follow @memsharded's usual recommendation to keep the version strictly synchronized to the product version. So until recipe revisions are available, we would probably either use a version of the compound form $productversion-$commit, or simply use the channel to store the commit hash in the package reference. Our need to store the commit hash in the package reference would probably vanish when revisions are made available. If I understood correctly, if revisions are sequential, latest could implicitly be the latest revision published, so there is no need for an alias in this case.

When you create a new version of a recipe (or your code) (in a branch, e.g. develop) and you see that it is buggy, what do you do? Following git-flow, you will probably fix it in the same branch by adding more commits, right? It is probably not that common to revert to a previous commit, correct? Or maybe it is?

It happens, rarely, that we had to rollback commits, or full feature branches. To rollback, we revert (creating a new git commit on top of the reverted commits), as we never rewrite public history (not reset --hard followed by push --force). We keep recipes in the repository they build (we track code and recipe the same way), so you are right that if a recipe is buggy, we fix it as a new commit in said repo.

So, would it make sense (with the revisions feature) to "rollback" a package revision? Or just to create a new one?

So here, it might depend on what you mean by rollback: erase (reset --hard), or simply create a new one on top (revert). In our case, since we don't erase the commits that were reverted, we cannot afford to lose recipe revisions (the branch may have been reverted, but someone could still try to build it later), so your proposal to create a new revision on top of the reverted one seems to fit our organisation's case. Now, if someone were to rewrite public history, and you had a commit-ish embedded in recipe revisions, then the recipe revisions of rewritten commits would become dangling. Then again, one could argue that it is outside Conan's responsibilities to handle the stability of SCM references embedded in the recipes/recipe revisions...

Adnn commented 6 years ago

One more point, that could be of potential interest.

We may conceptually make the difference between two cases:

1. A new commit in a repository changes the code, but does not change the recipe [apart from potentially bumping the recipe to use said commit in the build() method, if it is stored in the recipe itself]
2. A new commit actually changes the recipe by changing the requirements, the methods implementation, the available options... [anything more than bumping the current repository commit]

This distinction is a good thing to keep in mind, because it appeared to be a cause of confusion when reading a few discussions regarding CI in different issues (some were discussing updating the recipe, while others meant updating the code the recipe builds).

If you recall the first post in this issue, case 2. is what prevents our organization from simply using the commit-ish as a recipe option (because it would put the burden on the developer to know, for each downstream required package, which recipe reference to use for which commit).

Linking back to:

So, ideally, we'd say something like that in our recipe: requires = libraryA/com/develop[revision="git commit-ish"]

It seems ideal from our point of view, but would somehow imply that somewhere, Conan is able to know which recipe revision to use for a given git commit. (thus introducing coupling), but could drastically reduce the number of revisions to be stored (assuming the commit is a parameter of the recipe, and not hardcoded), because the frequency of 1. is usually far superior to the frequency of 2.
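The coupling described here can be pictured with a toy model, which is not a Conan feature: given the commit history and a flag telling whether each commit touched the recipe (case 2.), every commit maps to the last commit that changed the recipe. The function name and the data are hypothetical.

```python
def recipe_revision_for_commits(history):
    """Map each commit to the commit that last changed the recipe.

    `history` is an oldest-first list of (commit, recipe_changed) pairs.
    """
    mapping = {}
    current = None
    for commit, recipe_changed in history:
        if recipe_changed or current is None:
            current = commit
        mapping[commit] = current
    return mapping


history = [
    ("a1", True),   # case 2: recipe introduced
    ("b2", False),  # case 1: code only
    ("c3", False),  # case 1: code only
    ("d4", True),   # case 2: requirements changed
    ("e5", False),  # case 1: code only
]
mapping = recipe_revision_for_commits(history)
print(mapping["c3"], mapping["e5"])  # a1 d4
print(len(set(mapping.values())), "recipe revisions for", len(mapping), "commits")
# 2 recipe revisions for 5 commits
```

With the commit passed as a parameter rather than hardcoded, only the two recipe revisions would need to be stored for these five commits.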

lasote commented 6 years ago

Thanks for the very well explained comments. I think package revisions should fit well with your workflows, but I still need some insights about:

When we develop, we want to be able to reference another development package with a specific revision (or latest, even though latest is not really a use case for us at the moment, and we stick with explicit).

What is the "process" that leads you to know/decide which development version do you want? You mentioned a key aspect:

It seems ideal from our point of view, but would somehow imply that somewhere, Conan is able to know which recipe revision to use for a given git commit.

We haven't yet thought of a mechanism to resolve a revision number (actually not sequential but a hash, though not the commit hash) from the commit hash. So, related to my previous comment, it is essential for us to understand how you choose the desired development version. Is there a green CI build involved in the decision? We know we would need to implement some "lockfile" mechanism to stay attached to specific revisions, but we need to see that everything fits.

but could drastically reduce the number of revisions to be stored (assuming the commit is a parameter of the recipe, and not hardcoded), because the frequency of 1. is usually far superior to the frequency of 2.

I totally understand and agree, but I think it shouldn't be a problem for Artifactory nor your flows to store a million of revisions, but we would love to know if you detect some other issue. Of course @niosHD feedback would be very appreciated too.

Adnn commented 6 years ago

Thank you for following up with that suggestion! We are looking forward to the package revisions and will give them a try.

What is the "process" that leads you to know/decide which development version do you want?

It is quite simple and informal in our case (we are a small team). Basically, if one of our repositories, app, depends on another of our repos, lib, a teammate will implement the needed changes to lib and merge them to the develop branch (let's say merge commit ab12cd34). From here on, the developer of app can use said changes (for us, develop means "ready to be used by other developers"), so he would pull them into his local environment (not necessarily via Conan), and start to use the updated upstream.

Then, if everything works fine, the app recipe is updated to require the lib package building commit ab12cd34. This recipe update is then committed and merged back into develop. (We actually have a trivial script to make sure that a recipe builds successfully, so we can check before publishing to the central git repo.)

In a nutshell, we don't have a strict procedure in place: developers would pull develop commits from upstreams, and when they are satisfied with the state of upstream, the developers could update the recipe to match this commit of the required upstream. It would be quite important for us to be able to request the package based on the (short) commit hash of upstream, and not another hash generated by Conan. In the current approach, the git short hash is what we (intend to) use to make a unique reference for each package.
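The recipe update step could be scripted along these lines. This is only a sketch under the assumption that requirements use the compound form version-commit mentioned earlier in the thread; the helper name `pin_requirement` is hypothetical.

```python
import re


def pin_requirement(recipe_text, name, new_commit):
    """Replace the commit suffix in references like lib/0.0.0-<hash>@...

    Only references of the form <name>/<version>-<hex hash>@ are touched.
    """
    pattern = re.compile(
        r"(?P<prefix>\b" + re.escape(name) + r"/[^-@/\s]+-)[0-9a-f]{7,40}(?=@)"
    )
    return pattern.sub(lambda m: m.group("prefix") + new_commit, recipe_text)


recipe = 'requires = ("lib/0.0.0-7ae81b1@com/develop",)'
print(pin_requirement(recipe, "lib", "ab12cd34"))
# requires = ("lib/0.0.0-ab12cd34@com/develop",)
```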

I think it shouldn't be a problem for Artifactory nor your flows to store a million of revisions, but we would love to know if you detect some other issue. Of course @niosHD feedback would be very appreciated too.

Very good then, if having thousands of revisions for a given package is not an issue, we don't see a problem!

lasote commented 6 years ago

Thanks again for your explanation. 🥇 We will take into account the need to correlate SCM information and revisions somehow (probably using Artifactory search, but we will see!).

niosHD commented 6 years ago

Thank you @lasote for the ping, really an interesting thread!

To underline the previous discussion, I think @Adnn's and our requirements regarding package management as well as the general workflow are pretty similar. For us too, the possibility to depend on a package at a very specific scm revision is key, since we have modules which have to be updated in lockstep (e.g., llvm + clang). The package aliases, on the other hand, are less important and only for developer convenience (e.g., install the latest CI build from a feature branch). Furthermore, they are sometimes used as the default requirement on the CI server (e.g., to continuously test an application against the latest CI-tested develop version of a library with a stable API).

To be perfectly honest, I actually really like using conan in our current way where the scm revision is used as version. It works already today and nicely integrates into our established git-based flow. Similar to git, branches and tags are simply a pointer to a certain scm revision and can easily be updated. As the result, conan behaves like a binary cache for a git project which is great to use.

Furthermore, I think that this usage is perfectly valid for conan, as version is a free text field. The only nit I have currently is that conan's scm module does not natively support it (see #3052). However, that only means that I have to remove the version from my recipe and that I have to specify the full package reference (i.e., <name>/<scm revision>@<user>/<channel>) on the command line when I export/create the package instead of only <user>/<channel>.

For me, the central question in this discussion is what benefits do I get from package revisions in this context?

In our current flow, when the conanfile.py lives in the same repository as the source that gets packaged, I do not think that package revisions are needed at all to get reproducibility. Namely, any change in the recipe also leads to a change of the scm revision and therefore the version.

If the conanfile.py lives in a separated repository, on the other hand, then the package revision gets important since the recipe can have an independent "version". However, this scenario is exactly the same as for regular packages. Furthermore, using the scm revision as version delivers mostly reproducible results even in this context already. Only the exact packaging details are currently not tracked.

To summarize, I definitely agree that package revisions are related and an essential feature for conan which is needed to get reproducibility in the general case. Furthermore, I think that package revisions play nicely with the current approach and make it even better. However, why package revisions + fixed version (e.g., branch name or any other fixed string) should replace using the scm revision as version is not clear to me. What benefits does it provide and what am I missing?

lasote commented 6 years ago

Many thanks for your feedback!

What benefits does it provide and what am I missing?

That is a very good question.

Probably only to get rid of the different packages references (all the revisions have the same) and being able to resolve always the latest. It, in principle, sounds complex to "replace" the reference to the corresponding recipe for a different commit.

But in your case, for the LLVM stuff, you really need to pin the version/revision and, for the other use case, you would need to get the corresponding revision for a commit and apply it somehow, and with the alias, you are good not replacing every reference with the new commit version.

Probably, in terms of reproducibility, the alias is not the best idea, because if you change the alias, you lose where it was pointing at some time, but I'm not saying I have (yet) a better solution for this using the revisions proposal.

petermbauer commented 5 years ago

Very informative thread, thx! In addition to using the git hash for the package version, I was thinking of using a short form of the git branch name for the channel part. This would offer more information to the package consumer, e.g. mylib/7ae81b1@ci/feature_815_memfix. And with conan alias, an alias such as mylib/HEAD@ci/develop or mylib/latest@ci/develop could be created for mylib/65cda74@ci/develop.
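One possible way to derive such a channel from the branch name, as a sketch: the allowed character set is a deliberately conservative assumption rather than Conan's exact rule, and the helper name `branch_to_channel` and the branch name `feature/815-memfix` are hypothetical.

```python
import re


def branch_to_channel(branch):
    """Turn a git branch name into a short, reference-safe channel name."""
    # Collapse every run of characters outside [A-Za-z0-9_] into one "_".
    channel = re.sub(r"[^A-Za-z0-9_]+", "_", branch).strip("_")
    return channel or "unknown"


print(branch_to_channel("feature/815-memfix"))  # feature_815_memfix
print(branch_to_channel("develop"))             # develop
```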