conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

Package revisions #798

Closed annulen closed 5 years ago

annulen commented 7 years ago

Most package managers have a concept of package revision, i.e. additional version number that reflects changes in packaging scripts or applied patches when "main" version number of packaged software remains the same.

It would be great if Conan added support for revisions too. This will make package updates more transparent ("updated from vX.Y.Z-r1 to vX.Y.Z-r2"). Also there could be a policy that the "stable" channel can never change its conanfile and binaries without bumping the revision, to prevent accidental changes in packages used in CI with manifest verification.

It would be great if it was possible to keep binary packages for previous revisions so that CI system with manifests checking does not get broken in case new revision is uploaded without committing new reference manifests.

It was previously briefly discussed at https://github.com/conan-io/conan/issues/480#issuecomment-247545547

memsharded commented 7 years ago

I think I agree with the goal of this issue, but please let me ask one question: Would you like the version to contain the revision? Like what you said X.Y.Z-r1? So it has to be referenced that way? I guess no, but just in case

Ensuring the stable channel cannot be overwritten might be opt-in configurable, we don't want to break existing workflows. We'll try to ask a few more users for feedback, while this feature could be useful, it is very important not to break anything badly.

annulen commented 7 years ago

Would you like the version to contain the revision? Like what you said X.Y.Z-r1?

In dependencies list it should be possible to use either X.Y.Z to get latest revision, or X.Y.Z-r1 to get fixed revision. Feel free to omit r letter btw.

Ensuring the stable channel cannot be overwritten might be opt-in configurable

Yes, I think it would be the most convenient option, but I was afraid it would complicate server code and web UI. You can make it configurable per server instance also. For the main server, I think it would be reasonable to apply this policy for all "stable" channels, but it's up to you to decide.

memsharded commented 7 years ago

In dependencies list it should be possible to use either X.Y.Z to get latest revision, or X.Y.Z-r1 to get fixed revision. Feel free to omit r letter btw.

I think this should be doable with the new version ranges. We might have to extend the notation of the ranges, but could be a reasonable approach, so basically:

Pkg/[2.3.4]@user/channel

Would get latest revision, and using Pkg/[2.3.4-r4]@user/channel or just Pkg/2.3.4-r4@user/channel would get the exact r4 revision.
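
The two lookups described above (no revision means "latest", `-rN` means "exactly that one") can be sketched in a few lines. This is only an illustration: the `-rN` suffix is the notation floated in this thread, the helpers `parse_ref` and `resolve` are hypothetical and not part of Conan, and a reference without a suffix is assumed to count as revision 0.

```python
import re

# Hypothetical reference layout from this discussion:
#   name/version[-rN]@user/channel
REF_RE = re.compile(
    r"^(?P<name>[^/]+)/(?P<version>[^@]+?)(?:-r(?P<rev>\d+))?"
    r"@(?P<user>[^/]+)/(?P<channel>.+)$"
)


def parse_ref(ref):
    """Split a reference into (name, version, revision, user, channel)."""
    m = REF_RE.match(ref)
    if m is None:
        raise ValueError("not a valid reference: %r" % ref)
    rev = m.group("rev")
    return (m.group("name"), m.group("version"),
            int(rev) if rev is not None else None,
            m.group("user"), m.group("channel"))


def resolve(requested, available):
    """Return the reference in `available` that satisfies `requested`."""
    name, version, rev, user, channel = parse_ref(requested)
    candidates = []
    for ref in available:
        n, v, r, u, c = parse_ref(ref)
        if (n, v, u, c) == (name, version, user, channel):
            # Assumption: a package published without a suffix is revision 0.
            candidates.append((r or 0, ref))
    if rev is not None:
        # Exact revision requested: it must exist as-is.
        for r, ref in candidates:
            if r == rev:
                return ref
        raise LookupError("revision r%d not found" % rev)
    if not candidates:
        raise LookupError("no package matches %r" % requested)
    # No revision requested: pick the highest revision number.
    return max(candidates)[1]


available = [
    "Pkg/2.3.4-r1@user/channel",
    "Pkg/2.3.4-r4@user/channel",
    "Pkg/2.3.4-r10@user/channel",
    "Pkg/2.3.5-r2@user/channel",
]
print(resolve("Pkg/2.3.4@user/channel", available))
print(resolve("Pkg/2.3.4-r4@user/channel", available))
```

Revisions are compared as integers, so r10 sorts above r4; a plain string comparison would get that wrong.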

annulen commented 7 years ago

But Pkg/2.3.4@user/channel needs to work for backward compatibility

memsharded commented 7 years ago

What do you mean that Pkg/2.3.4@user/channel needs to work for backwards compatibility? As I understand backward compatibility:

Is that what you meant? Thanks!

annulen commented 7 years ago

I mean that Pkg/2.3.4@user/channel should resolve to the latest available revision; this way no existing package will break in case we enforce a revision increment policy on stable branches

memsharded commented 7 years ago

I think that yes, it can break. Even if you force the revision increment on stable branches, automatically changing the package users depend on, without them doing it explicitly, is breaking. We don't do it even for overwritten packages: you have to explicitly use --update if you want your locally cached package to be updated. Users that depend on the first revision of a package (the one without a revision number) would have to keep their current behavior, depending on that exact version.

Even if the package creators starts to publish new revisions, updating the consumers on the revisions without them noticing, doesn't sound like the expected approach.

Please also note that enforcing revision increment in stable channels in conan.io might be difficult in the short term, or at least very controversial. We implemented the package overwriting feature because it was a very requested feature, and many users are using it. Enforcing that (user configurable) on conan_servers surely can be done easily, but conan.io is a different story. We have tried from the beginning to be as unopinionated as possible, letting users (and, very importantly, package creators) do almost whatever they want to do. We are not changing this unless there is a very broad consensus that this should be done.

rconde01 commented 7 years ago

I don't currently have anything to add to the design discussion - but this functionality would be useful for my group.

piponazo commented 7 years ago

The same over here. It would be super nice to have this feature. In our project, some of the libraries have a dependency tree with a depth of 3 or 4 levels ... It is a bit annoying to have to touch all the dependencies to update the requirements. And most of the time we do not update the libraries but we make small revisions in the recipes.

memsharded commented 7 years ago

We will be reviewing the model for developing packages in next 0.23, in https://github.com/conan-io/conan/issues/1171, but also, the big picture of how packages are developed and evolve will be reviewed, so we will take this point into account.

Still don't know how to address simultaneously addressable content, compatible package binary hashes, updates to the latest revision. So to make sure, the problem we are trying to solve:

Note that the latest requisite might imply doing server calls, that can be slow, even for installed packages. This doesn't scale for large projects (if you have hundreds of packages, doing a conan install, even if everything is installed would go from a few seconds to more than one minute).

Overall it seems a very challenging thing, but lets analyze it again a bit further.

memsharded commented 7 years ago

I am starting to review this issue.

I would need some feedback/help. I am looking for some references of other package managers using such concept of revisions, but so far, didn't find them. Do you have some pointers to the revision concept in other package managers?

I think that a revision is just a versioning thing. So if we have a Pkg/0.1@user/channel, creating new "revisions" of the package, if we want to make them addressable, they would be something like:

Pkg/0.1#1@user/channel
Pkg/0.1#2@user/channel
...
Pkg/0.1#N@user/channel

I am using # as an indicator of the revision, but could be any other thing. In my first analysis, I don't see the revision as something "internal", that could be hidden from the package addressable reference.

If the revision number is included in the version, then, using revisions could be just a matter of tuning the version ranges expressions, and consuming it could be just:

Pkg/[0.1]@user/channel  #get latest revision
Pkg/0.1#3@user/channel #get exact revision

himikof commented 7 years ago

Here are some references to this concept in other package management systems:

FPM multi-packager calls this concept "iteration".

memsharded commented 7 years ago

Thanks @himikof for such useful links. I have been reviewing them (to be honest, I had looked mostly to language package managers for developers, which is the context of conan, not that much into system package managers), and I think there are good insights. First lets summarize the basics:

Regarding your references:

So, in my opinion, the above suggested approach is quite aligned with the findings:

This is why I was suggesting to be this feature an opt-in, users wanting to use package revisions could use some form of requires = Pkg/[0.1]@user/channel, and version-range resolution will handle accordingly revision numbers.

Feedback very welcome.

sztomi commented 7 years ago

I propose the following format the package reference: pkg/0.1.3@user/channel[0.24] or pkg/0.1.3@user/channel-0.24 where 0.24 is package revision. The rationale for this format (or any similar variation) is that the part before @ remains the name and version of the packaged item and the part after the @ remains "package metadata". This is easy to understand and document, follows the present setup and keeps the most important part at the beginning of the package reference, and can be opt-in easily.

memsharded commented 7 years ago

@sztomi, yes I like the idea of associating the package revision to the channel, and the "package metadata" idea. I have to check and think how this would be processed regarding the revision resolution, it should be similar to the version ranges logic, but applied to the channel. Still think that it should be opt-in, I am not sure you want to check for latest revision for all dependencies, regarding they have revisions or not, and being much slower due to extra network calls.

sztomi commented 7 years ago

Still think that it should be opt-in, I am not sure you want to check for latest revision for all dependencies

Agreed. If it's not specified, the latest revision should be used automatically.

memsharded commented 7 years ago

Agreed. If it's not specified, the latest revision should be used automatically.

But that is the issue :( If we want to use the latest revision automatically, you need to opt-in using version ranges:

If we make the former to automatically use revisions, we are forced to do a search for every single package in the dependency graph, which is an expensive operation, and will be much slower than the current install approach.

annulen commented 7 years ago

But Pkg/1.1@user/channel which does not use latest revision is useless, in this case it should be disallowed for packages that use revisions. And if we want all packages to be revision-enabled, for the sake of sanity, this means that everyone will have to use explicit revision numbers. That's fine with me, but I'm not sure everyone will be happy about it

sztomi commented 7 years ago

I'm not sure I understand why it would make installation slow. Earlier you wrote:

Range-version resolution requires more API calls to servers, which are slow.

I'm probably missing something, but the way I see it, it should be possible with a single call, shouldn't it? And this call should be the one that gets a package identified by a package reference.

(1) "Give me the package that matches this reference: Pkg/1.1@user/channel" or

(2) "Give me the package that matches this reference: Pkg/1.1@user/channel-2.3".

In the (1) case, the server can look at the repository and see if there are multiple revisions. Since there was no revision specified in the package reference, it returns the latest revision of the package. In case (2) it can select revision 2.3 and return that.

memsharded commented 7 years ago

For version ranges, you need to search the server, for all recipes matching the pattern, then do the actual download. That is much, much slower than just doing the actual, direct download, not only because there are more network calls, but also it requires a search in the server, that even if optimized, is slow. And if it has to be done for dependency graphs with up to hundreds of packages, by default, some people are going to complain because of the performance.

kdsx commented 7 years ago

Does conan perform a separate request to find each package? Maybe server could provide a request which allows to find multiple packages at once and also resolve dependencies? However this task may be non-trivial in case when dependencies are located on several remotes. Another possible solution is to retrieve full package index from the remotes (like many Linux distros do) and then decide what to download locally.

memsharded commented 7 years ago

In the normal case it doesn't perform a separate request to find each package, right now, it directly fetches the package. When version-ranges are involved, yes, a request is issued per Pkg/*@user/channel to find all available versions for that pattern in the remote. Such call will be done to each remote in the client configured remotes, in order.
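
To make the cost difference concrete, here is a minimal sketch of the range lookup described above, with each remote modeled as an in-memory list. The function `search_all_remotes` and the data layout are hypothetical, not Conan's client code; the only thing taken from the comment is that one search is issued per pattern to each configured remote, in order.

```python
from fnmatch import fnmatchcase


def search_all_remotes(pattern, remotes):
    """Search every remote, in order, for references matching `pattern`.

    `remotes` is an ordered list of (remote_name, [references]).
    Returns (matches_per_remote, number_of_search_requests).
    """
    requests = 0
    found = {}
    for remote_name, refs in remotes:
        requests += 1  # one search call per remote, per pattern
        matches = [ref for ref in refs if fnmatchcase(ref, pattern)]
        if matches:
            found[remote_name] = matches
    return found, requests


remotes = [
    ("local", ["Pkg/1.0@user/channel", "Other/1.0@user/channel"]),
    ("central", ["Pkg/1.1@user/channel", "Pkg/1.2@user/channel",
                 "Pkg/1.2@user/testing"]),
]
found, requests = search_all_remotes("Pkg/*@user/channel", remotes)
print(found, requests)
```

Under this model a graph with N range requirements and R remotes needs N * R searches before any download starts, whereas an exact reference goes straight to the fetch.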

Retrieving the full package index would be even more costly, especially in big servers (like conan.io or bintray), and it would be a caching problem, you know: "there are two hard problems in CS: naming things and invalidating caches". This would render the revisions unusable: they are something intended to be fast and agile to update, so the cached full index would become obsolete, and you cannot be retrieving it again and again from all the remotes; that would kill performance for users and also saturate the servers.

We are working on it, currently considering different approaches:

They both have pros & cons, so we are working on it, trying to move forward, but this is a much, much harder problem than it seems at first sight, and there are different trade-offs to take into account. Thanks very much for your feedback!

kdsx commented 7 years ago

Forgot to mention. When I wrote about full package index retrieving I meant something like git remote update, and not apt update. In other words this could be done incrementally. E.g. the server could store a simplified changelog (simplified means without unnecessary log records like "package A was added", "package A was removed"), and the client could store a "working state" + its revision. Then to update the index the client should download only a small part of the changelog, apply it to the current state and update the revision. Even for huge servers it's hard to imagine that such a request may be very slow.

Also what do you think about a single request to resolve multiple dependencies? In most "good cases" this approach should require 1-2 requests, while in the worst case require almost the same amount of requests as conan does now, but "worst cases" may appear only when a user is doing something really strange and wrong.

memsharded commented 7 years ago

The incremental changelog is not simple: the server might need to maintain an incremental changelog for each different client; otherwise, you end up with a huge changelog, and it loses its utility. And having an incremental changelog for each client is almost impossible, taking into account that most of the transactions with clients are totally anonymous (and even then, they do not depend on the logged-in user but on the running client instance, which also differs per CONAN_USER_HOME the machine might be using, and the number of those could be very large for CI servers)

It is not so simple to retrieve multiple dependencies at once. The dependency graph has to be incrementally evaluated, there are conditional requirements, so the process is to retrieve one dependency, check and evaluate its requirements, get them, and so on. It might improve, for sure, when some of them can be retrieved in parallel, but it depends on the graph depth and breadth. However, even if some network calls can be saved, the search for each package still requires a search query in the server, which is still very slow. Even if we try to use fast approaches, it is still a pattern search, orders of magnitude slower than direct fetching. It is not possible to implement as the default, which is why we are considering opt-in or other approaches.

kdsx commented 7 years ago

Hmm, I didn't get why the changelog should be stored for each client. This changelog is the same for every client and every client knows its revision, so it knows which transactions are missing locally. About the changelog size, the most important word is simplified (in my company we call it "compressed"). We use this approach for synching data between servers and so far it works :) The main idea of such changelog is "do not store records which do not have an impact on the final result". So finally such log must contain exactly the same number of records as number of binary packages in the repository. After the first updates check all consequent checks should be much faster because the amount of data will be reduced significantly.
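
The "compressed" changelog idea can be sketched as follows. This is a toy in-memory model under stated assumptions, not anything Conan or its servers implement: the class names `CompactedChangelog` and `ClientIndex` are made up for illustration, the server keeps only the newest record per package reference, and removals are deliberately left out.

```python
class CompactedChangelog:
    """Server side: keeps only the latest record per package reference."""

    def __init__(self):
        self._index = 0    # global, monotonically increasing change counter
        self._latest = {}  # ref -> (index of last change, payload)

    def record(self, ref, payload):
        """Register a change; older records for `ref` are overwritten."""
        self._index += 1
        self._latest[ref] = (self._index, payload)
        return self._index

    def since(self, client_index):
        """Return (current index, records newer than `client_index`)."""
        entries = sorted(
            (idx, ref, payload)
            for ref, (idx, payload) in self._latest.items()
            if idx > client_index
        )
        return self._index, entries


class ClientIndex:
    """Client side: a local copy of the index plus the last seen counter."""

    def __init__(self):
        self.index = 0
        self.state = {}

    def sync(self, server):
        """Fetch and apply only the missing records; return how many."""
        new_index, entries = server.since(self.index)
        for _, ref, payload in entries:
            self.state[ref] = payload
        self.index = new_index
        return len(entries)


server = CompactedChangelog()
server.record("A/1.0@user/stable", "rev1")
server.record("B/1.0@user/stable", "rev1")
server.record("A/1.0@user/stable", "rev2")  # supersedes the first record

client = ClientIndex()
print(client.sync(server), client.state)
```

The log never holds more than one record per reference, which is the property described in the comment. A client starting from index 0 still has to download every record once.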

memsharded commented 7 years ago

Ok, yes, an indexed changelog in the server, where clients can send the latest index and retrieve just the new part is possible. However, it is still not simple, and costly: it requires a DB on the server, with linear-time retrieval of latest indexes. And some servers like conan.io are already large, serving up to a hundred thousand packages per month, and growing. And there are many, many, many queries from CI machines (maybe even more than from developers' machines), that just fire a new clean environment all the time, so those will require the full index every time too. Not sure either how to handle package removals in the changelog. And, also, doing server side changes is very difficult: take into account that conan now has support in conan_server, conan.io, Artifactory and Bintray. It seems very overkill; adding complexity on the servers should be avoided at all costs, migrations are hard, and bugs take a long time to be fixed and distributed. It doesn't scale at all, and we have to consider the community of users as a whole. All the time that we would be developing and maintaining those things is time we cannot use to develop other more important features or provide support.

Still, to make this issue clear. Having package-revisions is something that can be done today, just by two steps:

We are considering ways to improve this, but they should keep a reasonable cost/value ratio. Lets keep working on it. Thanks very much for your help and feedback :)

kdsx commented 7 years ago

Sure. Finally when a command feels that things are getting slow it could decide to use only its private server which would contain only needed packages.

BTW one important concept mentioned in the beginning of the thread wasn't discussed much. Is it planned to implement protection against uploaded package overwriting?

memsharded commented 7 years ago

Sure. Finally when a command feels that things are getting slow it could decide to use only its private server which would contain only needed packages.

Yes, but that is precisely the main issue of the #1373 . He is already using a private server, but hosting so many packages that it becomes painfully slow.

Regarding the non overwrite, there are related issues in #679 and #1381.

memsharded commented 7 years ago

Ok, I am trying to move forward this issue from the very roots of the problem:

So I am considering the concept illustrated in my branch: https://github.com/conan-io/conan/compare/develop...memsharded:feature/conan_links?expand=1

This kind of package can be manually edited and uploaded, but further automation could be done:

Please feedback. cc/ @claasd @lasote

annulen commented 7 years ago

package creators might easily revert back to an older revision without removing the newer ones

This should be prohibited on the public server, otherwise users who downloaded a higher revision (but don't specify an exact revision number) won't be upgraded to the good revision. Instead, the previous revision should be published with a higher revision number.

memsharded commented 7 years ago

This should be prohibited on the public server, otherwise users who downloaded a higher revision (but don't specify an exact revision number) won't be upgraded to the good revision. Instead, the previous revision should be published with a higher revision number.

I still don't get the wish to restrict users, especially package creators. Users will be retrieved and updated with the version that the package creator wants them to be updated to. Let's say that you have published Pkg/1.2.3, which is proxied by Pkg/1.2. Then you upload Pkg/1.2.4, and update Pkg/1.2 to point to it, but you definitely screw up 1.2.4 with a serious security bug that you don't know how to fix quickly. Would you be forced to create a Pkg/1.2.5, which will be identical to Pkg/1.2.3? That doesn't seem very logical, sounds confusing and is a waste of resources. Instead you can just update Pkg/1.2 to point to Pkg/1.2.3 again, which is an almost instantaneous fix.

annulen commented 7 years ago

That's not a restriction, just common sense. Versions and revisions need to be monotonic in time, otherwise they don't make much more sense than e.g. hash sums of binaries.

annulen commented 7 years ago

Lets say that you have published Pkg/1.2.3, which is proxied by Pkg/1.2. Then you upload Pkg/1.2.4, and update Pkg/1.2 to point to it, but you definitely screw 1.2.4 with a serious security bug that you don't know how to fix quickly. Would you be forced to create a Pkg/1.2.5, which will be identical to Pkg/1.2.3?

Wait, we are talking about Pkg/1.2-r3, Pkg/1.2-r4, and Pkg/1.2-r5, aren't we? Yes, this is the only meaningful decision to publish r3 as r5, if r4 is broken

As for software versions, they are completely separate topic from package revisions

annulen commented 7 years ago

And yes, if you released software with version "1.2.4" and it has security bug, you definitely need to release 1.2.5 immediately, even if it's otherwise identical to 1.2.3

lasote commented 7 years ago

@memsharded +1. Technically sounds good, and I think it can solve the problems with the package revisions, slow version ranges resolving and so on. @annulen yes, these are good practices, and we should add it to the docs, but only good practices, we can't and we won't control the versions scheme/flow management of the conan-center packages.

annulen commented 7 years ago

@lasote Versioning of original software is out of control, but it's not the topic here. Revision numbers are related to respective Conan recipes only, and they can and should be controlled

lasote commented 7 years ago

@annulen Without entering into the debate of how hard we should control the user's packages, we don't have the resources to do it, so it's not an open discussion today.

annulen commented 7 years ago

You say that conan-center is "curated" repository, how can it be called so if it doesn't even have monotonic revision numbers?

memsharded commented 7 years ago

It doesn't have revision numbers at all, not even monotonic.

annulen commented 7 years ago

It doesn't have them yet, which is the point of this issue

lasote commented 7 years ago

This is a general discussion about package revisions, not centered in the conan-center repository. You can have your own practices in an on-premises conan server or Artifactory. And of course, it's interesting to find a good solution. You can also have package revisions in conan-center, of course.

claasd commented 7 years ago

Hi, I like the solution @memsharded. It technically solves our issues (especially if we get a conan link command).

@annulen: We basically already use package revisions, by adding the build number of our CI server to the version number (our packages look like MyPkg/1.0.0-alpha-build.11). To get the latest build, we use version ranges (MyPkg/[~1.0]@user/testing). Thus, we have a lot of different versions for each package on the server (between 70k and 100k in total), so resolving the revision is very slow. This kind of proxy package solves our speed problem. Furthermore, I get more control over which package will be selected.

As for the discussion about prohibiting and good practice, it is my strong belief that you should never enforce those constraints by technical means. Conan itself does not even enforce semantic versioning. We use it, and I encourage everyone to do so, but it may not fit everyone. We also use package references, and I would agree that if package 1.0.0-rc5 broke something, I need to release 1.0.0-rc6. But again, I would never technically enforce it.

Bottom line: I like it! When do I get it? :stuck_out_tongue_winking_eye:

sztomi commented 7 years ago

Yeah, I think this is pretty good and would also help with our use cases.

piponazo commented 7 years ago

@memsharded In principle I like the proposal but I have one question. You said "Maybe don't even need the version and name." Does it mean that the final consumer conanfile.py would look like this ?

class MyApp(ConanFile):
    settings = 'os', 'compiler', 'build_type', 'arch'
    generators = 'cmake'

    def requirements(self):
        self.requires('Lib1Proxy@user/stable')
        self.requires('Lib2Proxy@user/stable')
        ...

If possible I would prefer to have version in the Proxy as an optional field. I think it helps to have a global view of the versions you are using in a project (For example, to know if you are using Qt4 or Qt5). But I do not have a strong preference for that.

In case this approach is approved, what would be the steps to update the conan code of a project to use this approach ? Would we need to do a lot of work ?

memsharded commented 7 years ago

@piponazo No, the consumer requirement will look as always, nothing changes there.

I meant the "package-link" (btw, still need a good name to refer to this concept: link, proxy?) could be:

class TestConan(ConanFile):
    conan_link = "Hello/0.1@lasote/channel"

because it could be generated by something like:

$ conan package_link Hello/0.X Hello/0.1@user/channel

So the name and version are already explicit in the command, and can be used to put the package-link in the conan cache.

If possible I would prefer to have version in the Proxy as an optional field.

It doesn't matter. In your dependency graph, you will have the real version you are using. So you can make your App require Qt/latest@team/stable, and if you make a package-link between Qt/latest => Qt/5.0, in your dependency graph you will see "App" => "Qt/5.0"
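
How a package-link resolves in the graph, and how repointing it works as a quick rollback, can be sketched with a plain dictionary. This is an illustration only: the `links` mapping and the `resolve_alias` helper are hypothetical and are not the way Conan stores or resolves links.

```python
def resolve_alias(ref, links):
    """Follow package-links until a real (non-link) reference is reached."""
    seen = [ref]
    while ref in links:
        ref = links[ref]
        if ref in seen:
            raise ValueError("alias cycle: " + " -> ".join(seen + [ref]))
        seen.append(ref)
    return ref


# A link maps an alias reference to the reference it points to.
links = {"Qt/latest@team/stable": "Qt/5.0@team/stable"}

# The consumer requires the alias; the graph shows the real package.
print(resolve_alias("Qt/latest@team/stable", links))

# Repointing the link changes what every consumer gets on the next
# resolution, without publishing or deleting any real package.
links["Qt/latest@team/stable"] = "Qt/4.8@team/stable"
print(resolve_alias("Qt/latest@team/stable", links))
```

Each alias costs one extra direct fetch rather than a pattern search, which is why this approach avoids the range-resolution slowdown discussed earlier in the thread.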

memsharded commented 7 years ago

Conan has merged the conan alias functionality into develop, will be available for 0.25. This might be a very good core for implementing package revisions.

I have done some performance testing for the conan alias approach, and seemed very reasonable, like incrementing over 5% wrt directly retrieving packages (running with local Artifactory instance). Mainly because most of the cost comes from initiating conan, still have to check the performance for single-instance resolution.

I am moving the "revisions" feature to 0.26 at the moment, or until we get some feedback about the conan alias feature.

lasote commented 7 years ago

I've no new feedback about the conan alias command. So let's wait to 0.27.

mmatrosov commented 6 years ago

I would like to bring to attention a slightly different aspect of package revisions: build reproducibility. Basically, I don't want to use the "latest" version anywhere. If I use it, this means that the results I get depend on the point in time when I did the build. But I want to make sure that if anyone at any point in time checks out our source code repository (which contains the full list of references for the packages it uses) at a given revision and builds the solution, they will have exactly the same result, including exactly the same third-parties (even if the third-parties used at this point had some bugs).

I believe aliases have nothing to do with this issue. We could add revision number into references, like this: mylib/1.2.3-r.2@user/channel. And this works fine for most of scenarios. The problem is that this brings semver into the domain of pre-release versions. First of all, this is not what we try to express here. This is the release version, but it also contains package revision, which is a different beast, but we put it here just because we don't have another place to put it.

Besides wrong intent, I can also point out on a particular problem we have with this approach. Imagine I have recipe for lib A, which wants to use lib B with version 1.2.x. But, according to our approach, versions of lib B are in the form 1.2.3-r1 or 1.2.4-r2. None of these match 1.2.x mask. I would like to have a mask like 1.2.x-r, so I can have a latest revision (treated as pre-release version by semver) for any patch version within a given major and minor version. But this mask is not a valid semver range.
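
The selection that a "1.2.x-r" mask is meant to express can be written down outside of semver. The sketch below is not Conan's range syntax and not a semver implementation: `parse` and `latest_in_series` are hypothetical helpers that treat `-rN` as a packaging revision ordered after the patch number, with a missing suffix counted as revision 0.

```python
import re

# Hypothetical version layout from this comment: MAJOR.MINOR.PATCH[-rN]
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-r(\d+))?$")


def parse(version):
    """Return (major, minor, patch, revision) as integers."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError("unsupported version: %r" % version)
    major, minor, patch, rev = m.groups()
    return int(major), int(minor), int(patch), int(rev or 0)


def latest_in_series(major, minor, versions):
    """Pick the highest patch, then the highest revision, within major.minor."""
    matching = [v for v in versions if parse(v)[:2] == (major, minor)]
    if not matching:
        return None
    return max(matching, key=parse)


versions = ["1.2.3-r1", "1.2.4-r1", "1.2.4-r2", "1.3.0-r1"]
print(latest_in_series(1, 2, versions))
```

The key difference from semver is the ordering: here 1.2.4-r2 ranks above 1.2.4, whereas semver gives a pre-release tag lower precedence than the plain version.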

So, the questions:

  1. For current version of conan, how do I specify valid version range with the meaning of "1.2.x-r", so that it matches the latest revision for latest patch version?
  2. For future versions of conan, are you considering a "proper" way to specify revisions in a reference, that does not bring version into pre-release domain?

mmatrosov commented 6 years ago

@memsharded any ideas on the questions?..

sztomi commented 6 years ago

Besides wrong intent, I can also point out on a particular problem we have with this approach. Imagine I have recipe for lib A, which wants to use lib B with version 1.2.x. But, according to our approach, versions of lib B are in the form 1.2.3-r1 or 1.2.4-r2. None of these match 1.2.x mask.

We have implemented revisions very similarly, only without the r before the revision number. We have a script that updates revisions in dependent packages recursively (because updating a revision changes its contents, so we up the revision all the way to leaf packages). This works very well in practice and avoids the need for version ranges. As a result, we completely avoid the issue you raise here: we always use exact versions-revisions everywhere. We'll probably never use ranges or aliases precisely because we want build reproducibility. I guess we could get away with collecting and storing the resolved version numbers at the time of build, but that detaches the information from version control.
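
The script itself is not shown in the thread, so the following is only a guess at its core step: when one package's revision is bumped, every package that depends on it, directly or transitively, is bumped too. The function name `bump_with_dependents` and the dictionary layout are assumptions made for this sketch.

```python
def bump_with_dependents(changed, requires, revisions):
    """Bump `changed` and every package that transitively depends on it.

    `requires` maps each package to the packages it depends on.
    `revisions` maps each package to its current revision number.
    Returns a new revisions dict; the input is left untouched.
    """
    # Invert the graph: dependency -> set of direct dependents.
    dependents = {}
    for pkg, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Walk upwards from the changed package to the final consumers.
    to_bump, stack = set(), [changed]
    while stack:
        pkg = stack.pop()
        if pkg in to_bump:
            continue
        to_bump.add(pkg)
        stack.extend(dependents.get(pkg, ()))

    updated = dict(revisions)
    for pkg in to_bump:
        updated[pkg] = updated.get(pkg, 0) + 1
    return updated


requires = {"app": ["libA", "libB"], "libA": ["zlib"], "libB": [], "zlib": []}
revisions = {"app": 3, "libA": 1, "libB": 7, "zlib": 2}
print(bump_with_dependents("zlib", requires, revisions))
```

Each affected package is bumped exactly once even if it is reachable through several paths, and packages that do not depend on the changed one keep their revision.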