Closed annulen closed 5 years ago
I think I agree with the goal of this issue, but please let me ask one question: Would you like the version to contain the revision? Like what you said X.Y.Z-r1? So it has to be referenced that way? I guess no, but just in case
Ensuring the stable channel cannot be overwritten might be opt-in configurable, we don't want to break existing workflows. We'll try to ask a few more users for feedback, while this feature could be useful, it is very important not to break anything badly.
Would you like the version to contain the revision? Like what you said X.Y.Z-r1?
In dependencies list it should be possible to use either X.Y.Z to get latest revision, or X.Y.Z-r1 to get fixed revision. Feel free to omit r letter btw.
Ensuring the stable channel cannot be overwritten might be opt-in configurable
Yes, I think it would be the most convenient option, but I was afraid it would complicate server code and web UI. You can make it configurable per server instance also. For main server, I think it would be reasonable to apply this policy for all "stable" channels, but it's up to you to decide.
In dependencies list it should be possible to use either X.Y.Z to get latest revision, or X.Y.Z-r1 to get fixed revision. Feel free to omit r letter btw.
I think this should be doable with the new version ranges. We might have to extend the notation of the ranges, but it could be a reasonable approach, so basically: Pkg/[2.3.4]@user/channel would get the latest revision, and using Pkg/[2.3.4-r4]@user/channel or just Pkg/2.3.4-r4@user/channel would get the exact r4 revision.
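A minimal sketch of the resolution rule described above, in Python. It is only an illustration under stated assumptions: the `AVAILABLE` table, the `resolve` function and the exact reference grammar are hypothetical and are not part of conan. A bracketed reference without a revision picks the highest known revision, a reference with `-rN` (or `-N`) picks exactly that revision, and a plain reference is left untouched.

```python
import re

# Hypothetical stand-in for the revisions a remote knows about.
AVAILABLE = {
    ("Pkg", "2.3.4", "user/channel"): [1, 2, 3, 4, 5],
}

# Accepts Pkg/2.3.4@user/channel, Pkg/2.3.4-r4@user/channel and the
# bracketed forms Pkg/[2.3.4]@user/channel, Pkg/[2.3.4-r4]@user/channel.
REF = re.compile(
    r"^(?P<name>[^/]+)/(?P<open>\[)?(?P<version>\d+(?:\.\d+)*)"
    r"(?:-r?(?P<rev>\d+))?\]?@(?P<chan>.+)$"
)


def resolve(reference):
    """Return (name, version, revision, channel); revision is None when
    the reference is a plain one that does not opt in to revisions."""
    m = REF.match(reference)
    if m is None:
        raise ValueError("unrecognised reference: %s" % reference)
    name, version, chan = m.group("name"), m.group("version"), m.group("chan")
    known = AVAILABLE.get((name, version, chan), [])
    if m.group("rev") is not None:
        # Exact revision requested: it must exist.
        rev = int(m.group("rev"))
        if rev not in known:
            raise LookupError("revision %d not found" % rev)
    elif m.group("open") is not None and known:
        # Range syntax without a revision: take the latest one.
        rev = max(known)
    else:
        # Plain reference: behaviour unchanged, no revision lookup.
        rev = None
    return name, version, rev, chan


print(resolve("Pkg/[2.3.4]@user/channel"))
print(resolve("Pkg/2.3.4-r4@user/channel"))
```

Whether a plain `Pkg/2.3.4@user/channel` should also mean "latest revision" is a separate policy question; in this sketch it is deliberately kept apart from the bracketed form.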
But Pkg/2.3.4@user/channel needs to work for backward compatibility
What do you mean that Pkg/2.3.4@user/channel needs to work for backwards compatibility? As I understand backward compatibility:
Pkg/2.3.4@user/channel will still work as already defined; that will not (and cannot) change.
2.3.4 needs to resolve always to the same, original (without revision) package that it was pointing to.
Pkg/[2.3.4]@user/channel or some similar syntax.
Is that what you meant? Thanks!
I mean that Pkg/2.3.4@user/channel should resolve to the latest available revision; this way no existing package will break in case we enforce a revision increment policy on stable branches
I think that yes, it can break. Even if you force the revision increment on stable branches, automatically changing the package that users depend on, without them doing it explicitly, is breaking. We don't do it even for overwritten packages: you have to explicitly use --update if you want your locally cached package to be updated. Users that depend on the first revision of a package (the one without a revision) would have to keep their behavior, depending on that exact version.
Even if the package creators start to publish new revisions, updating the consumers to those revisions without them noticing doesn't sound like the expected approach.
Please also note that enforcing revision increment in stable channels in conan.io might be difficult in the short term, or at least very controversial. We implemented the package overwriting feature because it was a very requested feature, and many users are using it. Enforcing that (user configurable) on conan_servers can surely be done easily, but conan.io is a different story. We have tried from the beginning to be as unopinionated as possible, letting users (and, very importantly, package creators) do almost whatever they want to do. We are not changing this unless there is a very broad consensus that this should be done.
I don't currently have anything to add to the design discussion - but this functionality would be useful for my group.
The same over here. It would be super nice to have this feature. In our project, some of the libraries have a dependency tree with a depth of 3 or 4 levels ... It is a bit annoying to have to touch all the dependencies to update the requirements. And most of the time we do not update the libraries but we make small revisions in the recipes.
We will be reviewing the model for developing packages in next 0.23, in https://github.com/conan-io/conan/issues/1171, but also, the big picture of how packages are developed and evolve will be reviewed, so we will take this point into account.
Still don't know how to address simultaneously addressable content, compatible package binary hashes, updates to the latest revision. So to make sure, the problem we are trying to solve:
Pkg/version@user/channel
Note that the latest requisite might imply doing server calls, that can be slow, even for installed packages. This doesn't scale for large projects (if you have hundreds of packages, doing a conan install, even if everything is installed would go from a few seconds to more than one minute).
Overall it seems a very challenging thing, but lets analyze it again a bit further.
I am starting to review this issue.
I would need some feedback/help. I am looking for some references of other package managers using such concept of revisions, but so far, didn't find them. Do you have some pointers to the revision concept in other package managers?
I think that a revision is just a versioning thing. So if we have a Pkg/0.1@user/channel, creating new "revisions" of the package, if we want to make them addressable, they would be something like:
Pkg/0.1#1@user/channel
Pkg/0.1#2@user/channel
...
Pkg/0.1#N@user/channel
I am using # as an indicator of the revision, but could be any other thing. In my first analysis, I don't see the revision as something "internal", that could be hidden from the package addressable reference.
If the revision number is included in the version, then, using revisions could be just a matter of tuning the version ranges expressions, and consuming it could be just:
Pkg/[0.1]@user/channel #get latest revision
Pkg/0.1#3@user/channel #get exact revision
Here are some references to this concept in other package management systems:
PORTREVISION (documented here)
FPM multi-packager calls this concept "iteration".
Thanks @himikof for such useful links. I have been reviewing them (to be honest, I had looked mostly to language package managers for developers, which is the context of conan, not that much into system package managers), and I think there are good insights. First lets summarize the basics:
Regarding your references:
pkgName-version-revision. So it is not something that can be fully transparent, revision numbers are included in the package references or filenames.
So, in my opinion, the above suggested approach is quite aligned with the findings:
Pkg/0.1#1@user/channel. The revision number can come from a revision=1 field, and conan could append it. But the package reference will include it, it cannot be "hidden" or transparent.
requires of the form Pkg/0.1@user/channel, to get the latest revision, it could make things slower. Range-version resolution requires more API calls to servers, which are slow. There are users with up to hundreds of dependencies, and enabling this by default could highly increase install times, without any need. Very little we could optimize, the bottleneck is the network calls. This is why I was suggesting to make this feature an opt-in: users wanting to use package revisions could use some form of requires = Pkg/[0.1]@user/channel, and version-range resolution will handle revision numbers accordingly.
Feedback very welcome.
I propose the following format for the package reference: pkg/0.1.3@user/channel[0.24] or pkg/0.1.3@user/channel-0.24 where 0.24 is the package revision. The rationale for this format (or any similar variation) is that the part before @ remains the name and version of the packaged item and the part after the @ remains "package metadata". This is easy to understand and document, follows the present setup and keeps the most important part at the beginning of the package reference, and can be opt-in easily.
@sztomi, yes I like the idea of associating the package revision to the channel, and the "package metadata" idea. I have to check and think how this would be processed regarding the revision resolution, it should be similar to the version ranges logic, but applied to the channel. Still think that it should be opt-in, I am not sure you want to check for the latest revision for all dependencies, regardless of whether they have revisions or not, and being much slower due to extra network calls.
Still think that it should be opt-in, I am not sure you want to check for latest revision for all dependencies
Agreed. If it's not specified, the latest revision should be used automatically.
Agreed. If it's not specified, the latest revision should be used automatically.
But that is the issue :( If we want to use the latest revision automatically, you need to opt-in using version ranges:
Pkg/1.1@user/channel will not use revisions. Will just look for the exact version
Pkg/[1.1]@user/channel will use latest revision.
If we make the former to automatically use revisions, we are forced to do a search for every single package in the dependency graph, which is an expensive operation, and will be much slower than the current install approach.
But Pkg/1.1@user/channel which does not use latest revision is useless, in this case it should be disallowed for packages that use revisions. And if we want all packages to be revision-enabled, for the sake of sanity, this means that everyone will have to use explicit revision numbers. That's fine with me, but I'm not sure everyone will be happy about it
I'm not sure I understand why it would make installation slow. Earlier you wrote:
Range-version resolution requires more API calls to servers, which are slow.
I'm probably missing something, but the way I see it, it should be possible with a single call, shouldn't it? And this call should be the one that gets a package identified by a package reference.
(1) "Give me the package that matches this reference: Pkg/1.1@user/channel" or
(2) "Give me the package that matches this reference: Pkg/1.1@user/channel-2.3".
In the (1) case, the server can look at the repository and see if there are multiple revisions. Since there was no revision specified in the package reference, it returns the latest revision of the package. In case (2) it can select revision 2.3 and return that.
For version ranges, you need to search the server, for all recipes matching the pattern, then do the actual download. That is much, much slower than just doing the actual, direct download, not only because there are more network calls, but also it requires a search in the server, that even if optimized, is slow. And if it has to be done for dependency graphs with up to hundreds of packages, by default, some people are going to complain because of the performance.
Does conan perform a separate request to find each package? Maybe server could provide a request which allows to find multiple packages at once and also resolve dependencies? However this task may be non-trivial in case when dependencies are located on several remotes. Another possible solution is to retrieve full package index from the remotes (like many Linux distros do) and then decide what to download locally.
In the normal case it doesn't perform a separate request to find each package, right now, it directly fetches the package. When version-ranges are involved, yes, a request is issued per Pkg/*@user/channel to find all available versions for that pattern in the remote. Such a call will be done to each remote in the client configured remotes, in order.
Retrieving the full package index would be even more costly, especially in big servers (like conan.io or bintray), and it will be a caching problem, you know "there are two hard problems in CS: naming things and invalidating caches". This would render the revisions unusable: they are intended to be fast and agile to update, so the cached full index would become obsolete, and you cannot be retrieving it again and again from all the remotes, that would kill performance for users and also saturate the servers.
We are working on it, currently considering different approaches:
They both have pros & cons, so we are working on it, trying to move forward, but this is a much, much harder problem than it seems at first sight, and there are different trade-offs to take into account. Thanks very much for your feedback!
Forgot to mention. When I wrote about full package index retrieving I meant something like git remote update, and not apt update. In other words this could be done incrementally. E.g. server could store a simplified changelog (simplified means without unnecessary log records like "package A was added", "package A was removed"), and client could store a "working state" + its revision. Then to update the index the client should download only a small part of the changelog, apply it to the current state and update the revision. Even for huge servers it's hard to imagine that such a request may be very slow.
Also what do you think about a single request to resolve multiple dependencies? In most "good cases" this approach should require 1-2 requests, while in the worst case require almost the same amount of requests as conan does now, but "worst cases" may appear only when a user is doing something really strange and wrong.
The incremental changelog is not simple: the server might need to maintain an incremental changelog for each different client, otherwise, you end up with a huge changelog, and it loses its utility. And having an incremental changelog for each client is almost impossible, taking into account that most of the transactions with clients are totally anonymous (and even then, they do not depend on the logged user, but on the running client instance, also different per different CONAN_USER_HOME the machine might be running, which could be very large for CI servers)
It is not so simple to retrieve multiple dependencies at once. The dependency graph has to be incrementally evaluated, there are conditional requirements, so the process is to retrieve one dependency, check and evaluate its requirements, get them, and so on. It might improve, for sure, when some of them can be retrieved in parallel, but it depends on the graph depth and breadth. However, even if some network calls can be saved, each package still requires a search query in the server, which is still very slow. Even if we try to use fast approaches, it is still a pattern search, orders of magnitude slower than direct fetching. Not possible to implement as the default, that is why we are considering opt-in or other approaches.
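To illustrate why the graph has to be evaluated incrementally, here is a small Python sketch. The `recipes` mapping and the `fetch_levels` function are hypothetical and only model the idea: the requirements of a package are known only after it has been fetched, so even with perfect parallelism inside each round, the number of sequential rounds is bounded below by the depth of the graph.

```python
def fetch_levels(root, recipes):
    """Walk a dependency graph level by level.

    recipes: hypothetical mapping reference -> list of requirements,
    standing in for what becomes known only after fetching a recipe.
    Returns (sequential rounds, total packages fetched).
    """
    fetched = set()
    frontier = [root]
    rounds = 0
    while frontier:
        rounds += 1
        # Everything in the current frontier could be fetched in parallel.
        fetched.update(frontier)
        next_frontier = []
        for ref in frontier:
            # Requirements are only visible once `ref` has been fetched.
            for dep in recipes[ref]:
                if dep not in fetched and dep not in next_frontier:
                    next_frontier.append(dep)
        frontier = next_frontier
    return rounds, len(fetched)


recipes = {"App": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
print(fetch_levels("App", recipes))
```

In this toy graph four packages need three sequential rounds; a single "resolve everything" request would require the server to evaluate the recipes itself.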
Hmm, I didn't get why the changelog should be stored for each client. This changelog is the same for every client and every client knows its revision, so it knows which transactions are missing locally. About the changelog size, the most important word is simplified (in my company we call it "compressed"). We use this approach for syncing data between servers and so far it works :) The main idea of such a changelog is "do not store records which do not have an impact on the final result". So in the end such a log must contain exactly the same number of records as the number of binary packages in the repository. After the first update check, all subsequent checks should be much faster because the amount of data will be reduced significantly.
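A minimal Python sketch of the "simplified" changelog idea described above. All names (`IndexServer`, `IndexClient`, `changes_since`) are hypothetical and nothing here reflects an existing conan server API. The server keeps only the last change per package reference, clients remember the sequence number they have seen, and an update transfers only newer entries. Representing removals as a "removed" marker is an assumption of this sketch, not something settled in the discussion.

```python
class IndexServer:
    """Hypothetical server keeping one changelog record per reference."""

    def __init__(self):
        self._counter = 0
        self._last_change = {}  # reference -> (sequence number, state)

    def publish(self, reference, state="present"):
        # A newer record for the same reference replaces the older one,
        # so the log never grows beyond the number of references.
        self._counter += 1
        self._last_change[reference] = (self._counter, state)

    def remove(self, reference):
        self.publish(reference, state="removed")

    def changes_since(self, sequence):
        newer = [
            (seq, ref, state)
            for ref, (seq, state) in self._last_change.items()
            if seq > sequence
        ]
        return sorted(newer), self._counter


class IndexClient:
    """Hypothetical client holding a local index plus its sequence number."""

    def __init__(self):
        self.sequence = 0
        self.index = set()

    def update(self, server):
        changes, head = server.changes_since(self.sequence)
        for _seq, ref, state in changes:
            if state == "removed":
                self.index.discard(ref)
            else:
                self.index.add(ref)
        self.sequence = head
        return len(changes)  # number of records transferred


server = IndexServer()
server.publish("A/1.0@user/channel")
server.publish("B/1.0@user/channel")
server.publish("A/1.0@user/channel")  # overwrite: still one record for A
client = IndexClient()
print(client.update(server), sorted(client.index))
```

A client starting from a clean environment still has to download one record per reference, which is the cost that matters for CI machines.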
Ok, yes, an indexed changelog in the server, where clients can send the latest index and retrieve just the new part is possible. However, still not simple, and costly: it requires a DB on the server, with linear-time retrieval of latest indexes. And some servers like conan.io are already large, serving up to a hundred thousand packages per month, and growing. And there are many, many, many queries from CI machines (maybe even more than from developers' machines), that just fire a new clean environment all the time, so those will require the full index every time too. Not sure either how to handle package removals in the changelog. And, also, doing server side changes is very difficult, take into account that conan now has support in conan_server, conan.io, Artifactory and Bintray. Seems very overkill, adding complexity on the servers should be avoided at all costs, migrations are hard, and bugs take a long time to be fixed and distributed. It doesn't scale at all, and we have to consider the community of users as a whole. All the time that we would be developing and maintaining those things is time we cannot use to develop other more important features or providing support.
Still, to make this issue clear. Having package-revisions is something that can be done today, just by two steps:
We are considering ways to improve this, but they should keep a reasonable cost/value ratio. Lets keep working on it. Thanks very much for your help and feedback :)
Sure. Finally when a command feels that things are getting slow it could decide to use only its private server which would contain only needed packages.
BTW one important concept mentioned in the beginning of the thread wasn't discussed much. Is it planned to implement protection against uploaded package overwriting?
Sure. Finally when a command feels that things are getting slow it could decide to use only its private server which would contain only needed packages.
Yes, but that is precisely the main issue of the #1373 . He is already using a private server, but hosting so many packages that it becomes painfully slow.
Regarding the non overwrite, there are related issues in #679 and #1381.
Ok, I am trying to move forward this issue from the very roots of the problem:
So I am considering the concept illustrated in my branch: https://github.com/conan-io/conan/compare/develop...memsharded:feature/conan_links?expand=1
class TestConan(ConanFile):
    name = "Hello"
    version = "0.X"
    conan_link = "Hello/0.1@lasote/channel"
Maybe don't even need the version and name.
Hello/0.X@.... or using any other way, like Hello/0.1@... for proxying Hello/0.1.1@... packages
This kind of package can be manually edited and uploaded, but further automation could be done:
conan link command might generate and export such package on the fly
conan upload Hello* will upload the package and the proxy package.
revision field in conanfile could be used to automate the creation of the proxy package while export-ing such recipe.
Please feedback. cc/ @claasd @lasote
package creators might easily revert back to an older revision without removing the newer ones
This should be prohibited on the public server, otherwise users who downloaded higher revision (but don't specify exact revision number) won't be upgraded to good revision. Instead, previous revision should be published with higher revision number.
This should be prohibited on the public server, otherwise users who downloaded higher revision (but don't specify exact revision number) won't be upgraded to good revision. Instead, previous revision should be published with higher revision number.
I still don't get the wish to restrict users, especially package creators. Users will be retrieved and updated with the version that the package creator wants them to be updated to. Let's say that you have published Pkg/1.2.3, which is proxied by Pkg/1.2. Then you upload Pkg/1.2.4, and update Pkg/1.2 to point to it, but you definitely screw 1.2.4 with a serious security bug that you don't know how to fix quickly. Would you be forced to create a Pkg/1.2.5, which will be identical to Pkg/1.2.3? Doesn't seem very logical, sounds confusing and a waste of resources. Instead you can just update Pkg/1.2 to point to Pkg/1.2.3 again, which is an almost instantaneous fix.
That's not a restriction, just common sense. Versions and revisions need to be monotonic in time, otherwise they don't make much more sense than e.g. hash sums of binaries.
Lets say that you have published Pkg/1.2.3, which is proxied by Pkg/1.2. Then you upload Pkg/1.2.4, and update Pkg/1.2 to point to it, but you definitely screw 1.2.4 with a serious security bug that you don't know how to fix quickly. Would you be forced to create a Pkg/1.2.5, which will be identical to Pkg/1.2.3?
Wait, we are talking about Pkg/1.2-r3, Pkg/1.2-r4, and Pkg/1.2-r5, aren't we? Yes, this is the only meaningful decision to publish r3 as r5, if r4 is broken
As for software versions, they are completely separate topic from package revisions
And yes, if you released software with version "1.2.4" and it has security bug, you definitely need to release 1.2.5 immediately, even if it's otherwise identical to 1.2.3
@memsharded +1. Technically sounds good, and I think it can solve the problems with the package revisions, slow version ranges resolving and so on. @annulen yes, these are good practices, and we should add it to the docs, but only good practices, we can't and we won't control the versions scheme/flow management of the conan-center packages.
@lasote Versioning of original software is out of control, but it's not the topic here. Revision numbers are related to respective Conan recipes only, and they can and should be controlled
@annulen Without entering into the debate of how hard we should control the user's packages, we don't have the resources to do it, so it's not an open discussion today.
You say that conan-center is "curated" repository, how can it be called so if it doesn't even have monotonic revision numbers?
It doesn't have revision numbers at all, not even monotonic.
It doesn't have them yet, which is the point of this issue
This is a general discussion about package revisions, not centered in the conan-center repository. You can have your own practices in an on-premises conan server or Artifactory. And of course, it's interesting to find a good solution. You can also have package revisions in conan-center, of course.
Hi,
I like the solution @memsharded. It technically solves our issues (especially if we get a conan link command).
@annulen: We basically already use package revisions, by adding the build number of our CI server to the version number (our packages look like MyPkg/1.0.0-alpha-build.11). To get the latest build, we use version ranges (MyPkg/[~1.0]@user/testing). Thus, we have a lot of different versions for each package on the server (between 70k and 100k in total). Then, resolving the revision is very slow. This kind of proxy package solves our speed problem. Furthermore, I get more control over which package will be selected.
As for the discussion about prohibiting and good practice, it is my strong belief that you should never enforce those constraints by technical means. Conan itself does not even enforce semantic versioning. We use it, and I encourage everyone to do so, but it may not fit everyone. We also use package references, and I would agree that if package 1.0.0-rc5 broke something, I need to release 1.0.0-rc6. But again, I would never technically enforce it.
Bottom line: I like it! When do I get it? :stuck_out_tongue_winking_eye:
Yeah, I think this is pretty good and would also help with our use cases.
@memsharded In principle I like the proposal but I have one question. You said "Maybe don't even need the version and name." Does it mean that the final consumer conanfile.py would look like this ?
class MyApp(ConanFile):
    settings = 'os', 'compiler', 'build_type', 'arch'
    generators = 'cmake'
    def requirements(self):
        self.requires('Lib1Proxy@user/stable')
        self.requires('Lib2Proxy@user/stable')
        ...
If possible I would prefer to have version in the Proxy as an optional field. I think it helps to have a global view of the versions you are using in a project (For example, to know if you are using Qt4 or Qt5). But I do not have a strong preference for that.
In case this approach is approved, what would be the steps to update the conan code of a project to use this approach ? Would we need to do a lot of work ?
@piponazo No, the consumer requirement will look as always, nothing changes there.
I meant the "package-link" (btw, still need a good name to refer to this concept: link, proxy?) could be:
class TestConan(ConanFile):
    conan_link = "Hello/0.1@lasote/channel"
because it could be generated by something like:
$ conan package_link Hello/0.X Hello/0.1@user/channel
So the name and version are already explicit in the command, and can be used to put the package-link in the conan cache.
If possible I would prefer to have version in the Proxy as an optional field.
It doesn't matter. In your dependency graph, you will have the real version you are using. So you can make your App require Qt/latest@team/stable, and if you make a package-link between Qt/latest => Qt/5.0, in your dependency graph you will see "App" => "Qt/5.0"
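The package-link idea can be sketched in a few lines of Python. This is only a model of the concept under discussion: the `links` dictionary and the functions `resolve_link` and `resolve_graph` are hypothetical and do not describe how conan stores or resolves such links. A link is followed until a concrete reference is reached, so the resolved graph shows the real version, and reverting to an older release is a matter of repointing the link.

```python
def resolve_link(reference, links, max_hops=10):
    """Follow package-links until a reference that is not a link is found."""
    seen = []
    while reference in links:
        if reference in seen or len(seen) >= max_hops:
            raise ValueError("link cycle or chain too long at %s" % reference)
        seen.append(reference)
        reference = links[reference]
    return reference


def resolve_graph(requirements, links):
    """Replace every linked requirement by the concrete reference."""
    return {
        consumer: [resolve_link(req, links) for req in reqs]
        for consumer, reqs in requirements.items()
    }


# Hypothetical link table: Qt/latest is a proxy for Qt/5.0.
links = {"Qt/latest@team/stable": "Qt/5.0@team/stable"}
print(resolve_graph({"App": ["Qt/latest@team/stable"]}, links))

# Rolling back a bad release is just repointing the link.
links["Pkg/1.2@user/channel"] = "Pkg/1.2.4@user/channel"
links["Pkg/1.2@user/channel"] = "Pkg/1.2.3@user/channel"
print(resolve_link("Pkg/1.2@user/channel", links))
```

Each link costs one direct lookup instead of a pattern search, which is where the performance argument for this approach comes from.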
Conan has merged the conan alias functionality into develop, will be available for 0.25. This might be a very good core for implementing package revisions.
I have done some performance testing for the conan alias approach, and it seemed very reasonable, like incrementing over 5% wrt directly retrieving packages (running with local Artifactory instance). Mainly because most of the cost comes from initiating conan, still have to check the performance for single-instance resolution.
I am moving the "revisions" feature to 0.26 at the moment, or until we get some feedback about the conan alias feature.
I've no new feedback about the conan alias command. So let's wait to 0.27.
I would like to bring to attention a slightly different aspect of package revisions: build reproducibility. Basically, I don't want to use the "latest" version anywhere. If I use it, this means that the results I get depend on the point in time when I did the build. But I want to make sure that if anyone at any point in time checks out our source code repository (which contains the full list of references for the packages it uses) at a given revision and builds the solution, they will have exactly the same result, including exactly the same third-parties (even if third-parties used at this point had some bugs).
I believe aliases have nothing to do with this issue. We could add revision number into references, like this: mylib/1.2.3-r.2@user/channel. And this works fine for most of scenarios. The problem is that this brings semver into the domain of pre-release versions. First of all, this is not what we try to express here. This is the release version, but it also contains package revision, which is a different beast, but we put it here just because we don't have another place to put it.
Besides wrong intent, I can also point out a particular problem we have with this approach. Imagine I have recipe for lib A, which wants to use lib B with version 1.2.x. But, according to our approach, versions of lib B are in the form 1.2.3-r1 or 1.2.4-r2. None of these match 1.2.x mask. I would like to have a mask like 1.2.x-r, so I can have a latest revision (treated as pre-release version by semver) for any patch version within a given major and minor version. But this mask is not a valid semver range.
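The mismatch can be shown with a small self-contained Python model. It does not use the semver library that conan relies on; `matches_release_mask` simply encodes the behaviour described above (a `1.2.x` mask accepts release versions only), and `latest_with_revisions` is a hypothetical rule for what a `1.2.x-r` mask could mean. The version grammar `X.Y.Z-rN` is the one from the comment.

```python
import re

VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-r(\d+))?$")


def parse(version):
    """Split 'X.Y.Z' or 'X.Y.Z-rN' into (major, minor, patch, revision)."""
    m = VERSION.match(version)
    if m is None:
        raise ValueError("unsupported version: %s" % version)
    major, minor, patch, rev = m.groups()
    return int(major), int(minor), int(patch), (int(rev) if rev else None)


def matches_release_mask(version, major, minor):
    """Model of 'major.minor.x' as described: suffixed versions are excluded."""
    ma, mi, _patch, rev = parse(version)
    return (ma, mi) == (major, minor) and rev is None


def latest_with_revisions(versions, major, minor):
    """Hypothetical 'major.minor.x-r': highest patch, then highest revision."""
    candidates = [parse(v) for v in versions]
    candidates = [c for c in candidates if c[:2] == (major, minor)]
    if not candidates:
        return None
    ma, mi, patch, rev = max(candidates, key=lambda c: (c[2], c[3] or 0))
    base = "%d.%d.%d" % (ma, mi, patch)
    return base if rev is None else "%s-r%d" % (base, rev)


available = ["1.2.3-r1", "1.2.4-r2", "1.2.4-r10", "1.3.0-r1"]
print([v for v in available if matches_release_mask(v, 1, 2)])
print(latest_with_revisions(available, 1, 2))
```

Note that revisions are compared numerically here, so r10 sorts above r2; a plain string comparison would get this wrong.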
So, the questions:
@memsharded any ideas on the questions?..
Besides wrong intent, I can also point out on a particular problem we have with this approach. Imagine I have recipe for lib A, which wants to use lib B with version 1.2.x. But, according to our approach, versions of lib B are in the form 1.2.3-r1 or 1.2.4-r2. None of these match 1.2.x mask.
We have implemented revisions very similarly, only without the r before the revision number. We have a script that updates revisions in dependent packages recursively (because updating a revision changes its contents, so we up the revision all the way to leaf packages). This works very well in practice and avoids the need for version ranges. As a result, we completely avoid the issue you raise here: we always use exact versions-revisions everywhere. We'll probably never use ranges or aliases precisely because we want build reproducibility. I guess we could get away with collecting and storing the resolved version numbers at the time of build, but that detaches the information from version control.
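A rough Python sketch of what such a recursive bump could look like. The actual script is not shown in the thread, so the function `bump_revisions` and the data layout (a revision number per package plus a mapping of each package to the packages it requires) are assumptions made purely for illustration.

```python
from collections import deque


def bump_revisions(revisions, requires, changed):
    """Bump `changed` and every package that depends on it, directly or
    transitively, exactly once.

    revisions: package name -> current revision number
    requires:  package name -> list of package names it depends on
    """
    # Invert the edges: for each package, who depends on it?
    dependents = {name: [] for name in revisions}
    for pkg, deps in requires.items():
        for dep in deps:
            dependents[dep].append(pkg)

    updated = dict(revisions)
    bumped = set()
    queue = deque([changed])
    while queue:
        pkg = queue.popleft()
        if pkg in bumped:
            continue  # reached through more than one path
        bumped.add(pkg)
        updated[pkg] += 1
        queue.extend(dependents[pkg])
    return updated


revisions = {"zlib": 1, "png": 3, "app": 7, "other": 2}
requires = {"png": ["zlib"], "app": ["png", "zlib"], "other": []}
print(bump_revisions(revisions, requires, "zlib"))
```

Because every consumer pins an exact revision, the bumped numbers would then have to be written back into each dependent recipe and committed, which keeps the information in version control.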
Most package managers have a concept of package revision, i.e. additional version number that reflects changes in packaging scripts or applied patches when "main" version number of packaged software remains the same.
It would be great if Conan added support for revisions too. This will make package updates more transparent ("updated from vX.Y.Z-r1 to vX.Y.Z-r2"). Also there could be a policy that "stable" channel can never change conanfile and binaries without bumping revision, to prevent accidental changes in packages used in CI with manifest verification.
It would be great if it was possible to keep binary packages for previous revisions so that CI system with manifests checking does not get broken in case new revision is uploaded without committing new reference manifests.
It was previously briefly discussed at https://github.com/conan-io/conan/issues/480#issuecomment-247545547