
packagereference should support a locked graph at restore time #4394

Closed tkrick closed 6 years ago

tkrick commented 7 years ago

Starting with VS 2017, the default flow for creating a .NET Core console project uses PackageReference. Similarly, VS will also ask traditional apps to use PackageReference. So PackageReference will quickly become the new paradigm for consuming packages. Yet in this flow there seems to be no way to lock a graph so that restore is reliable. For an enterprise, being able to rely on rebuilds of a commit (or minor edits to the repo followed by a hotfix rebuild) consuming the same package graph is important.

From my understanding, the package reference build flow (nuget.exe restore) only consumes:

This implies that every build is doing a graph resolution and could choose different versions depending on local state and the packages present in the feed(s) at a given time. There appears to be no way to lock a graph (all versions of all direct and indirect dependencies) so that rebuilds of the same commit are reliable. This is true even if you clear your package cache between builds.

Example 1 (even if you clear the package cache between builds): indirect rules coming from candidate packages say [1.0, infinity), yet only 1.1 exists. Later someone pushes 1.0; now 1.0 will be the package downloaded, whereas the previous build of that commit used 1.1.

Example 2 (don't clear the package cache between builds; an enterprise-scale no-no): build commit c2, which brings in package P at 1.1 (say a dependent package Q at c2 has a rule for P of [1.1, infinity)). Then build commit c1 with the same package cache on the same machine as the earlier build; c1 brings in a different version of Q, one whose rule for P is [1.0, infinity). This uses P at 1.1, since it is already in the package cache and satisfies the rule. Now, later, rebuild commit c1 on a different machine with an empty package cache and it will use P at 1.0. So c1 can be built with either 1.1 or 1.0, which may have catastrophic effects for the rebuild but not for the original build.
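
For reference, the open-ended ranges in these examples come from ordinary dependency declarations, where a bare version is shorthand for a minimum. A minimal .nuspec fragment, using the hypothetical P and Q from example 2 (package names and metadata invented for illustration):

```xml
<?xml version="1.0"?>
<package>
  <metadata>
    <id>Q</id>
    <version>1.0.0</version>
    <authors>example</authors>
    <description>Hypothetical package Q from example 2.</description>
    <dependencies>
      <!-- a bare "1.1" means the open-ended range [1.1, infinity): any P >= 1.1 satisfies it -->
      <dependency id="P" version="1.1" />
      <!-- an exact pin would instead be written as version="[1.1]" -->
    </dependencies>
  </metadata>
</package>
```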

I recommend:

Details about Problem

NuGet product used (NuGet.exe | VS UI | Package Manager Console | dotnet.exe): VS UI

NuGet version (x.x.x.xxx): 4.0.0.2188

VS version (if appropriate): 2017 RC

OS version (i.e. win10 v1607 (14393.321)): win10

rrelyea commented 7 years ago

We have duplicate issues tracking a request for a lock file. We are very interested in this.

tkrick commented 7 years ago

Ok. I would consider this a regression, not a feature. packages.config was a locked graph. Now, the first time Visual Studio is not using packages.config as its default NuGet flow, the graph will not be locked; it will be a resolve-on-the-fly paradigm, which is a very different paradigm. I would suggest not shipping 2017 without going back to the locked-by-default paradigm, or fixing it in an update as quickly as possible.

sandyarmstrong commented 7 years ago

This is a major regression. Reproducible builds and deterministic package management are vital features.

This issue is already very well described but here is another good article explaining the issue in a general sense: https://blog.ometer.com/2017/01/10/dear-package-managers-dependency-resolution-results-should-be-in-version-control/ .

davidfowl commented 7 years ago

@tkrick there are a number of reasons that, even with PackageReference, builds are deterministic by default: namely, NuGet always picks the lowest version (and I believe dependency behavior is ignored by default). That means that unless you change the input feeds, you'll always get the same result once you have fully specified versions (not floating).
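
To make that distinction concrete (hypothetical package names), a plain version in a PackageReference is treated as a minimum and resolved to the lowest available match, while a floating version is re-evaluated against the feed on every restore:

```xml
<ItemGroup>
  <!-- fully specified: treated as the range [1.1.0, ) and resolved to the lowest
       available match, so the result is stable unless the feed changes -->
  <PackageReference Include="PackageP" Version="1.1.0" />

  <!-- floating: re-resolved to the highest matching 1.1.x on each restore,
       so the result can change whenever a newer 1.1.x is pushed -->
  <PackageReference Include="PackageQ" Version="1.1.*" />
</ItemGroup>
```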

davidfowl commented 7 years ago

Example 1 (even if you clear the package cache between builds): indirect rules coming from candidate packages say [1.0, infinity), yet only 1.1 exists. Later someone pushes 1.0; now 1.0 will be the package downloaded, whereas the previous build of that commit used 1.1.

Could you clarify this? When you say cache did you mean the feed? Are you saying the input restore feeds are volatile and change from build to build? Are you also deleting older versions?

Example 2 (don't clear the package cache between builds; an enterprise-scale no-no): build commit c2, which brings in package P at 1.1 (say a dependent package Q at c2 has a rule for P of [1.1, infinity)). Then build commit c1 with the same package cache on the same machine as the earlier build; c1 brings in a different version of Q, one whose rule for P is [1.0, infinity). This uses P at 1.1, since it is already in the package cache and satisfies the rule. Now, later, rebuild commit c1 on a different machine with an empty package cache and it will use P at 1.0. So c1 can be built with either 1.1 or 1.0, which may have catastrophic effects for the rebuild but not for the original build.

Also not sure what you mean by cache here. Did you mean the input feed (which is usually checked into source control)? Why would it use P 1.1 if NuGet always picks the lowest?

I don't quite understand your setup, can you elaborate?

tkrick commented 7 years ago

The "cache", i mean, is the global packages directory. It is not really a cache. But there is also the involvement of the nupkgs and version list cache in the v3-cache. I confess I do not fully know the internal workings of when these caches are involved.

For clarification on example 1: imagine you set a rule of [1.0, infinity) even though version 1.0 did not exist. Then someone pushes 1.0. Before the push you'd get 1.1; afterwards it silently changes to 1.0 (on different machines, or if you cleared locals). Albeit somewhat contrived (usually you push in version order and do not express rules that reference versions that do not exist yet), it still means the resolve-on-the-fly paradigm does not lead to idempotent builds.

For example 2, to answer your question "Why would it use P 1.1 if nuget always picks the lowest?": Hmm, I haven't verified, but I thought there was some timeframe under which graph resolution prefers local packages, if they satisfy the graph, before reaching out for the list of versions. I confess I do not know the internals of how often and when a new list of versions is consulted, especially when offline scenarios are considered. I know that in VSTS, delisted packages do not show up in the flat index.json, so versions could appear or disappear from that list and cause issues like this, where one build that has the version list cached sees 1.0 and another build consults the latest version list and only sees 1.1.

Another example: some NuGet protocol server implementations support full delete. If you were previously consuming [1.0, infinity) and 1.0 got deleted, then suddenly uncached machines would start consuming 1.1 or 2.0. The preference would be to break explicitly in that case, yet this silently auto-upgrades. Silently changing behavior is dangerous. You probably either want to upgrade explicitly, or maybe what you really want is a hotfix on the 1.0.* line that may need to be made available/pushed.

Yes, I agree that the default resolution choosing the lowest version matching the computed rules is "safer", but it is still not a reliable, idempotent build. Put another way, "more reliable" is not reliable. Any time fetching a list of versions is part of the normal build process, you do not have a reliable build.

davidfowl commented 7 years ago

I agree there are situations that can cause builds to "do bad things", and some situations would be helped if there were a fixed set of versions. I just want to reduce the FUD on this thread, because the premise isn't correct. If feeds are volatile then the result of the restore will be as well.

If you lock the versions and still delete packages from the feed I don't know how your restore situations will function.

This is why I'm asking for specific examples because there are subtle interactions between the global packages folder and the feed to make sure things don't float unnecessarily. So you might be thinking that things are more volatile than they are.

Overall, the cases in my mind for a lock file are for floating versions, but not much else. I'd like to zoom in on any other scenario you have in mind so we can explore what happens today and what will happen after the lock file is introduced.

timabell commented 7 years ago

Thanks for the public discussion, great to see Microsoft planning such things in the open these days :-)

Even if the resolution of dependencies is [relatively] stable, I like the idea (as a dev) that I can see at a glance in source control every version number in the tree of dependencies from the last time dependency resolution was run. I can also then bisect the history to find out which dependency version change introduced a bug into my program. Having used Ruby on Rails (with Gemfile.lock) and .NET Core's JSON-based system with project.lock.json, I feel the disappearance of the lock file with the switch to csproj in .NET Core is a loss.

Having said that, on (re-)reading those articles I now realise that although they were named similarly, the advice is different for Gemfile.lock (check it in) vs project.lock.json (don't check it in), so maybe things aren't how I originally understood them.

tkrick commented 7 years ago

@timabell the project.lock.json is not really a lock file since it isn't designed to be checked in. I consider the PackageReference as-is and project.json worlds to be somewhat equivalent.

@davidfowl "premise isn't correct": example 1 which i verified just now is possible. It shows that even without volatility (deleting/overwriting, i argue pushing is not volatility or if it is it shouldn't cause version switching) it is possible to switch versions between different builds of the same commit, albeit in a somewhat contrived scenario. Unfortunately any FUD introduced by example 2 (which i assume cannot happen?) is because it is difficult to reason about because algorithms can change from release to release and there is no documentation to show the resolution flow. Sorry. My main concern is that by introducing any amount of resolution of version sets leads to the potential for unpredictable behavior.

The delete example (deleting the minimum version expressed in PackageReference rules) is what you would call volatility in the feed. Agreed. I verified this scenario as well: it auto-upgrades you to the next available version, the new minimum. Yet instead of silently auto-upgrading, which can lead to unverified behavior changes, I would argue a safer method is that you either fulfill the build as it was resolved on the original developer's box, or fail (because the version no longer exists) to send a clear signal that it was not possible to restore in the original form. PS, unrelated: I also found a bug in this scenario. nupkgs (not just extra calls to the flat index.json) are refetched every 30 minutes, when the v3-cache's list_[pkgname].dat gets out of date, if the minimum version expressed in the rules is not present in the versions list, even if those nupkgs are present in the global packages folder and the v3-cache.

The delete example and example 1 (where the rule minimum is not the same as the current minimum on the server, and the rule minimum is then pushed) are the only cases of silent version switching I can find from a black-box perspective.

Leveling up: the lock file seems appropriate to ensure idempotency regardless of the resolution strategy (minimal/latest), though it is more valuable with a latest strategy. I think the original request still applies. Lock files are goodness: for idempotency; ideally for recording content hashes as well, to prevent the content-disparity-across-multiple-feeds problem; and maybe for remembering the original source, to reduce authentication/404 load. Feel free to reach out to me directly if you want to dig any deeper.

davidfowl commented 7 years ago

I verified this scenario as well: it auto-upgrades you to the next available version, the new minimum. Yet instead of silently auto-upgrading, which can lead to unverified behavior changes, I would argue a safer method is that you either fulfill the build as it was resolved on the original developer's box, or fail (because the version no longer exists) to send a clear signal that it was not possible to restore in the original form.

If your package targets a version of a package that does not exist (because you deleted it), then it has to roll forward to the next one; it's the only reasonable behavior. The reason that happens is that NuGet versions are minimums, not exact versions. The tooling also warns you when this happens, because it's likely a mistake somewhere (it says something like "you asked for A 1.0.0 but got A 1.1.0"). It will also be possible to make those warnings errors. I don't see how a lock file helps there, because if I delete my global packages folder (which my CI machine does every build) and then restore, I'm in the same boat. The caching "issue" is by design, to ease the pain of dealing with floating versions by saving HTTP calls. I'm not sure it has any real effect on bumping minimums unless we're in your "contrived" scenario.
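
As an aside for later readers: the warning described above ("you asked for A 1.0.0 but got A 1.1.0") is NU1603 in current NuGet tooling, and it can be promoted to an error per project via the standard MSBuild warning properties. A minimal sketch, assuming that warning code applies to your scenario:

```xml
<PropertyGroup>
  <!-- fail restore instead of warning when a requested minimum version is not
       found on any feed and NuGet rolls forward to the next available version -->
  <WarningsAsErrors>$(WarningsAsErrors);NU1603</WarningsAsErrors>
</PropertyGroup>
```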

I don't disagree that lock files are useful and allow the input to the restore operation to be captured and persisted in source control (to whatever degree makes sense). I'd just rather we all agree that, in practice, with default NuGet usage there aren't any real determinism problems. I'd also like us to acknowledge that adding a lock file only helps with floating versions. It doesn't help if feed contents are removed from build to build.

If we can agree on that premise, then we can move forward and discuss how lock files can make sense and how they change the workflow in the NuGet tooling.

PS: I'm one of the original NuGet developers, and I also had a heavy part in the latest incarnation of the resolution algorithm and experience.

tkrick commented 7 years ago

it has to roll forward to the next one; it's the only reasonable behavior.

No, that is not reasonable behavior IMO. I don't think we can agree on that. That is silent auto-upgrade. Better to get a clear response saying that the version you were consuming got retired (by the package owner or by retention policies), so that you can either 1. reach out to the owner to ask them to undo the delete, if your consumption scenario is important enough and you cannot immediately upgrade cheaply; or 2. make and verify upgrade changes to ensure that quality is maintained.

So no, I don't agree with the following, because deleting feed contents is and should be a very valid scenario, especially in high CI/CD flows with retention policies:

in practice, with default NuGet usage there aren't any real determinism problems

I'm fine with moving past that, though, and figuring out a viable strategy for locking. Such a solution, I would guess, should ensure the following (a sketch of what a single locked entry might capture follows this list):

  1. Determinism by hashed contents: i.e., no chance of switching between different hashed contents. I see nontrivial percentages, which I can't share here, of content disparity with the same versions occurring across feeds, so this is a real concern.
  2. If the build cannot be fulfilled because some portion of the closure of package version contents are now deleted then the unfulfilled build is failed clearly and immediately.
  3. The lock file should create minimal source control conflicts (line by line resolution) if there are changes to different packages that are mutually exclusive of one another.
  4. (ideally) Single feed per version: The build flow does not need to reach out to multiple feeds per package version, only reaching out to other feeds for the same content hash only as a fallback in the case where the feed that originally served out the content no longer has it.
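
A hypothetical sketch of what a single locked entry covering points 1, 2, and 4 might record (illustrative only, not an actual NuGet format; the element and attribute names are invented):

```xml
<!-- hypothetical locked entry: exact version, content hash, and the feed it originally came from -->
<lockedPackage id="P"
               version="1.1.0"
               contentHash="sha512-PLACEHOLDER"
               source="https://pkgs.example.com/feed/v3/index.json" />
```
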
davidfowl commented 7 years ago

No, that is not reasonable behavior IMO. I don't think we can agree on that. That is silent auto-upgrade. Better to get a clear response saying that the version you were consuming got retired (by the package owner or by retention policies), so that you can either 1. reach out to the owner to ask them to undo the delete, if your consumption scenario is important enough and you cannot immediately upgrade cheaply; or 2. make and verify upgrade changes to ensure that quality is maintained.

So no, I don't agree with the following, because deleting feed contents is and should be a very valid scenario, especially in high CI/CD flows with retention policies:

Sorry, but I have to pick on this. If you are deleting packages constantly in your CI/CD workflows, how would older commits ever be buildable? Like I said, this could be made an error by default in NuGet, but it happens so rarely because people just don't delete older packages they care about that often, at least not in my experience with NuGet. If failure is your goal, then specify an exact version in your project instead of a minimum, via the [version] syntax. Then all of your older commits will gladly start to fail when your dependencies are randomly deleted from the feeds you are referencing.
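
For concreteness, the [version] syntax referred to above is NuGet's exact-version interval notation in the project file (hypothetical package name):

```xml
<ItemGroup>
  <!-- "[1.1.0]" pins exactly 1.1.0; restore fails if that version is no longer
       on any configured feed, instead of rolling forward -->
  <PackageReference Include="PackageP" Version="[1.1.0]" />
</ItemGroup>
```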

Now, if you were being serious about your dependencies and wanted to keep everything truly buildable, you would mirror the packages you care about to a feed under your control. That way nobody is deleting relevant packages and your builds for older commits aren't suddenly failing. Without this, you don't really have reproducible builds. The lock file can hash and version anything it wants; if you can't get the packages, it won't matter. This is also why nuget.org still doesn't support package deletion (for the most part).

I see nontrivial percentages, which I can't share here, of content disparity with the same versions occurring across feeds, so this is a real concern.

What does this mean? Are you in a situation where you have a package with the same version and different content on different feeds? If the hash doesn't match, would the build fail or would the lock file be updated?

If the build cannot be fulfilled because some portion of the closure of package version contents are now deleted then the unfulfilled build is failed clearly and immediately.

Sounds fine to me. Like I said, this is already doable by locking the version. FWIW, we originally discussed at length whether this should fail by default or not, and decided to keep the behavior for projects consistent with packages. That is, if you have this situation:

Feed contents: A 1.0, B 2.0

Packages: A 1.0 -> B 1.0

Project: Application -> A 1.0

Since B 1.0 is a minimum and it doesn't exist on the feed, the resolved set of dependencies will be A 1.0 and B 2.0. Of course, if A 1.0 had a strict dependency on B [1.0], it would have failed instead. When we looked at bringing this behavior to the new transitive restore, we were faced with what to do in this case:

Feed contents: A 2.0, B 2.0

Packages: A 1.0 -> B 2.0; A 2.0 -> B 2.0

Project: Application -> A 1.0

We decided to mirror the existing package behavior, that is, treat A 1.0 as a minimum instead of an exact version (which is why it can float) for consistency. This causes unexpected results which is why we warn you when it happens.

One other thing we did try was to only accept a higher version if, after dependency resolution, that version was bumped, and otherwise fail, but that ended up breaking a bunch of poorly written NuGet packages. In hindsight, maybe it would have been better to just make it an error, which would have forced people to fix their NuGet packages in the first place... I digress...

The lock file should create minimal source control conflicts (line by line resolution) if there are changes to different packages that are mutually exclusive of one another.

Sure.

(ideally) Single feed per version: The build flow does not need to reach out to multiple feeds per package version, only reaching out to other feeds for the same content hash only as a fallback in the case where the feed that originally served out the content no longer has it.

If we stored where the original package came from this would be doable which sounds fine.

tkrick commented 7 years ago

Sorry for the long latency; quite busy this last week.

If you are deleting packages constantly in your CI/CD workflows how would older commits ever be buildable?

Sure; ideally, packages that are still being consumed won't be deleted. But different parties and different reasons for deletion mean that deletion of even consumed versions will happen (more on this below). In that case, they won't be buildable. That gives a clear signal that the build can no longer be fulfilled (ideally with a reason why, e.g. deleted due to retention). It also means that what is in source control represents what is actually used and never anything else, for auditability.

Like I said, this could be made an error by default in nuget

Making this an error means that there is no need to reach out to the flat index.json API for the list of versions; we can rely on the nupkg call to tell us we have it (2XX or 303) or we don't (404). Which brings us back to: why fetch the list of versions at all during a build?

but it happens so rarely because people just don't delete older packages they care about that often, at least, not in my experience with nuget.

That is because nuget.org doesn't allow you to delete your packages without an explicit support request. On other providers, where you have to pay for storage (Artifactory/MyGet) and have retention rules, packages are often cleaned up. Hopefully not so much the ones someone is actively consuming, but the producer is in charge of their costs and their support policies, not the consumer (see the mirrored-feeds response below). As CI/CD scenarios are enabled, package retention/deletion will become more frequent.

If failure is your goal then specify an exact version in your project instead of a minimum via the [version] syntax.

There are multiple issues with that strategy:

  1. Locked consumption rules ([version]) are not the default in Visual Studio.
  2. Most people do not realize that a version by itself is a floating rule meaning [version, infinity), not a locked version.
  3. Rules tell you what versions the consuming code could accept, not what it is currently consuming. Locking them precludes those rules from flowing into subsequent upgrades and future NuGet publishes of your code. Imagine you wrote automation to auto-publish your csproj code as a NuGet package: the rules about what versions of a dependency a dependent works with should not be locked; they should be a range of versions it accepts. But you are saying to lock that rule, which is not viable for package-to-package dependency resolution, especially once you introduce diamond dependencies.

This is why we should separate the rules about what versions you could consume ([2,3), [2, infinity), etc.) from the currently consumed graph that you have locked ([2.1.0], [2.3.0], etc.).
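
A sketch of that separation using NuGet's existing interval notation (hypothetical package name): the rule in the project expresses what the code could accept, while the version actually consumed would be recorded in a separate, checked-in lock file.

```xml
<ItemGroup>
  <!-- consumption rule: this code is known to work against any 2.x of PackageP -->
  <PackageReference Include="PackageP" Version="[2.0,3.0)" />
  <!-- the concrete version currently consumed (say 2.1.0) would live in the
       checked-in lock file / consumed-versions list, not in this rule -->
</ItemGroup>
```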

The lock file can hash and version anything it wants, if you can't get the packages it won't matter

This is about auditing and reproducibility. Buildability is not guaranteed. Idempotency, as long as you can still restore a successful graph, is what should be guaranteed. Explicit failure is more desirable than version switching.

Now, if you were being serious about your dependencies and wanted to keep everything truly buildable, you would mirror packages you care about to a feed under your control.

What will make that difficult is that mirroring has a setup cost and a maintenance cost. The person responsible for deletion, even from a mirrored feed, is not necessarily the person who gets broken by the deletion. Deletion policies (even on a mirrored feed) could be driven by financial, security, or other concerns. Creating a mirror per developer with an infinite retention policy is of course possible, but in practice that would be too costly. The point is, deletion can and will occur underneath you at some point.

Are you in a situation where you have a package with the same version and different content on different feeds?

Yes, same version, different content. I calculated it as the percentage of package names within Microsoft's VSTS organization that have at least one version with different content under the same version number in different active feeds in the same organization. That percentage is nontrivial.

If the hash doesn't match, would the build fail or would the lock file be updated?

If the hash doesn't match, the build would fail. The build cannot and should not update source control, which is where the lock file (I like to call it the consumed-versions list) lives. The developer could easily fix the build by choosing a different resolved graph, assuming 1. no one butchered the consumption rules by locking versions, and 2. there is another available version that resolves and works.

jainaashish commented 6 years ago

This is implemented as part of https://github.com/NuGet/Home/issues/7139

There is more work to have the same lock file at the solution level as well, which is being tracked as part of https://github.com/NuGet/Home/wiki/Centrally-managing-NuGet-package-versions
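
For anyone finding this thread later: the shipped feature is the per-project packages.lock.json. Opting in looks roughly like the following, based on the MSBuild properties documented for the #7139 implementation (double-check that your NuGet/SDK version supports them):

```xml
<PropertyGroup>
  <!-- generate and honor packages.lock.json for this project -->
  <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile>
  <!-- on CI, fail restore if resolution would differ from the committed lock file;
       the condition property here is just an example, use whatever your CI defines -->
  <RestoreLockedMode Condition="'$(ContinuousIntegrationBuild)' == 'true'">true</RestoreLockedMode>
</PropertyGroup>
```

The same enforcement is available from the command line via dotnet restore --locked-mode.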