NuGet / Home

Repo for NuGet Client issues

Solution-Level or Repo-Level Lock Files #12409

Open yaakov-h opened 1 year ago

yaakov-h commented 1 year ago

NuGet Product(s) Involved

Other/NA

The Elevator Pitch

Managing multiple project lock files for a huge solution or a repo can be daunting. This was even called out in 2018 when lock files were introduced:

https://devblogs.microsoft.com/nuget/enable-repeatable-package-restores-using-a-lock-file/#solution-or-repo-lock-file

Has there been any further thought or development on solution-level or repository-level lock files?

The link in those paragraphs doesn't contain any further information on centralised lock files, only on centralised versioning, and I can't find any related issues here on GitHub either.

Additional Context and Details

This is particularly important to me as I don't want to have to go and update approx. 1000 csproj files when updating a core package such as a Roslyn Analyzers package, or Newtonsoft.Json, or similar.

This also gets much more difficult when multiple developers are trying to make changes (e.g. adding a ProjectReference which adds new transitive NuGet dependencies, which I believe requires updating the lock file?), and looks like it will quite quickly lead to merge conflict hell.

Many of our projects already use Paket, which provides a single top-level lock file for all dependencies defined in a paket.dependencies file (analogous to nuget.config + Directory.Packages.props). This makes batch operations such as upgrading or downgrading a package across an entire repo straightforward. Changes to individual projects require no changes outside of their own project files (.csproj + paket.references) and as such enable many types of concurrent changes to be made without file merge conflicts.
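For readers who haven't used Paket, here is a minimal sketch of the layout being described (package names illustrative; the `#` header lines are annotations, not part of the files). Constraints live once at the repo root, the resolved graph is pinned once, and each project lists only the names it uses:

```
# paket.dependencies (repo root): the constraints, declared once
source https://api.nuget.org/v3/index.json
nuget Newtonsoft.Json >= 13.0

# paket.lock (repo root, generated): the single pinned graph
NUGET
  remote: https://api.nuget.org/v3/index.json
    Newtonsoft.Json (13.0.3)

# src/MyApp/paket.references (per project): names only, no versions
Newtonsoft.Json
```

Upgrading a package touches paket.dependencies and paket.lock only; no per-project file changes are needed.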

Please 👍 or 👎 this comment to help us with the direction of this feature & leave as much feedback/questions/concerns as you'd like on this issue itself and we will get back to you shortly.

Thank You 🎉

heng-liu commented 1 year ago

Hi @yaakov-h, we have introduced CPM (Central Package Management). Is this the feature you're looking for?
Please refer to the following for more details:
https://devblogs.microsoft.com/nuget/introducing-central-package-management/
https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management

yaakov-h commented 1 year ago

@heng-liu Central Package Management seems to be completely orthogonal to lock files. I can use CPM with lock files, I can use CPM without lock files, I can use lock files without CPM, and I can use neither.

If I use CPM with lock files, or I use lock files without CPM, the result is the same: every single individual .csproj file gets an attached packages.lock.json file, rather than a centralised lock file as seen in most other package managers (Paket, NPM, Yarn, Cargo, et al.)
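For context, the per-project behaviour described here comes from NuGet's documented opt-in property, which yields one packages.lock.json beside each csproj (or beside every csproj at once, when set repo-wide):

```xml
<!-- In a csproj, or in Directory.Build.props to apply to all projects: -->
<PropertyGroup>
  <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile>
</PropertyGroup>
```

Setting it centrally changes where the opt-in lives, but not the outcome: restore still writes one lock file per project.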

The blog post I linked above states:

We have started to think about providing better and easier way to manage package dependencies centrally i.e. at a solution or even a repo level – and thereby providing a central lock file.

... but I have yet to find any documentation, examples, or even a GitHub trail showing that a "central lock file" feature has ever been in development.

heng-liu commented 1 year ago

Hi @yaakov-h, CPM (Central Package Management) is the solution-level package management feature. Please refer to the docs below if you'd like to give it a try :)
https://devblogs.microsoft.com/nuget/introducing-central-package-management/
https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management

yaakov-h commented 1 year ago

@heng-liu I am fully aware of that, but the lock files feature does not seem to have any central management capabilities.

See https://github.com/yaakov-h/demo-nuget-cpm-lock for a demonstration of what I mean: despite having central package management enabled, each of the 9 projects gets an individual lock file.

In my larger corporate projects we have hundreds of projects in a single repository, so the current implementation of lock files will not scale.

Is there any consideration towards "a central lock file" as suggested by Microsoft in 2018, and as every other package manager that I have used already features?

heng-liu commented 1 year ago

Sorry for the confusion caused, but you need to opt out of the lock files feature first. Then, in CPM, package versions will be specified centrally in Directory.Packages.props. Could you try that and let us know if you have any questions when using CPM? Thanks!

Hi @jeffkl @JonDouglas, shall we add this note (opt out of the lock files feature first) to our doc at https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#get-started?

yaakov-h commented 1 year ago

@heng-liu I don't think you're following the question. What's the story with centralised lock files, given that the last notes I can find on this are four years old?

heng-liu commented 1 year ago

Hi @yaakov-h, if you check the blog post Enable repeatable package restores using a lock file, the "Centrally managing NuGet package versions" section it mentions is CPM. So CPM is the story for the central lock file. But you have to opt out of the previous lock file feature and then opt in to CPM. Sorry that we didn't add that info to our document at https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#get-started

yaakov-h commented 1 year ago

@heng-liu I don't follow.

CPM cannot be the story with central lock file because it does not provide a central lock file. On its own, it provides no lock file at all. Currently they are two completely unrelated features, though they do work together (badly).

The blog post talks about providing a central lock file, which is not something that CPM offers.

The old Wiki page doesn't discuss it either.

So, what happened to the idea of central lock files?

heng-liu commented 1 year ago

Hi @yaakov-h , I checked with the team and you're correct: the two are different features and we can have both Lock file and CPM enabled at the same time. Sorry for the incorrect info I provided! I'll cross out my previous comments so that it won't mislead others. The "central lock file" is not supported yet.

Hi @JonDouglas , if there is any conclusion about "central lock file" feature after discussion with the team, can you please update it? Thanks!

ryanerdmann commented 1 year ago

Adding my +1 here; having a centralized packages.lock.json would be a huge improvement when using CPM. Looking forward to seeing conclusions here!

TheAngryByrd commented 1 year ago

+1 here too. This is the main reason I still use Paket. Having a solution/repo level lock file is critical on larger projects. And being able to see a diff of the lock file makes it digestible to accept dependency changes. Having them scattered across the repo makes this visualization much more difficult.

jebriede commented 1 year ago

[Team Triage] CPM provides a lot of the same functionality as a central lock file. In addition, the community has told us that diffing lock files manually is hard. Would you mind letting us know what exactly you would find useful in a lock file while using CPM?

baronfel commented 1 year ago

Diffing lock files is hard and pointless when you have to do the same comparisons once for every project in a repo. It's much more tenable a task when you do it once, for everything. Plus, it becomes more worthwhile to create tooling to make the comparison task easier, for example this web-based paket lock diff viewer showing a diff from an old PR I did to the current main branch of the same repo: Paket Lock Diff Viewer for fsharp/fsautocomplete#1043

yaakov-h commented 1 year ago

@jebriede I expect that a lock file would give us an easy list of all packages used.

This is useful for things like dependency analysis, looking for outdated / vulnerable transitive dependencies, and audit/compliance for licencing purposes on the full tree of dependencies.

It can also be better analyzed by CI/CD tooling e.g. to prohibit someone bringing in a "banned" package, or to flag a completely new package that hasn't been used by the company/project before.

Diffing a lock file isn't too bad when you only have one, as @baronfel mentioned above, and quite a few times has been useful in my code reviews where it exposed some interesting changes to the dependency graph that would otherwise not have been visible to the reviewer.

Potentially it could also be used to improve restore performance as NuGet would not need to recalculate the graph each time, since which transitive dependencies you resolve is already known ahead of time.
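The CI/CD checks described above can be sketched in a few lines. This is a hypothetical example (the deny list and repo layout are invented), but the packages.lock.json fields it reads (`dependencies`, `resolved`) are the real ones. Note that it has to walk every project's lock file, which is exactly the pain point of the current per-project design:

```python
# Hypothetical CI check: scan every per-project packages.lock.json in a
# repo for packages on a deny list. With a central lock file this would
# be a single file read instead of a recursive walk.
import json
from pathlib import Path

BANNED = {"Foo.UnvettedPackage"}  # assumption: your org's deny list


def find_banned(repo_root: str) -> list[tuple[str, str, str]]:
    """Return (lock file path, package id, resolved version) for each hit."""
    hits = []
    for lock_path in Path(repo_root).rglob("packages.lock.json"):
        data = json.loads(lock_path.read_text())
        # Top-level "dependencies" maps target framework -> package entries.
        for deps in data.get("dependencies", {}).values():
            for name, info in deps.items():
                if name in BANNED:
                    hits.append((str(lock_path), name, info.get("resolved", "?")))
    return hits
```

A build step could then fail the pipeline whenever `find_banned(".")` returns a non-empty list.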

TheAngryByrd commented 1 year ago

CPM provides a lot of the same functionality as a central lock file

I don't quite understand this. Could you elaborate as to why you think these are comparable features?

In addition, the community has told us that diffing lock files manually is hard.

Yeah, diffing lock files is hard, but it's not harder than what you get with NuGet's per-project lock files. There are tooling solutions available to make it easier to see the diff at a glance.

Would you mind letting us know what exactly you would find useful in a lock file while using CPM?

As @yaakov-h pointed out in his issue, it's about having many projects, each with a lock file paired to it. Updating a dependency could update many projects' lock files. Having a central one would avoid having potentially dozens of lock files updated. Additionally, per the previous point, it becomes much easier for tooling to analyze the diff when it's one file vs. dozens or more.

savornicesei commented 1 year ago

+1 to this. Having one packages.lock.json file per project adds lots of unneeded noise, while its purpose is to provide repeatable restores across all projects. I truly believe "one sln, one lock file" would bring more value to devs and devops than what we have right now.

Just a little context: the journey that brought me to this issue was the need to cache NuGet packages in our Azure pipelines. To ensure the same versions of dependencies across our .sln, central package management was the next thing to add. And the last piece was the lock file used in the cache key name, which turned out to be many files. So I can change the version of one dependency in a single place, but I'll have to commit multiple lock files wherever this dependency is used, with a high possibility of merge conflicts for the rest of the team (because of its JSON format).

JonDouglas commented 1 year ago

Hi all,

Thank you for all the comments and upvotes on this issue so far. Let's keep the conversation going regarding use-cases/scenarios, other prior art/ecosystems who are doing this well, and continue voting on this issue so we can weigh this feature over our planning cycles.

As a friendly reminder, please use the 👍 or 👎 reactions on the parent comment so we can best weigh the direction (I've updated the parent comment to include that reminder to help collect votes).

Aside, we do have a community-oriented proposal process that anyone can help champion an initial functional/technical proposal to help provide more clarity to this desired feature.

savornicesei commented 1 year ago

@JonDouglas This has been proposed and has a status of implemented https://github.com/NuGet/Home/wiki/Enable-repeatable-package-restore-using-lock-file So either:

There will be 2 scopes for the creation/working of the lock file: At project level - In this case the lock file is created per project. At a central level when the packages are managed at a solution or a repo level - In this case the lock file is also created centrally in the same folder as the packages.props file.

JonDouglas commented 1 year ago

@savornicesei I don't know the history, sorry; that predates me. I do not believe this proposal (~2018) was fully implemented, as NuGet CPM (~2022) came after, although CPM has been around for a while in some shape or form. That's why this issue exists. Someone can likely correct me, but that's my understanding.

Thanks for digging through old designs to find some existing thoughts around this. It would still be beneficial to revisit this, given it's been over 5 years since someone has.

Let's use this issue to continue tracking the desire of a central/solution/repo lock file.

et1975 commented 1 year ago

Also want to point out that solution-wide sometimes doesn't mean the same version for all the projects. With Paket, my often-used feature is groups: the ability to specify one set of targets/dependencies/versions for one group of projects and another set for another group. Comes in handy for separating what you ship from what you need to build/test/ship.

savornicesei commented 1 year ago

separating what you ship from what you need to build/test/ship

@et1975 Your scenario is interesting but I cannot quite grasp it. Could you expand on it a little bit more? I would like to know more but without taking this discussion too far astray from its scope. Thank you.

purkhusid commented 1 year ago

This feature is really the main thing NuGet is missing to be comparable with other package managers. In most of the popular language ecosystems you have the capability to manage your dependencies and their versions in a single place, and their transitive package versions are frozen in place unless the end-user explicitly upgrades them.

There is of course already a package manager available in the .NET space that takes care of this and more: http://fsprojects.github.io/Paket/ I think the NuGet team could really just look at Paket and borrow a lot of ideas from there, since it does its job phenomenally. The main downside of Paket is that the lock file format is not JSON/YAML. Having the lock file in JSON/YAML would make it easy to parse with various dependency analysis tools.

My use case for this feature is to use the lock file in https://github.com/bazelbuild/rules_dotnet, which is a Bazel-based replacement for MSBuild. Bazel usually shines in large monorepositories, and most of the time a single-version policy for dependencies is used in those monorepos. Having a single lock file would allow me to parse it and generate Bazel targets for each dependency. I currently have support for Paket in there and it works great, but some people find it hard to have to first migrate to Paket and then to Bazel when adopting rules_dotnet.
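As a rough illustration of that Bazel use case (the `@nuget//...` label format here is an assumption for the sketch, not rules_dotnet's actual output), the transformation is essentially "one lock entry in, one target label out", which is why a single repo-level lock file would let it run once instead of once per project:

```python
# Sketch: flatten the entries of a packages.lock.json document into
# Bazel-style repository labels, one per resolved package.
import json


def lock_to_labels(lock_json: str) -> list[str]:
    """Produce one '@nuget//<id>:<version>' style label per resolved package."""
    data = json.loads(lock_json)
    labels = set()
    # "dependencies" maps target framework -> {package id: entry}; the same
    # package may appear under several frameworks, hence the set.
    for deps in data.get("dependencies", {}).values():
        for name, info in deps.items():
            labels.add(f'@nuget//{name.lower()}:{info["resolved"]}')
    return sorted(labels)
```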

tebeco commented 1 year ago

separating what you ship from what you need to build/test/ship

@et1975 Your scenario is interesting but I cannot quite grasp it. Could you expand on it a little bit more? I would like to know more but without taking this discussion too far astray from its scope. Thank you.

The idea of a group is to have two resolution trees completely agnostic of each other. Applying that to tests is probably misleading. Rephrasing: if you use the same paket.dependencies for both your code and your build system (such as FAKE), you want to make sure that FAKE and its transitive packages have no impact on your production code. The original post is misleading in using the terms build and test. Most of the time you want your test projects inside the same dependency tree as src, because that's how tests run: they ProjectReference your code. If you start splitting test and src explicitly, you will at some point end up in a situation where a transitive dependency in the test group does not resolve the same as the same transitive outside of the test group (e.g. Msft.Ext.* and a major release). This will result in a nightmare for your codebase and your sanity while trying to understand what's going on.

TL;DR on groups: it's just as if you had multiple slns and resolved each sln independently (because that's how NuGet works), and each had its own lock file.

filmico commented 1 year ago

I might have found a solution for caching NuGet packages in Azure DevOps using CPM.

I came here in the same situation stated by @savornicesei in a previous post, where she relates caching the NuGet packages in the build pipeline in Azure DevOps.

The two main links I was using are:

  • Central Package Management (CPM)
  • Cache NuGet packages with lock files

For example, a scenario of one solution at the root level with several projects under src linked to that solution, like this structure:

src

  • Api (WebApi project)
    • xx.csproj
  • Domain (Class project)
    • xx.csproj
  • AnyOther project
    • xx.csproj

xx.sln

Was the Chicken or the Egg?

If we implement the lock mechanism, we end up with several lock files, one per project. This tends to make changes difficult to track, as we have to look through several long lock files. Not to mention the possibility of collisions with another commit from another developer adding/removing packages in one of the layers, in a lock file that was designed not to be touched by us.

On the other hand, CPM looks like a more elegant solution, at least for tracking changes to packages and also for fixing merge conflicts, as we can edit Directory.Packages.props with no problem at all.

So the trick I found viable, for the scope of building and caching NuGet packages, was the use of CPM plus the following YAML script.

AZ Build Pipeline (Just the important bits)

variables:
  NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages

steps:
  - task: Cache@2
    displayName: 'NuGet cache'
    inputs:

      # If you follow the lock mechanism you can do this:
      # key: 'nuget | "$(Agent.OS)" | **/packages.lock.json,!**/bin/**,!**/obj/**'

      # If you use the CPM mechanism you can do this:
      key: 'nuget | "$(Agent.OS)" | **/Directory.Packages.props'

      restoreKeys: |
        nuget | "$(Agent.OS)"
        nuget
      path: '$(NUGET_PACKAGES)'
      cacheHitVar: 'CACHE_RESTORED'

  - task: NuGetCommand@2
    displayName: 'NuGet restore'
    condition: and(succeeded(), ne(variables.CACHE_RESTORED, 'true'))
    inputs:
      restoreSolution: '$(solution)'
The Cache@2 task will hash your Directory.Packages.props, and if it detects a change from a previous build it will change the CACHE_RESTORED variable. Later, the NuGetCommand@2 task evaluates its condition and restores packages only if the hash of Directory.Packages.props has changed.

This does not propose anything in relation to the lock mechanism actually implemented for NuGet. I'm on the same side as anybody saying: please review how NPM, Yarn and the other ecosystems lock packages, as they manage to do it in a central file with great success.

Cheers

gtbuchanan commented 1 year ago

It is also my hope that a solution-level lock file removes the requirement of running a NuGet restore to generate the project.assets.json files after using the Cache task in Azure Pipelines:

https://learn.microsoft.com/en-us/azure/devops/pipelines/artifacts/caching-nuget?view=azure-devops#restore-cache

If you encounter the error message "project.assets.json not found" during your build task, you can resolve it by removing the condition condition: ne(variables.CACHE_RESTORED, true) from your restore task. By doing so, the restore command will be executed, generating your project.assets.json file. The restore task will not download packages that are already present in your corresponding folder.

Even after the Cache task has run, the restore adds at least 30 seconds to our build time for what I would consider no reason. I expect the lock file I already generated to be used exclusively (similar to npm). However, this may warrant a separate issue.

zivkan commented 1 year ago

It is also my hope that a solution-level lock file removes the requirement of running a NuGet restore to generate the project.assets.json files after using the Cache task in Azure Pipelines:

It won't. The assets file tells the rest of the build which assets to use. For example, when a package supports multiple TFMs, NuGet has to select which package TFM to use, so that the compiler eventually references the correct dll and the correct dlls get copied to the bin directory. If the package contains MSBuild targets/props, they need to get imported. If the package contains native files, content files, etc., they need to get copied.

The lock file doesn't contain any of that asset information.
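To make the distinction concrete, here is the shape of a trimmed packages.lock.json entry (field names as in the real format; values illustrative). It records identity, the requested range, the resolved version, and a content hash, but none of the per-TFM asset selection described above, which lives in project.assets.json:

```json
"Newtonsoft.Json": {
  "type": "Direct",
  "requested": "[13.0.3, )",
  "resolved": "13.0.3",
  "contentHash": "HrC5BXdl00IP9zeV+0Z848QWPAoCr9P3NDo…"
}
```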

CEbbinghaus commented 11 months ago

This is the 4th-highest issue by upvotes. It would be nice to gain some additional traction on it.

A possible solution would be adding a property to the csproj to allow setting the lock file location. This could then be set in Directory.Build.props, which would in turn apply to all projects and point them all at the same lock file. Although I have no idea how restore would resolve packages based on such a lock file, since any given project only needs a subset of the total number of packages.
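There is partial prior art for this idea: NuGet already exposes a per-project NuGetLockFilePath property (documented alongside RestorePackagesWithLockFile), so the "set the location centrally" half exists today. What's missing is restore support for multiple projects sharing one file; pointing every project at the same path is not supported, as each restore would overwrite it. A hedged sketch of the Directory.Build.props wiring that works now, collecting all lock files into one folder:

```xml
<Project>
  <PropertyGroup>
    <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile>
    <!-- Existing property: relocates each project's lock file. The
         per-project file name is still required; one shared path for
         all projects is NOT supported today. -->
    <NuGetLockFilePath>$(MSBuildThisFileDirectory)locks/$(MSBuildProjectName).packages.lock.json</NuGetLockFilePath>
  </PropertyGroup>
</Project>
```

This at least gives diff tooling and cache keys a single directory to watch, even though it is still many files.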

Hoping to see some progress. Always happy to contribute if there is a plan forward with a agreed upon design :3

CEbbinghaus commented 11 months ago

Our company is quite dependent on this change to get rid of Paket, and as such I have been asked to spike a solution to this problem. I suspect that adding a property to the csproj file will require changes in the SDK and as such take longer to get accepted, but I believe it is probably the most flexible of the options. Since we have many hundreds of solutions, a solution-level lock file would not meet our requirement of having only one lock file. It would be nice to get someone else's input, especially from core dotnet/nuget contributors, before I start.

savornicesei commented 11 months ago

From my limited testing, CPM seems half-thought-out and half-implemented, resulting in bigger build times for smaller projects. Besides having one lock file per project, right now you can't do a dotnet build --no-restore, as the deps.json files are missing from each project's /obj folder.

zivkan commented 11 months ago

This is our feature proposal process: https://github.com/NuGet/Home/tree/dev/meta#nuget-proposal-process

Please create a design spec before changing code. This issue only talks about a very high level outcome (a single lock file for all projects in a solution/repo?). But there's no details about the contents of the lock file, or how the contents of the lock file will be maintained.

I haven't seen any comments in this issue so far acknowledging the existence of solution filters, or being able to unload projects in Visual Studio (not the same implementation as a solution filter, but from a feature spec point of view it's close enough). Also, on the command line if you run dotnet build, dotnet run or dotnet test on a project file, there is an implicit restore that only has context of that one project. Project references can be calculated, but it's still another example of when NuGet will only have a partial solution/repo project graph. This is something that I personally think is very important to address in a "single lock file" design. I really don't want to make perfect the enemy of good, but different customers have different workflows and therefore use NuGet in different ways.

Something else to consider is that dotnet publish -f tfm -r some_rid also does an implicit restore, setting MSBuild global properties, preventing NuGet from getting the "raw" project information. Although I think this already causes problems with the existing lock file feature. So it's best if a new design can either learn from the previous experience, or at least have the same mitigations (always use --no-restore with publish? it also means that customers must explicitly set their RIDs in the project file) before restore.

csproj files are just MSBuild files, and MSBuild files are just a scripting language. We don't need multiple repo changes to introduce a new variable (property or item). Well, technically we do to have dotnet/project-system flow the value to NuGet in Visual Studio, but all the command line experiences are fully self-contained. So assuming your comment about needing a change to the .NET SDK was just about adding a new property to the project file, no, that's not required.

Some issues, particularly the dotnet publish and MSBuild global property challenges, are much wider in scope than just NuGet, and some of those challenges would require fundamental changes to MSBuild (and maybe the .NET SDK), so aren't really feasible. Therefore, the design for a "single lock file" needs to be sufficiently pragmatic to work around MSBuild and .NET SDK constraints.

savornicesei commented 11 months ago

The way you use lock files in Node.js projects is by calling npm ci, which works only with the tree of dependencies from the lock file. In the same way, NuGet and MSBuild should feed from this global lock file completely, without doing any restores, because everything is in that file. Should there be one lock file per sln or slnf? Depends on how big this file can grow, and then we reach the land of tree shaking and dedup...
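For reference, NuGet's closest analogue to npm ci today is locked mode, which makes restore fail rather than silently re-resolve when lock files are out of date, though it still operates against the per-project lock files:

```
# npm: install exactly what package-lock.json says, or fail
npm ci

# NuGet equivalent today (per-project packages.lock.json files):
dotnet restore --locked-mode
# or via the MSBuild property, typically set only in CI:
dotnet restore /p:RestoreLockedMode=true
```

Locked mode gives the "fail instead of drift" behaviour, but unlike npm ci it does not skip graph resolution entirely, which is part of what this issue is asking for.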

CEbbinghaus commented 7 months ago

Created an RFC based roughly on the suggestion. Please do leave your feedback there.

vanwx commented 2 months ago

ℹ️ I came to this thread trying to use the Cache step with Azure DevOps, same as @savornicesei's post above.

I may have a workaround for the cache step: generating a hash of all *.csproj files (or packages.lock.json files) and using that as a solution-level file before running dotnet restore, which reduces package restore time.

Create the file below in the template-folder folder.

# generate-package-hash.yaml
parameters:
- name: workingDirectory
  type: string
  default: $(System.DefaultWorkingDirectory)

steps:
- script: |
    # Navigate to the specified working directory
    cd ${{ parameters.workingDirectory }}

    # Find all *.csproj files in the working directory
    csproj_files=$(find . -name '*.csproj' | sort)

    # Initialize an empty variable to store the concatenated content
    concatenated_content=""

    # Loop through each .csproj file and concatenate its content
    for file in $csproj_files; do
      concatenated_content+=$(cat "$file")
    done

    # Generate a hash of the concatenated content (using SHA256)
    hash=$(echo -n "$concatenated_content" | sha256sum | awk '{print $1}')

    # Store the hash in a file named package.hash in the working directory
    echo "$hash" > package.hash

    # Output the content of package.hash for verification
    cat package.hash
  displayName: 'generate package hash'

Then we can reference the template in the azure-pipelines.yml

    steps:
    - checkout: self
      fetchDepth: 100
      clean: true

    - template: template-folder/generate-package-hash.yaml
      parameters:
        workingDirectory: $(workingDirectory)

    - task: Cache@2
      displayName: 'nuget cache'
      inputs:
        key: 'nuget | "$(Agent.OS)" | **/package.hash,!**/bin/**,!**/obj/**'
        restoreKeys: |
          nuget | "$(Agent.OS)"
          nuget
        path: '/home/AzDevOps/.nuget/packages/'
        cacheHitVar: 'CACHE_RESTORED'

I found that running DotNetCoreCLI@2 after the NuGet cache step doesn't take much time, only a few seconds, given that all the packages are already restored in the global cache folder.


tebeco commented 2 months ago

Can you tell me how the hash works when you give it the same entry twice? Let's say something like

<PackageReference
  Include="Foo"
  Version="8.0.*" />

which resolves to 8.0.1 today but will resolve to 8.0.2 tomorrow.

That's what CPM can allow by permitting wildcards, and that's also what a lock file is supposed to capture at solution level.

A csproj is not a source of truth; it's a matter of "desired state" vs. "result state".

The csproj is what you desire; it's the constraints.

The lock file is the result of applying the constraints.

Hashing csproj files cannot be used as a lock file workaround.

It also doesn't account for transitive changes, especially open ones.
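This point can be shown mechanically. In the minimal sketch below (package name and versions are made up), the csproj text, and therefore any cache key derived from it, is byte-for-byte identical regardless of what the floating version resolves to on a given day:

```python
# The csproj is the "desired state": hashing it yields the same cache
# key whether Foo 8.0.* resolves to 8.0.1 or 8.0.2, so the key cannot
# detect a changed resolution. Only a lock file (the "result state")
# records which version was actually picked.
import hashlib

CSPROJ = '<PackageReference Include="Foo" Version="8.0.*" />'


def cache_key(text: str) -> str:
    """A content hash of the kind the Cache@2 workaround computes."""
    return hashlib.sha256(text.encode()).hexdigest()


key_when_foo_is_8_0_1 = cache_key(CSPROJ)  # day 1: Foo resolves to 8.0.1
key_when_foo_is_8_0_2 = cache_key(CSPROJ)  # day 2: Foo resolves to 8.0.2
assert key_when_foo_is_8_0_1 == key_when_foo_is_8_0_2  # stale cache hit
```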

vanwx commented 2 months ago

You're right, it won't work with PackageReference ranges like you said.

We use specific versions in our csproj files, so it may not be an issue for us. Plus, the dotnet restore step after the NuGet cache would restore any missing packages, the new version 8.0.2 in your case.

The intention is to reduce package restore time.

tebeco commented 2 months ago

It won't work for transitivity either.

A > B > C > D. A is your project; B, C, D, E are NuGet packages you don't own or version.

If C removes its dependency on D but adds E, you'd miss that, depending on how open the constraint from B to C is, and you'd keep using the old chain indefinitely until you touch something in your csproj.

These scenarios happened A LOT with Microsoft packages, e.g. when they changed Azure.Identity or the SQL client.

glen-84 commented 2 months ago

I work on a solution that contains 185 projects and uses CPM. This means that updating most packages will result in changes to up to 185 package lock files. This is just madness.

Also, without a lock file we can't use floating versions, otherwise builds would not be repeatable.

Please assign a higher priority to this issue.

dd-inno commented 2 months ago

I work on a solution that contains 185 projects and uses CPM. This means that updating most packages will result in changes to up to 185 package lock files. This is just madness.

Also, without a lock file we can't use floating versions, otherwise builds would not be repeatable.

Please assign a higher priority to this issue.

Same here. We've got a solution with around 100 projects. Updating a package has become a nightmare.