dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License
2.66k stars 1.06k forks source link

dotnet-ef not available after successful restore #11820

Open solazs opened 4 years ago

solazs commented 4 years ago

We have an interesting issue on one of our build agents regarding dotnet ef cli tool. The error only occurs on one of our agents, however I cannot figure out what's off.

Running dotnet tool restore in my project directory outputs

Tool 'dotnet-ef' (version '3.1.4') was restored. Available commands: dotnet-ef

However when trying to run dotnet-ef I get an error:

$ dotnet ef
Run "dotnet tool restore" to make the "dotnet-ef" command available.
$ dotnet dotnet-ef
Run "dotnet tool restore" to make the "dotnet-ef" command available.
$ dotnet ef --version
Run "dotnet tool restore" to make the "dotnet-ef" command available.
$ dotnet dotnet-ef --version
Run "dotnet tool restore" to make the "dotnet-ef" command available.
$ dotnet tool run dotnet-ef --version
Run "dotnet tool restore" to make the "dotnet-ef" command available.
Usage: dotnet tool run [options] <COMMAND_NAME> [[--] <additional arguments>...]]

Arguments:
  <COMMAND_NAME>   The command name of the tool to run.

Options:
  -h, --help   Show command line help.
Additional Arguments:
  Arguments passed to the application that is being run.

Further technical details

dotnet-ef version: 3.1.4 dotnet --version: 3.1.300 Target framework: .NET Core 3.1.4 Operating system: Ubuntu 18.04.1 LTS

I'll attach a verbose output from dotnet tool restore separately.

solazs commented 4 years ago

dotnet tool restore -v diag output dotnet_tool_diag.log

ajcvickers commented 4 years ago

Moving this to the SDK repo since it doesn't seem specific to EF.

Piedone commented 4 years ago

This started to happen on our build agents (but not all it seems) two days ago too but for the tool smtp4dev: The exact same tool restore then tool run commands that worked before now fail. And actually it's the same for me locally too.

Did you find a solution @solazs?

Maybe worth noting that I'm talking about a local tool. If I do a complete reinstall of the tool in a new folder with the following commands it's still broken:

dotnet new tool-manifest
dotnet tool install Rnwood.Smtp4dev --version "3.1.0-*"
dotnet tool restore
dotnet tool run smtp4dev --db= "" --smtpport 7366 --urls http://localhost:8394/
solazs commented 4 years ago

@Piedone

Did you find a solution @solazs?

Unfortunately no, we're still impacted and I cannot reproduce the issue reliably. Global install (--global) might work in other environments, for us it's not feasible.

Piedone commented 4 years ago

Thanks for your quick reply. Same here.

I dug through the log and didn't find anything noteworthy. I see you're on *nix, I'm on Windows, so that's one thing that shouldn't make a difference I guess. Although you redacted your user name, can it be that this has to do with paths containing accented characters since we're both Hungarian with such a name? (The temp folder these commands use are under the user folder in Windows, which in my case is "Zoltán".)

solazs commented 4 years ago

Hey,

Nope, username is pretty plain, nothing Hungarian in there, but good thinking, nonetheless :) Project name is simple as well.

Also, we have multiple and supposedly identical agents running this build, but the error does not happen reliably, so I think this issue will have something to do with caches and tool install locations being funny (e.g. left in an inconsistent state).

Piedone commented 4 years ago

That can be. It seems that two of our build agents are affected, one isn't, though they're all running in subfolders on the same physical machine (so paths pretty much only differ in a number). No accented paths there anywhere though keyboard/date settings are Hungarian. However, changing the culture doesn't seem to help.

We can reproduce it reliably though.

It works for one of my colleagues but I don't yet see why (same SDK version, same culture, same Windows).

Piedone commented 4 years ago

Clearing NuGet caches with dotnet nuget locals all --clear doesn't resolve the issue BTW.

However, I did solve it: On Windows under %USERPROFILE%.dotnet\toolResolverCache\ you have subfolders with files that refer to the NuGet package cache folder. If those are wrong you'll get this error.

Because it turns out that this was caused by me changing the NuGet package cache folder's path (both in the user-level NuGet.config and in the environment variable). While I supposedly changed the path correctly and everything else worked it did break local tools. The tests failing in our CI environment were actually caused by something else but curiously both the failing tests and them starting failing coincided by the issue surfacing on my machine...

solazs commented 4 years ago

Confirm on the cache clearing not solving the issue. I have tried that prior to opening this issue.

Moreover content of ~/.dotnet/toolResolverCache is identical on our build agents. Edit: Identical, meaning it shouldn't cause one build to fail and another to succeed.

$ ls -al ~/.dotnet/toolResolverCache/
total 12
drwxr-xr-x 3 myuser myuser 4096 Apr  9 13:46 .
drwxr-xr-x 5 myuser myuser 4096 Jun 19 16:17 ..
drwxr-xr-x 2 myuser myuser 4096 Apr  9 13:46 1
$ ls -al ~/.dotnet/toolResolverCache/1/
total 12
drwxr-xr-x 2 myuser myuser 4096 Apr  9 13:46 .
drwxr-xr-x 3 myuser myuser 4096 Apr  9 13:46 ..
-rw-r--r-- 1 myuser myuser  713 Jun 25 14:04 dotnet-ef
augustoproiete commented 3 years ago

@solazs Did you ever get to the bottom of this?

dug9000 commented 3 years ago

We are seeing this same issue on Windows 10 pipeline builds as well. Currently we are doing the following to workaround it:

    if (HasEnvironmentVariable("USERPROFILE") && DirectoryExists(EnvironmentVariable("USERPROFILE") + "/.dotnet/toolResolverCache"))
    {
        DeleteDirectory(EnvironmentVariable("USERPROFILE") + "/.dotnet/toolResolverCache", new DeleteDirectorySettings { Recursive = true, Force = true});
        DotNetCoreTool("tool restore");
    }

But this kind of defeats the purpose of having a toolResolverCache. If the toolResolverCache can't be done reliably maybe this caching should just be removed from dotnet.

solazs commented 3 years ago

@solazs Did you ever get to the bottom of this?

Nope, we blacklisted the affected build agents from the problematic build. Did not receive complaints since.

RichardJFoster commented 3 years ago

I ran into something very similar recently and it came down to the user context that the build agent was running in. This case https://github.com/dotnet/sdk/issues/16165 describes the pit I fell into quite nicely.

samegutt commented 1 year ago

This is how the dotnet tool becomes unavailable after dotnet tool restore in my Azure Devops pipeline:

TL;DR: The tool specific file under <userprofile>\.dotnet\toolResolverCache\1 refers to files or folders that no longer exists.

  1. A build is started on an agent running as user buildsvc.
  2. The pipeline runs task UseDotNet. This changes where the nuget packages/dotnet tool assemblies are stored to the working folder of the agent.
  3. The pipeline runs dotnet tool restore. Since there is no corresponding tool described in the c:\users\buildsvc\.dotnet\toolResolverCache\1 folder, the tool is downloaded to the agent working folder (c:\w\1\.nuget\) and the download location is added to the PathToExecutable node in the tool specification file.
  4. The build then successfully executes the restored dotnet tool.
  5. A new build is started, removing all files from the working folder. This also removes the dotnet tool assemblies that was downloaded to the working folder mentioned in step 3
  6. The pipeline runs dotnet tool restore. Now the corresponding tool is described in c:\users\buildsvc\.dotnet\toolResolverCache\1, so the tool is not downloaded again. The path to the assemblies is pointing to a now empty folder in the agent working directory. Still, the output of the restore command is that the restore was successful.
  7. The pipeline tries to run the dotnet tool, and the tool is not found.
akindle commented 3 weeks ago

it's proving difficult to find the freshest, most relevant github issue for this bug, but we've seen an enormous uptick in issues with dotnet tool restore in the past week and especially 48 hours that I believe follow the pattern seen here (or possibly in the various referenced issues):

00:00:58 
    Installing cake.tool
00:01:02 
    Tool 'cake.tool' (version '4.0.0') was restored. Available commands: dotnet-cake
00:01:02 
    Tool 'nuke.globaltool' (version '7.0.6') was restored. Available commands: nuke
00:01:02 
    Restore was successful.
00:01:03 
    Running build script...
00:01:03 
    & dotnet cake "C:\BuildAgent\work\b2fcdbe1759cb61d\Build\build.cake" <args>
00:01:03 
    Run "dotnet tool restore" to make the "dotnet-cake" command available.
00:01:06 
  Error message: Run "dotnet tool restore" to make the "dotnet-cake" command available.

we've been running this in CI on ephemeral build agents for months without issue, and only recently began experiencing an uptick in failures like this with no changes at all on the ephemeral build agents. We've explored a variety of updates to the build agents' base VM image to address this that have seemingly only made things worse (i.e. this issue more frequent).

1) is this the best issue in the dotnet repo for tracking restore problems? 2) any guesses why this would fail intermittently? 3) are there any robust work-arounds?