NuGet crashes when v3 server returns items with invalid values

ghost commented 4 years ago

Using NuGet.exe, multiple versions, but present in 5.8.0.6854.

We are using a private server that I have no administrative access to (I believe it's proget). I can only provide information about what the client sees, but when we use the v3 protocol the server responds with no error for missing packages. It returns this:

{"count":1,"items":[{"count":0,"items":[],"parent":"http://private.server/nuget/feed/v3/registrations-gz/robblerobble/index.json","lower":null,"upper":null}]}

when querying for the bogus package "robblerobble". This causes the NuGet command to throw an exception when installing any package that is not hosted on the private server, because it tries to parse the upper and lower version strings. Because of this error, NuGet.exe will not continue and try to install the package from the next repository in the list.

I realise this could be dismissed as a problem with the server, but it can also easily be worked around in the client, and since I do not run the server, I cannot log a meaningful support incident about that side of the problem. I am happy to do the work to add a test case and fix, but the first step is to find out whether it would even be accepted or would just be a waste of time.

Here's an example log with the bogus package name:

Installing package 'robblerobble' to 'k:\'.
  CACHE http://private.server/nuget/feed/v3/registrations-gz/robblerobble/index.json
  GET https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json
  CACHE https://www.nuget.org/api/v2/FindPackagesById()?id='robblerobble'&semVerLevel=2.0.0
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
  NotFound https://api.nuget.org/v3/registration5-gz-semver2/robblerobble/index.json 1102ms
An error occurred while retrieving package metadata for 'robblerobble' from source 'private'.
  Value cannot be null or an empty string.
  Parameter name: value
NuGet.Protocol.Core.Types.FatalProtocolException: An error occurred while retrieving package metadata for 'robblerobble' from source 'private'.
 ---> System.ArgumentException: Value cannot be null or an empty string.
Parameter name: value
   at NuGet.Versioning.NuGetVersion.Parse(String value)
   at NuGet.Protocol.RegistrationUtility.<LoadRanges>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.Protocol.ResolverMetadataClient.<GetDependencies>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.Protocol.ResolverMetadataClient.<GetRegistrationInfo>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.Protocol.DependencyInfoResourceV3.<ResolvePackages>d__5.MoveNext()
   --- End of inner exception stack trace ---
   at NuGet.Protocol.DependencyInfoResourceV3.<ResolvePackages>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.PackageManagement.NuGetPackageManager.<GetLatestVersionCoreAsync>d__96.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.PackageManagement.NuGetPackageManager.<>c__DisplayClass94_1.<<GetLatestVersionAsync>b__3>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.PackageManagement.NuGetPackageManager.<GetLatestVersionAsync>d__94.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.PackageManagement.NuGetPackageManager.<GetLatestVersionAsync>d__93.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.CommandLine.InstallCommand.<InstallPackageAsync>d__38.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NuGet.CommandLine.Command.Execute()
   at NuGet.CommandLine.Program.MainCore(String workingDirectory, String[] args)

nkolev92 commented 4 years ago

Hey @LoadEffectiveAddress,

Unfortunately given that the server is not conforming to the protocol, by adding a workaround on the client side, that can be interpreted as adopting that as part of the protocol. https://docs.microsoft.com/en-us/nuget/api/registration-base-url-resource#registration-page-object

Calling on a couple of other folks on the team to chime in here, @joelverhagen, @zivkan @loic-sharma

fyi @JonDouglas @aortiz-msft

joelverhagen commented 4 years ago

Agreed @nkolev92, one of the main purposes of publicly documenting the V3 protocol was so that server implementations (of which there are MANY, like dozens at this piint) align to a single protocol with minimal (hopefully zero) quirks. This also allows alternate client implementations as well. All goodness. It would be very hard for the client implementation to handle all quirks of all server implementations, pretty much unsustainable from an engineering perspective.

If the protocol specification is unclear in this area then we can definitely take it as a documentation bug and beef up the wording in this case.

We have contacts on the ProGet side so perhaps we can reach out to them about this particular bug stating that it's causing our joint customers some pain. /cc @jcjiang, what do you think about spinning up a parallel thread with ProGet for this issue after we get more details?

@LoadEffectiveAddress, when you say "bogus" do you mean a package ID that never existed on the source? Or perhaps it's a stranger case where a package existed but then was deleted?

ghost commented 4 years ago

By 'bogus' I just mean a package that never existed anywhere (to ensure server-side caching or existence on another source wasn't a contributor).

I appreciate the responses, and hope it leads somewhere positive with the proget team, but sadly I won't be able to answer any questions about the configuration of the server. I will be happy to supply information that can be gleaned from the server as a user though.

ghost commented 4 years ago

I have not read the entire spec, but the linked page does not appear to say anything about what should be returned when the package requested does not exist. At least it would be nice to state that the items array should be empty, or an explicit error should be returned instead.

joelverhagen commented 4 years ago

@LoadEffectiveAddress, you're totally right about the lack of clarity in the documentation. Expected behavior when a package does not exist at all on the source is that the package metadata resource aka the registration resource returns a 404 for the index.json document.

zivkan commented 4 years ago

I've only taken a superficial glance over this issue, so I could misunderstand something. But it seems reasonable to me for the client to treat a response with zero items as a 404. Both semantically mean there's no packages with that package id. Returning a 404 is may still preferable, because client doesn't cache 404s, but it will cache an empty registration page, so if a customer uploads a package, the client will not keep assuming the package doesn't exist for 30 minutes.

But this doesn't seem like a regression, so even if client starts supporting it in new versions, the feed still won't work correctly with old clients. And if client does start supporting it, then new server implementations might accidentally start depending on this without knowing they're excluding all old clients from working correctly. That's a reason not to fix it.

In conclusion I have no idea if we should fix it or not. Feels like client should be more tolerant to valid responses with the same semantic meaning, but it's less likely that servers will support older clients.

ghost commented 4 years ago

I believe if the server returned a top-level empty response, that would still work as desired (i.e. {count: 0, items: []}). It's only a problem because they return 1 top-level item, and that item has only default values for each required property. The client attempts to parse the upper and lower versions of that entry without checking first if it refers to any items at all (e.g. doesn't check if count is zero, or the items array is []).

It's hard to say it definitely violates the spec as it is now, but I can completely understand calling it poorly-formed and deciding not to support it.

ghost commented 4 years ago

Having slept on this, I think I'd like to make a second push to reconsider the problem. I understand not wanting to accept bad actors for fear of appearing to support broken feeds, but the real issue for the end-user is that the client crashes because of this, and cannot succeed at its task. It would be less frustrating if an error message was logged ("Source {x} has provided a malformed response, please contact your provider") and the client instead moved on to the next source as if the bad source wasn't in the list.

"Be conservative in what you send, be liberal in what you accept" and all that. Is there additional harm in ignoring a non-compliant implementation?

joelverhagen commented 4 years ago

It's up to the client team, for sure! I'm perhaps a purest at times and it shouldn't get in the way of customer success, as you said.

There are two places to fix this in my mind. On the client or on the server. We do have a line of communication with both. Client, here and server via email. We have contacts on the ProGet side and in fact will be meeting with them soon to discuss OData deprecation. Maybe we can add this bug to the agenda (/cc @jcjiang). In my opinion we should get both sides involved and drive for the best solution across the board. It may be quickest to get the fix on the server side, perhaps deployment of that is at faster cadence than NuGet client. Or perhaps it's quickest to do it client side.

Fixing it server side also fixes it for older client versions, which is a plus. That being said, old versions of the server will still be broken and a client fix would address that. Not sure which gets updated quicker, on average.

If we do take it on the client side, we should do so with a clear sense of what constitutes as a good fix for the client, and a good fix for the server. One of the main problems with the V2 protocol (predecessor to V3) is that it was never documented and had MANY quirks. For example, look at Paket's implementation of the V2 protocol: https://github.com/fsprojects/Paket/blob/b6e6491fc9cc2eb9caa5a39f9755eed2252ceaff/src/Paket.Core/Dependencies/NuGetV2.fs#L272-L338 -- not ideal! I'm not sure how other clients (Paket, etc) will handle this particular bug. That's perhaps another interesting data point.

I would say from a REST perspective a 404 is more proper (e.g. "package not found") but that sort of conversation gets philosophical very fast!

edit: great discussion by the way. thanks for opening this up, @LoadEffectiveAddress!

nkolev92 commented 4 years ago

Thanks for the discussion all!

Personally I think, I want to reiterate my comments from earlier. Fixing this on the server side will impact all clients, and it really creates fewer questions about what works. Fixing it on the server side also doesn't introduce any additional gotchas that pertain to caching etc.

I do think we need to dig more to understand the impact. That'll help us make the best decision.

How can we get into this scenario server side?
How often does this happen?
Is it a new problem on proget side? Are they aware?
If it's not a new problem, why haven't we seen this before?

ghost commented 4 years ago

How often does this happen?

I set up a brand new proget server with their most recent docker image, and the problem was immediately reproducible with an empty feed.

If it's not a new problem, why haven't we seen this before?

I think a common use-case for proget is as a local caching server. When the feed is configured with a nuget.org connector, it returns the nuget.org response for any valid packages not hosted locally. In this configuration, the only time you see the error is when you specify a package that doesn't exist on either proget or nuget.org. That's my best guess at least.

ghost commented 4 years ago

I forgot to say: since I am now armed with more information about a failing server, I can try logging a support incident with Inedo. I still don't understand why it's desirable to leave a crashing bug in the client, but there's no need to go round in circles over it.

aortiz-msft commented 4 years ago

I agree with @LoadEffectiveAddress that NuGet should not crash and instead handle the bad response gracefully. Let's just this issue to track that.

@joelverhagen - How does NuGet.org respond to this scenario? Do we need an issue for NuGet.org or for adding this case to the protocol documentation?

nkolev92 commented 4 years ago

Re: Handling gracefully.

I think we're intentionally failing here. FatalProtocolException wraps all protocol failures. If a package version is not parsable for example, FatalProtocolException would be what all failures are wrapped into. It's really the stack trace being displayed that's meh in this case :)

So imho, bad feed = failure is good.

ghost commented 4 years ago

Some good news: Inedo have acknowledged the issue and expect it to be fixed in their next release.

NuGet / Home

NuGet crashes when v3 server returns items with invalid values #10107