NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
https://www.nuget.org/
Apache License 2.0

The correct ID casing should be displayed for the selected version #3349

Closed joelverhagen closed 3 years ago

joelverhagen commented 8 years ago

Problem

It seems like the casing of a package ID in the package web page is determined by the first version's ID casing. For example, consider the "sqlite" package ID.

ID      Version
sqlite  3.8.4.2
SQLite  3.9.1-test
SQLite  3.12.2-alpha
SQLite  3.12.2
SQLite  3.12.3
SQLite  3.13.0

If you go to a specific version on NuGet.org (e.g. https://www.nuget.org/packages/sqlite/3.13.0) the ID casing in the page's heading and example install command is not SQLite. It is sqlite.
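
To make the behavior concrete, here is a minimal Python sketch of the lookup as described above. The in-memory dictionary and the `push` and `displayed_id` helpers are hypothetical stand-ins for illustration, not the gallery's actual schema or code.

```python
# Hypothetical model: lowercased ID -> casing stored by the first upload.
registrations = {}


def push(package_id: str, version: str) -> None:
    """Record an upload; only the first upload for an ID decides the stored casing."""
    registrations.setdefault(package_id.lower(), package_id)


def displayed_id(package_id: str) -> str:
    """Lookups are case insensitive, but the result keeps the first upload's casing."""
    return registrations[package_id.lower()]


push("sqlite", "3.8.4.2")
push("SQLite", "3.13.0")
print(displayed_id("SQLite"))  # prints "sqlite"
```

Every later version inherits the casing of the first one, regardless of what its own .nuspec says.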

The package author's intended case is apparent in the .nuspec contained in the .nupkg.

Package IDs are case insensitive, but the package author's intended case is still important.

Here is a list of popular packages affected by this bug:

LatestId                          TotalDownloads
Swashbuckle.AspNetCore.SwaggerUI  96334268
Npgsql                            54283023
Hangfire.Core                     30446646
SendGrid                          25779632
Microsoft.Data.Sqlite             20418313
SharpCompress                     19417928
NCrontab.Signed                   17268104
Hangfire.SqlServer                17034720
StructureMap                      14500086
Serilog.Sinks.Elasticsearch       14448135
morelinq                          14230167
iTextSharp                        13785720
NCrontab                          13670550
Hangfire                          12792338
Refit                             12481531
Topshelf                          11030897
MongoDB.LibMongocrypt             10757761
NServiceBus                       9493951
NuGet.Core                        7609245
PDFsharp                          7562855

Solution

  1. Add a nullable Id column to the Packages table.
  2. On package upload, store the package Id in the new column.
  3. Anywhere the package ID is displayed, display the Package.Id value if it is non-null, otherwise display the PackageRegistration.Id value.
  4. Update reflow to set the Package.Id value if it is null.
  5. Update Db2AzureSearch to use the Package.Id value if it is non-null, otherwise use the PackageRegistration.Id value. See: https://github.com/NuGet/NuGetGallery/issues/3349#issuecomment-730649782
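
The fallback in steps 3 and 4 could look roughly like the following Python sketch. The dataclasses only mirror the table and column names mentioned above; `display_id` and `reflow` are hypothetical helper names, and the real gallery code is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageRegistration:
    id: str  # casing fixed by the first uploaded version


@dataclass
class Package:
    registration: PackageRegistration
    version: str
    id: Optional[str] = None  # new nullable column; None for rows created before the change


def display_id(package: Package) -> str:
    """Step 3: prefer the per-version ID, else fall back to the registration ID."""
    return package.id if package.id is not None else package.registration.id


def reflow(package: Package, nuspec_id: str) -> None:
    """Step 4: backfill the per-version ID from the .nuspec only when it is missing."""
    if package.id is None:
        package.id = nuspec_id


registration = PackageRegistration("sqlite")
old = Package(registration, "3.8.4.2")
new = Package(registration, "3.13.0", id="SQLite")
print(display_id(old), display_id(new))  # prints "sqlite SQLite"
```

Because the column is nullable, existing rows keep working unchanged until a reflow fills them in.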

scottbommarito commented 7 years ago

Unfortunately, this is not trivial. Here is a portion of a discussion from https://github.com/NuGet/NuGet.Services.Metadata/pull/197.

There is a well-known ID casing per version. This is defined by the package's .nuspec. [The] casing of IDs in V2 API (and everywhere in the gallery) is determined by the first version uploaded with that ID. This is what defines the Id string in the PackageRegistrations table. This field is what is JOINed into the query that produces non-hijacked V2 results (which include that original Id string repeated over and over for all versions of that ID).

Ideally the Package table would have a package ID column which has the correct casing for that specific version.

skofman1 commented 5 years ago

How about we modify the casing in PackageRegistration table every time a new version is pushed?

shishirx34 commented 5 years ago

Or allow modifying the case in the PackageRegistration table, even after the fact? Asking users to upload a new package version just to modify the case isn't ideal either.

joelverhagen commented 5 years ago

How about we modify the casing in PackageRegistration table every time a new version is pushed?

I think the latest version, not the package most recently pushed chronologically, should define this. Operating on anything but SemVer order is a strange thing for a package manager to do.

Or allow modifying the case in the PackageRegistration table, even after the fact? Asking users to upload a new package version just to modify the case isn't ideal either.

This would allow deviation from the value in any .nuspec. Also, this would be inconsistent with V3.

skofman1 commented 5 years ago

@joelverhagen, if I understand you correctly, you support the following: when a new package version is pushed, update the ID casing in the PackageRegistration table only if the new version is "latest" according to SemVer.

joelverhagen commented 5 years ago

That's one approach, yeah. There are multiple definitions of latest (4 in fact...) so we would have to pick one or track them all. If we pick one, then we should probably pick the "latest" definition that the package details page uses, which is latest stable including SemVer 2.0.0.
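
As a rough illustration of "latest stable decides the casing", here is a Python sketch. The `parse` and `latest_stable_id` helpers are hypothetical, and the version handling is deliberately simplified: it compares only the numeric parts, treats any version with a `-` label as prerelease, and ignores listed/unlisted state, so it is not a full SemVer 2.0.0 implementation.

```python
def parse(version: str):
    """Split a version into a padded numeric tuple and its prerelease label (simplified)."""
    without_metadata = version.split("+")[0]  # drop build metadata such as "+sha"
    core, _, prerelease = without_metadata.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    numbers += (0,) * (4 - len(numbers))  # pad so 3.13.0 and 3.13.0.0 compare equal
    return numbers, prerelease


def latest_stable_id(versions):
    """Return the ID casing of the highest stable version, or None if there is none."""
    stable = []
    for package_id, version in versions:
        numbers, prerelease = parse(version)
        if not prerelease:
            stable.append((numbers, package_id))
    if not stable:
        return None
    return max(stable, key=lambda item: item[0])[1]


# Push order is irrelevant: the old lowercase version is pushed last here.
pushed = [
    ("SQLite", "3.13.0"),
    ("SQLite", "3.12.2-alpha"),
    ("sqlite", "3.8.4.2"),
]
print(latest_stable_id(pushed))  # prints "SQLite"
```

Ordering by version rather than by push time means that re-pushing an old version cannot change the displayed casing.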

agr commented 5 years ago

I've seen a couple of requests to change the casing of packages (to align with the latest version) recently. How about storing all uploaded casing variants somewhere in the DB and allowing the user to choose which one represents the package? Always using the latest, though, aligns the hijacked search output with the package details page.

joelverhagen commented 5 years ago

Letting the user choose could be implied by latest version. I think choosing a non-latest version is an option that is a) kind of confusing and b) not required for a minimal fix. Do any users actually want to use an old casing rather than just fixing the casing in the latest version of the package?

But yeah, I think to fix this we need at least some more information in the database.

My original suggestion of storing it per version is more analogous to V3 and does not have any weird parallelism issues that writing to the package registration record has.

loic-sharma commented 4 years ago

Here's another instance where this feature could've helped: https://github.com/NuGet/NuGetGallery/issues/8069

joelverhagen commented 4 years ago

Another problem with not storing the correct ID casing per package version is that our Db2AzureSearch job, which populates the search index based on database state, uses the same casing for all package versions. This can lead to cases where the search document produced from the database differs from the search document produced via the catalog. This is a bug and is particularly problematic since nuget.org search tokenizes on camel-case. For example, "streamdecksharp" and "StreamDeckSharp" are tokenized differently.
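
To show why the casing matters for search, here is a simplified Python sketch of camel-case tokenization. The `camel_case_tokens` function and its regular expression are assumptions for illustration and are not the analyzer nuget.org search actually uses.

```python
import re


def camel_case_tokens(package_id: str):
    """Split an ID at camel-case boundaries and lowercase each piece (simplified)."""
    pieces = re.findall(r"[A-Z][a-z]*|[a-z]+|[0-9]+", package_id)
    return [piece.lower() for piece in pieces]


print(camel_case_tokens("StreamDeckSharp"))  # prints ['stream', 'deck', 'sharp']
print(camel_case_tokens("streamdecksharp"))  # prints ['streamdecksharp']
```

With the all-lowercase casing there are no boundaries to split on, so a query for "deck" would match only the document built from the camel-cased ID.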

This situation has been improved a bit with https://github.com/NuGet/NuGet.Jobs/pull/925, since all camel-case tokens are now required in the matched document, but it does not completely solve the problem.

We should store each individual version's casing in the Packages table and use that for both display and in Azure Search index population.

agr commented 3 years ago

We are doing a minimal update this sprint.

joelverhagen commented 3 years ago

This fix is deployed and verified in PROD. However, not all past data has been fixed up since this is a cosmetic issue and not many users have been affected.

If you are facing this problem, you can resolve the situation on your package ID by pushing a new latest version of your package. All new packages will have their exact package ID casing recorded and displayed per version.