Closed by joelverhagen 3 years ago
Unfortunately, this is not trivial. Here is a portion of a discussion from https://github.com/NuGet/NuGet.Services.Metadata/pull/197.
There is a well-known ID casing per version. This is defined by the package's .nuspec. [The] casing of IDs in the V2 API (and everywhere in the gallery) is determined by the first version uploaded with that ID. This is what defines the `Id` string in the `PackageRegistrations` table. This field is what is JOINed into the query that produces non-hijacked V2 results (which include that original `Id` string repeated over and over for all versions of that ID).
Ideally the Package table would have a package ID column which has the correct casing for that specific version.
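As a rough illustration of the JOIN described above, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, the column names (`RegistrationKey`, `PackageKey`), and the `example.package` ID are simplified assumptions made up for this example; they are not the gallery's real schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one registration row per package ID,
# one package row per version. The real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PackageRegistrations (
        RegistrationKey INTEGER PRIMARY KEY,
        Id TEXT NOT NULL
    );
    CREATE TABLE Packages (
        PackageKey INTEGER PRIMARY KEY,
        RegistrationKey INTEGER NOT NULL
            REFERENCES PackageRegistrations (RegistrationKey),
        Version TEXT NOT NULL
    );
""")

# The first uploaded version fixed the registration's casing as "example.package",
# even if a later version's .nuspec says "Example.Package".
conn.execute("INSERT INTO PackageRegistrations VALUES (1, 'example.package')")
conn.executemany(
    "INSERT INTO Packages VALUES (?, 1, ?)",
    [(1, "1.0.0"), (2, "2.0.0")],
)

# Every version gets the same Id string because it comes from the JOINed table.
rows = conn.execute("""
    SELECT pr.Id, p.Version
    FROM Packages AS p
    JOIN PackageRegistrations AS pr
        ON pr.RegistrationKey = p.RegistrationKey
    ORDER BY p.PackageKey
""").fetchall()

for package_id, version in rows:
    print(package_id, version)
```

Both rows carry the registration's casing, which is why a per-version ID column would be needed to show anything else.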
How about we modify the casing in PackageRegistration table every time a new version is pushed?
Or allow modifying the case in the PackageRegistration table, even after the fact? Asking users to upload a new package version just to modify the case isn't ideal either.
> How about we modify the casing in PackageRegistration table every time a new version is pushed?
I think the latest version, not the latest package pushed chronologically, should define this. Operating on anything but SemVer order is a strange thing for a package manager to do.
> Or allow modifying the case in the PackageRegistration table, even after the fact? Asking users to upload a new package version just to modify the case isn't ideal either.
This would allow deviation from the value in any .nuspec. Also, this would be inconsistent with V3.
@joelverhagen, if I understand you correctly, you support the following: when a new package version is pushed, update the ID casing in the PackageRegistration table, but only if the new version is "latest" according to SemVer.
That's one approach, yeah. There are multiple definitions of latest (4 in fact...) so we would have to pick one or track them all. If we pick one then we should probably pick the "latest" definition that package details page uses which is latest Stable including SemVer 2.0.0.
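To make the "latest stable by SemVer, not latest pushed" idea concrete, here is a minimal sketch. It assumes plain `major.minor.patch[-prerelease][+metadata]` version strings and ignores things the gallery would also have to consider (listed status, four-part versions, the other "latest" definitions). The names `PushedVersion`, `stable_key`, and `id_casing_from_latest_stable` are hypothetical and not taken from any NuGet code.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class PushedVersion(NamedTuple):
    nuspec_id: str  # ID casing as written in that version's .nuspec
    version: str    # e.g. "1.2.3" or "2.0.0-beta.1"


# Simplified SemVer 2.0.0 pattern: three numeric parts, optional prerelease
# label, optional build metadata.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def stable_key(version: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for stable versions, else None."""
    match = _SEMVER.match(version)
    if match is None or match.group(4) is not None:
        return None  # unparsable or prerelease
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def id_casing_from_latest_stable(pushed: List[PushedVersion]) -> Optional[str]:
    """Pick the ID casing of the highest stable version, ignoring push order."""
    candidates = [(stable_key(p.version), p.nuspec_id) for p in pushed]
    stable = [(key, nuspec_id) for key, nuspec_id in candidates if key is not None]
    if not stable:
        return None
    return max(stable, key=lambda pair: pair[0])[1]


# Listed in chronological push order; the IDs are made up for illustration.
pushes = [
    PushedVersion("example.package", "1.0.0"),
    PushedVersion("Example.Package", "2.0.0"),
    PushedVersion("EXAMPLE.PACKAGE", "1.5.0"),       # pushed later, lower version
    PushedVersion("Example.package", "3.0.0-beta.1"),  # prerelease, skipped
]
print(id_casing_from_latest_stable(pushes))  # Example.Package
```

Under these assumptions, the 2.0.0 casing wins even though 1.5.0 and the prerelease were pushed after it.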
I've seen a couple of requests recently to change the casing of package IDs (to align with the latest version). How about storing all uploaded casing variants somewhere in the DB and allowing the user to choose which one to use to represent the package? Always using the latest, though, aligns the hijacked search output with the package details page.
Letting the user choose could be implied by latest version. I think choosing a non-latest version is an option that is a) kind of confusing and b) not required for a minimal fix. Do any users actually want to use an old casing, rather than just having the new casing fixed in the latest version of the package?
But yeah, I think to fix this we need at least some more information in the database.
My original suggestion of storing it per version is more analogous to V3 and does not have any weird parallelism issues that writing to the package registration record has.
Here's another instance where this feature could've helped: https://github.com/NuGet/NuGetGallery/issues/8069
Another problem with not storing the correct ID casing per package version is that our Db2AzureSearch job, which populates the search index based on database state, uses the same casing for all package versions. This can lead to cases where the search document produced from the database differs from the search document produced via the catalog. This is a bug and is particularly problematic since nuget.org search tokenizes on camel-case. For example, "streamdecksharp" and "StreamDeckSharp" are tokenized differently.
This situation has been improved a bit with https://github.com/NuGet/NuGet.Jobs/pull/925, since all camel-case tokens are now required in the matched document, but it does not completely solve the problem.
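To show why casing changes the tokens, here is a minimal sketch of a camel-case splitter. This is only an approximation written for this example; it is not the analyzer nuget.org search actually uses, and the function name `camel_case_tokens` is hypothetical.

```python
import re
from typing import List


def camel_case_tokens(package_id: str) -> List[str]:
    """Split an ID on camel-case boundaries and lowercase each piece.

    Acronym runs (e.g. "XML" in "XMLReader") are kept together, and digit
    runs become their own tokens. Separators such as "." are dropped.
    """
    pieces = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", package_id)
    return [piece.lower() for piece in pieces]


print(camel_case_tokens("StreamDeckSharp"))  # ['stream', 'deck', 'sharp']
print(camel_case_tokens("streamdecksharp"))  # ['streamdecksharp']
```

With a splitter like this, a query for "deck" could match the first casing but not the second, so which casing ends up in the index matters.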
We should store each individual version's casing in the `Packages` table and use that both for display and for Azure Search index population.
We are doing a minimal update this sprint.
This fix is deployed and verified in PROD. However, not all past data has been fixed up, since this is a cosmetic issue and not many users have been affected.
If you are facing this problem, you can resolve the situation on your package ID by pushing a new latest version of your package. All new packages will have their exact package ID casing recorded and displayed per version.
Problem
It seems like the casing of a package ID on the package web page is determined by the first version's ID casing. For example, consider the "sqlite" package ID.
If you go to a specific version on NuGet.org (e.g. https://www.nuget.org/packages/sqlite/3.13.0), the ID casing in the page's heading and example install command is not `SQLite`. It is `sqlite`.
The package author's intended case is apparent in the .nuspec contained in the .nupkg.
Package IDs are case insensitive, but the package author's intended case is still important.
Here is a list of popular packages affected by this bug:
Solution
- [Add an] `Id` column to the `Packages` table.
- [Display the] `Package.Id` value if it is non-null, otherwise display the `PackageRegistration.Id` value.
- [Populate the] `Package.Id` value if it is null.
- [Use the] `Package.Id` value if it is non-null, otherwise use the `PackageRegistration.Id` value. See: https://github.com/NuGet/NuGetGallery/issues/3349#issuecomment-730649782
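The fallback rule in the list above can be sketched as follows. This is a minimal illustration, not gallery code: the dataclasses and the helper `effective_id` are hypothetical stand-ins for the `Package` and `PackageRegistration` entities, and the IDs are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageRegistration:
    id: str  # casing fixed by the first version ever uploaded


@dataclass
class Package:
    registration: PackageRegistration
    version: str
    id: Optional[str] = None  # per-version casing; None for rows not yet populated


def effective_id(package: Package) -> str:
    """Prefer the per-version ID; fall back to the registration's ID when null."""
    if package.id is not None:
        return package.id
    return package.registration.id


registration = PackageRegistration(id="example.package")
old_version = Package(registration, "1.0.0")                   # no per-version ID
new_version = Package(registration, "2.0.0", "Example.Package")

print(effective_id(old_version))  # example.package
print(effective_id(new_version))  # Example.Package
```

Keeping the column nullable with a fallback means versions without a recorded casing keep displaying the registration's ID, matching the behavior before the change.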