chocolatey / choco

Chocolatey - the package manager for Windows
https://chocolatey.org

With S3bucket as Sleet source choco cannot list more than 30 packages #3502

Open prsshini opened 1 month ago

prsshini commented 1 month ago

What You Are Seeing?

Our S3 bucket source contains 1612 packages in 194 folders (each folder holds different versions of the nupkgs). The Chocolatey source is set to the Sleet feed; however, when running Chocolatey CLI, we get the following:

The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

PS C:\Users\vradhs2> choco source
Chocolatey v2.2.2
test - https://s3.amazonaws.com/bucket-name/index.json | Priority 0|Bypass Proxy - False|Self-Service - False|Admin Only - False.

The choco search command lists only 30 packages. The rest of the packages in the S3 bucket are not listed.

We tried

choco search --page=100

and

choco search --page-size=100

Both behave the same way and list only 30 packages.

Could you let me know if there is any choco setting that prevents all the packages from being listed?

What is Expected?

To list all 194 packages.

How Did You Get This To Happen?

choco source add -n test -s="https://s3.amazonaws.com/bucket-name/index.json"
choco search
package1
package2
...
package 30
30 packages found.
The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

The source contains 194 packages and is an S3 bucket with packages pushed via Sleet.

Using chocolatey version 2.2.2

System Details

Installed Packages

N/A

Output Log

https://gist.github.com/prsshini/af28447fbce6196f36214515d45b5540

Additional Context

I am able to search a package from Sleet S3 source using the command:

choco search package100 --exact
package100

This lists as expected.

@gep13 added:

See the comment here which provides more information about the investigation into this issue, as well as a plan for steps moving forward.

pauby commented 1 month ago

We only support the latest version for open-source. Please upgrade and update the issue description and logs.

Please also provide the installed packages list and the System Details and the configuration of Sleet.

Your logs don't contain index.json so the source you have provided in the description does not match what you say the source looks like (I understand it's an example but your logs already contain the actual source so there is no point anonymising it now).

prsshini commented 1 month ago

Hello @pauby, thanks for your reply. I upgraded choco to the latest version, 2.3.0, and I still see the same issue. Choco search lists only 30 packages even though the Sleet source contains more than 30 packages. I am running this on Windows Server 2016. In the Chocolatey logs, I do see that the source is set with index.json. The latest logs are here: https://gist.githubusercontent.com/prsshini/af28447fbce6196f36214515d45b5540/raw/58fd692accf76f7fbb955b5b91d1d895fdef5e81/latest%2520logs%2520with%25202.3.0

The Sleet JSON config used is:

    {
      "sources": [
        {
          "name": "Sleetfeed",
          "type": "s3",
          "path": "https://s3.amazonaws.com/mybucket/",
          "bucketName": "mybucket",
          "region": "us-east-1",
          "accessKeyId": "*",
          "secretAccessKey": "*****"
        }
      ]
    }

Nupkgs are pushed to this bucket using the command sleet push d:/nupkgfile

pauby commented 1 month ago

Have you tried to query the source using nuget.exe? What was the result?

What happens if you use choco search --all-versions?

prsshini commented 1 month ago

Hi @pauby, I just tried using nuget.exe and it can list fewer than 30 packages. I also tried choco search --all-versions and it's only listing all the versions of the first 30 packages.

pauby commented 1 month ago

I just tried using nuget.exe and it can list fewer than 30 packages.

Did you mean more than 30 packages? How many did it list? What command did you use?

and it's only listing all the versions of the first 30 packages.

The 1000 limit is for package versions, not packages. With --all-versions, how many were shown in total?

prsshini commented 1 month ago

Our packages have multiple versions. For example, Package1 has 10 versions and Package2 has 20 versions. When I used choco search --all-versions, it listed the total number of versions across the first 30 packages. There are 452 package versions in the first 30 packages, so it listed 452 packages.

However, the total number of packages in the bucket is 192 and the total number of versions is 1612.

But choco search lists only the first 30 packages, and choco search --all-versions lists only 452 packages, which is the total for the first 30 packages.

prsshini commented 1 month ago

I also tested with a new bucket which has only 32 packages with a total of 400 package versions, and I still got the 1000 limit error. So choco somehow limits the listing to the first 30 packages. Is there a configuration that I am missing to make it list everything? Choco config does not have any setting for page size or package list limit.

pauby commented 1 month ago

There is no configuration option you're missing.

Thanks for testing all of that. Leave it with me as we'll need to reproduce your setup and the issue to see what is going on.

prsshini commented 1 month ago

@pauby Thank you for your support. Just to reiterate the steps to reproduce: generate a nupkg, install Sleet in your dev environment, and save your sleet.json as below.

    {
      "sources": [
        {
          "name": "feed",
          "type": "s3",
          "path": "https://s3.amazonaws.com/mybucket/",
          "bucketName": "mybucket",
          "region": "us-east-1",
          "accessKeyId": "yourkey",
          "secretAccessKey": "yourkey"
        }
      ]
    }

Run the sleet init command to initialise your bucket. Use the command "sleet push D:/nupkg-package" to push a package. Likewise, push more than 30 packages to your S3 bucket.

Set up your choco source as choco source add -n test -s="https://s3.amazonaws.com/mybucket/index.json"

Use the command "choco search" and it will list only 30 packages with the message: 30 packages found. The threshold of 1,000 packages per source has been met. Please refine your search, or specify a page to find any more results.

prsshini commented 3 weeks ago

Hi @pauby, any luck with this? Were you able to reproduce this issue? Thanks!

gep13 commented 3 weeks ago

@prsshini thank you for bringing up this issue. I have done some investigation work, and I can report that I am able to reproduce this issue. Based on some discussions internally, we have decided on a path forward, which I wanted to lay out here.

To confirm what was done for our internal testing...

  1. Installed sleet on target machine
  2. Brought together a collection of packages (a total of 2545 unique package versions including pre-release packages, with 39 distinct packages)
  3. Ran the following script to bring these packages into the sleet instance (this loop was necessary, since pointing sleet at the full folder of nupkgs then complained about duplicate packages)
    Get-ChildItem "C:\temp\packages" -Filter *.nupkg |
        ForEach-Object {
            # Push each nupkg individually to avoid the duplicate-package complaint
            Write-Host $_.FullName
            sleet push -s myLocalFeed $_.FullName
        }
  4. The sleet folder was then hosted on a web server (for the purposes of this test, the Express Visual Studio Code Extension was used)

Now that this was up and running, the following results were observed...

  1. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache and only 30 packages were returned, when the expected number was 39. The default page size for choco.exe is 30, so it is expected that only 30 packages would have been returned.
  2. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=20 and 20 packages were returned, which worked as expected
  3. Ran choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=40 and only 30 packages were returned, when the expected number was 39.
  4. Ran nuget search chocolatey -Source http://localhost/index.json and only 20 packages were returned, when the expected number was 39. The default page size for nuget.exe is 20, so it is expected that only 20 packages would have been returned.
  5. Ran nuget search chocolatey -Source http://localhost/index.json -take 10 and 10 packages were returned, which worked as expected
  6. Ran nuget search chocolatey -Source http://localhost/index.json -take 40 and all 39 packages were returned

In summary, based on these tests, it would appear that nuget.exe is working correctly, and choco.exe is not working correctly. However, there are some technical details here that I think need to be explained, as it is a combination of the way that choco.exe is working, and how sleet is working, that cause the problem.

The first thing to point out is that the search in sleet, by default, doesn't actually do anything except return all the packages that exist on the static feed. It doesn't look at the incoming query parameters and filter the results to match what is requested. It simply returns all the packages. This is documented on the sleet GitHub repository, and a link is provided to this blog post to allow true search results to be returned. Am I right in saying that you aren't using the Sleet.Search package, and instead using the default sleet search? I am assuming that this is the case, as that is what matches the replication steps that I have done.

In principle, returning an unfiltered list of packages for every search query would appear to give exactly what you are looking for; however, this causes a problem in how Chocolatey CLI operates. Let me try to explain...

Due to this section of code, the maximum page size for a request from Chocolatey CLI is 30 packages. This is due to some historical problems with larger page sizes against Chocolatey Community Repository as well as some other Repository Managers, like Nexus. As a result, when you attempt to run the following command:

choco search chocolatey --source http://localhost/index.json --ignore-http-cache --page-size=40

There are actually two outgoing queries, which can be seen here:

[Screenshot of the two outgoing queries]

The result of these two queries is that a total of 40 packages should be returned. However, since sleet is returning the exact same information from both queries, Chocolatey CLI actually only sees the first 30 packages, since the packages that are returned are essentially duplicates, and they are ignored.
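
The behaviour described above can be modelled with a small simulation. This is only a sketch of the explanation in this comment, not Chocolatey's or Sleet's actual code: the function names (static_feed, paging_feed, client_search), the package names, and the way duplicates are dropped are all assumptions made for illustration. The only figures taken from the thread are the 39 test packages, the 30-package cap per request, and the requested page size of 40.

```python
from typing import Callable, List

# Hypothetical feed signature: (all packages, skip, take) -> returned packages
Feed = Callable[[List[str], int, int], List[str]]


def static_feed(packages: List[str], skip: int, take: int) -> List[str]:
    """Stand-in for a static feed that ignores skip/take and returns everything."""
    return list(packages)


def paging_feed(packages: List[str], skip: int, take: int) -> List[str]:
    """Stand-in for a feed that honours skip/take on the server side."""
    return packages[skip:skip + take]


def client_search(feed: Feed, packages: List[str],
                  page_size: int, max_request: int) -> List[str]:
    """Assumed client model: split page_size into requests of at most
    max_request items, truncate each response to the requested size,
    and ignore packages that were already seen."""
    seen: List[str] = []
    skip = 0
    while skip < page_size:
        take = min(max_request, page_size - skip)
        response = feed(packages, skip, take)
        for name in response[:take]:  # client-side truncation
            if name not in seen:      # duplicates are ignored
                seen.append(name)
        skip += take
    return seen


packages = [f"package{i}" for i in range(1, 40)]  # 39 distinct packages

# Two requests (30 + 10) against the static feed: the second adds nothing new.
print(len(client_search(static_feed, packages, page_size=40, max_request=30)))
# One request of 40 against the static feed, as nuget.exe was observed to do.
print(len(client_search(static_feed, packages, page_size=40, max_request=40)))
# Two requests against a feed that really pages.
print(len(client_search(paging_feed, packages, page_size=40, max_request=30)))
```

Under these assumptions the three calls yield 30, 39 and 39 packages, which lines up with the test results listed earlier in this comment.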

When nuget.exe does the following command:

nuget search chocolatey -Source http://localhost/index.json -take 40

There is only 1 outgoing query:

[Screenshot of the single outgoing query]

Which means that nuget.exe can see all the packages, as there are no duplicates.

I hope this serves to illustrate what is going on here, if not, please let me know, and I can try to explain further.

In terms of what we plan to do to improve this experience...

  1. Allow the user to control the page-size directly, without automatically setting it to 30
  2. Output a warning when the user uses something other than the recommended 30
  3. Continue to output an error when the user attempts to use a page-size greater than 100
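
As a rough illustration of the three points above, the planned page-size handling could look something like the following. This is a hypothetical sketch, not the Chocolatey CLI implementation: the function name check_page_size, the message wording, and the rejection of values below 1 are assumptions; only the recommended value of 30 and the upper limit of 100 come from this comment.

```python
from typing import List, Tuple

RECOMMENDED_PAGE_SIZE = 30  # recommended value mentioned in the plan
MAX_PAGE_SIZE = 100         # upper limit mentioned in the plan


def check_page_size(page_size: int) -> Tuple[int, List[str]]:
    """Return the page size unchanged plus any warnings; raise on invalid values."""
    if page_size < 1:
        # Assumption: the thread does not say how small values are handled.
        raise ValueError("page size must be at least 1")
    if page_size > MAX_PAGE_SIZE:
        raise ValueError(f"page size cannot be greater than {MAX_PAGE_SIZE}")

    warnings: List[str] = []
    if page_size != RECOMMENDED_PAGE_SIZE:
        warnings.append(
            f"page size {page_size} differs from the recommended "
            f"{RECOMMENDED_PAGE_SIZE}"
        )
    # The requested value is used directly instead of being reset to 30.
    return page_size, warnings


print(check_page_size(30))
print(check_page_size(40))
```

A value such as 40 would be passed through with a warning, while 101 would be rejected with an error.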

Keep in mind, due to the upper limit of allowed packages in the response to a search query, as well as the upper limit on allowed page sizes, you may still not be able to return all packages from sleet. However, this is purely down to the responses that sleet is providing by default. The change here is to try to make things better, but it is not a guarantee that things will work as you want.

The best recommendation would be to introduce the Sleet.Search package, as mentioned earlier.

prsshini commented 3 weeks ago

@gep13 Thank you for your detailed response. Just to confirm your question, we are not using Sleet.Search and just using the default sleet search. So your assumption is right.

In terms of the fix that you are proposing, when you say page-size, it doesn't literally mean "page", correct? It means the number of packages it can display in one query. If that assumption is right, why do you recommend having an error when page size is greater than 100?

When we use 2 sources (primary and secondary), the number of distinct packages from both sources can exceed 100. Isn't that a common business use case?

When do you think the proposed fixes will be pushed?

In the meantime, we will try to explore Sleet.Search to see if it fits our need.

Thanks for your time and support.

gep13 commented 3 weeks ago

@prsshini said... In terms of the fix that you are proposing, when you say page-size, it doesn't literally mean "page", correct? It means the number of packages it can display in one query.

Here, I am referring to the number of results that should be returned from the search query. For example, if I did:

choco search chocolatey --page-size=5

I would expect 5 results to be returned, even if there were more than 5 results available.

While this paging is typically done on the Repository Manager (but as described, Sleet isn't doing this), the truncation to that page size is also done on the client side, to make sure that the correct number of requested results are returned.

@prsshini said... If that assumption is right, why do you recommend having an error when page size is greater than 100?

The choco search command is a command that does a search. It is attempting to filter down the results of a query to a manageable amount of information. The search command is not intended to be a command that is used to enumerate all the packages that are available on a given source. As such, in the 2.x release of Chocolatey CLI, we introduced a number of safeguards to prevent it being used like this. That is why you will see a maximum of 1000 packages returned from each source that is queried, and a maximum of 100 for the page size.

@prsshini said... When do you think the proposed fixes will be pushed?

I have added this issue to the 2.4.0 milestone, which will be the next release of Chocolatey CLI, but I can't offer any indication on when this will be released.

@prsshini said... In the meantime, we will try to explore Sleet.Search to see if it fits our need.

I believe that will be the best course of action, given your use case here.