PowerShell / PowerShellGallery


PSGallery Search() Function doesn't page unless the top is 100 above the result count #241

Open JustinGrote opened 1 year ago

JustinGrote commented 1 year ago

Prerequisites

Steps to reproduce

I noticed this while experimenting with some high-performance queries against PSGallery. Maybe someone can clarify what's going on, or whether this is in fact an actual issue.

Say you run Find-PSResource 'Az*', which results in the following query according to Fiddler:

https://preview.pwsh.gallery/api/v2/Search()?$filter=IsLatestVersion&searchTerm='Az%2A'&targetFramework=''&includePrerelease=false&$skip=0&$top=6000&semVerLevel=2.0.0
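For anyone who wants to replay the request with different `$skip` and `$top` values, here is a minimal sketch that rebuilds the URL above. The helper name `build_search_url` is made up for illustration; the parameter names and values are copied from the captured query.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the captured request above.
BASE = "https://preview.pwsh.gallery/api/v2/Search()"


def build_search_url(term, skip=0, top=6000):
    """Rebuild the Search() query string (hypothetical helper, not part of PowerShellGet)."""
    params = [
        ("$filter", "IsLatestVersion"),
        ("searchTerm", f"'{term}'"),
        ("targetFramework", "''"),
        ("includePrerelease", "false"),
        ("$skip", str(skip)),
        ("$top", str(top)),
        ("semVerLevel", "2.0.0"),
    ]
    # Keep "$" and "'" literal so the output matches the capture; "*" becomes %2A.
    return BASE + "?" + urlencode(params, safe="$'", quote_via=quote)


print(build_search_url("Az*"))
print(build_search_url("Az*", skip=100, top=1230))
```

Changing only `top` between runs makes it easier to compare how many records each response carries.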

This query actually returns 1131 objects, but the response also embeds a nextLink that only skips 100.

If you set the top value to 1230 or lower (1131 + 99), everything gets returned in a single request with no nextLink. If you set it to 1132, then you get a paged result and what look to be mostly duplicates in the data. If you set it to 6000, you seem to get mostly duplicate data 6 times. If you set it to 4000, you seem to get it 4 times. Since the default in PowerShellGet is 6000, this is why Find-PSResource Az* is so dog slow. By contrast, if I make a custom cmdlet that calls SearchAsync with a smaller Maxresult size, it's reasonably fast (there's some sort of artificial delay in v2 SearchAsync that doesn't exist in v3).

Change the maxcount to 4000, and it takes nearly 3 seconds to run, with the skip only advancing by 100 each time but returning 1132 - skip results each time.

So something is broken in the server-side logic for the nextLink. I would expect the server to have a certain predefined batch size it is willing to operate with (since you cannot specify this with the Search OData query).

**It only happens with large queries. It's hard to tell, but the threshold seems to be >500 results. Anything with fewer results always returns correctly. A query with 597 results, `Be`, is affected.**

Temporary Workaround

Limit the PSGetv3 search calls to 500 results, and warn if that exact number is hit, since that indicates more results may be present. Also add a configurable -MaxResults parameter, similar to how Exchange works with its -ResultSize parameter.
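A rough sketch of that cap-and-warn logic, written in Python only to keep it short and runnable. `capped_search` and the `fetch` callable are hypothetical stand-ins for whatever issues the single search request; nothing here is PowerShellGet's actual code.

```python
import warnings


def capped_search(fetch, max_results=500):
    """Run one search capped at max_results and warn when the cap is hit.

    fetch is a hypothetical callable: fetch(top) -> list of results.
    """
    results = fetch(max_results)[:max_results]
    if len(results) == max_results:
        # Hitting the cap exactly suggests the result set was truncated.
        warnings.warn(
            f"Exactly {max_results} results returned; more may be available. "
            "Raise max_results to see them."
        )
    return results


# Usage with a fake backend that holds 597 matches (the 'Be' case above).
def fake_fetch(top):
    return list(range(597))[:top]


print(len(capped_search(fake_fetch, max_results=1000)))  # under the cap, no warning
```

A result count exactly equal to the cap can occasionally be a coincidence, which is why this is a warning rather than an error.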

Expected behavior

The query would return results in batches of a predetermined server limit, e.g. 1000, and each nextLink would advance the skip by that interval.
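To make the expectation concrete, here is a small sketch of the page boundaries a fixed batch size would produce. The batch size of 1000 is only the example figure from the sentence above, and `expected_pages` is an illustrative name, not a gallery API.

```python
def expected_pages(total, top, batch_size=1000):
    """Yield (skip, count) per page when skip advances by a fixed batch size."""
    wanted = min(total, top)  # never return more than $top or more than exists
    skip = 0
    while skip < wanted:
        count = min(batch_size, wanted - skip)
        yield skip, count
        skip += count


pages = list(expected_pages(total=1131, top=6000))
print(pages)                      # [(0, 1000), (1000, 131)]
print(sum(c for _, c in pages))   # 1131, every record transferred once
```

Under this scheme the number of records transferred never exceeds the number of matches, regardless of how large `$top` is.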

Actual behavior

Skip always advances at intervals of 100 but each response still returns the full remaining record set, leading to massive data duplication and slow queries, which gets magnified the higher the result size is set.
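A toy model of that pattern shows how quickly the duplication adds up. It assumes each response returns every record from the current skip onward and that the client keeps following nextLinks until skip passes the result count. It ignores `$top` entirely, so it is a guess at the shape of the problem, not the server's real logic.

```python
def observed_pages(total, step=100):
    """Toy model: skip advances by step, each page returns all records after skip."""
    pages = []
    skip = 0
    while skip < total:
        pages.append((skip, total - skip))
        skip += step
    return pages


pages = observed_pages(total=1131)
transferred = sum(count for _, count in pages)

print(len(pages))                     # 12 requests
print(transferred)                    # 6972 records sent for 1131 unique ones
print(round(transferred / 1131, 2))   # 6.16
```

The roughly sixfold overhead is in the same range as the duplication seen with a top of 6000. Since the real duplication appears to scale with the top value, the actual stop condition is probably different from the one assumed here.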

Error details

No response

Environment data

7.2.3

Visuals

No response

JustinGrote commented 1 year ago

@SydneyhSmith I created this in the wrong repo, please move it to PowerShell/PowerShellGallery. Thanks.

alerickson commented 1 year ago

@JustinGrote we're working on moving off of the NuGet client APIs (see: PowerShell/PowerShellGet#653), and I think that should resolve this issue on the PowerShellGet side. We'll look into why the server is returning these results, specifically the duplicates. I'll move this over to the Gallery repo so we can track the issue there.