MicrosoftDocs / PowerShell-Docs

The official PowerShell documentation sources
https://learn.microsoft.com/powershell
Creative Commons Attribution 4.0 International

`Select-Object` doesn’t mention if the result maintains the order #11099

Closed philcerf closed 2 months ago

philcerf commented 2 months ago

Links

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-object

Summary

Hey.

When doing something like:

```powershell
$processes = Get-Process
$copy = $processes | Select-Object -Property Id,Path
```

(where every object in the array $processes has an Id and a Path property, so all elements are guaranteed to be selected) it seems that Select-Object maintains the order of elements, i.e. $copy will have the same order as $processes, just with fewer, statically copied properties.

However, the docs don’t really specify whether maintaining the order (of selected objects) is guaranteed behaviour or just the current implementation.

Would be nice if it could specify whether or not this is guaranteed behaviour.

Thanks, Philippe

sdwheeler commented 2 months ago

The name Select-Object is not really accurate. The cmdlet selects properties of the objects it receives through the pipeline. The order of objects received is determined by the sending command. Select-Object can't change that order. The order of the properties selected is determined by the order in which you list them. If you use wildcards to select properties, the wildcard patterns are resolved in the order that the properties occur on the object in the pipeline.
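The property-ordering rules described above can be checked with a small sketch. The object and its property names (`Id`, `Name`, `Path`) are made up for illustration; the only cmdlet used is `Select-Object` itself.

```powershell
# Hypothetical sample object; [pscustomobject] literals keep their declared property order.
$obj = [pscustomobject]@{ Id = 1; Name = 'demo'; Path = 'C:\demo.exe' }

# Explicit list: output properties follow the order given to -Property.
$explicit = $obj | Select-Object -Property Path, Id
$explicitNames = $explicit.PSObject.Properties.Name -join ','

# Wildcard: matches are resolved in the order the properties occur on the input object.
$wildcard = $obj | Select-Object -Property *
$wildcardNames = $wildcard.PSObject.Properties.Name -join ','

$explicitNames   # Path,Id
$wildcardNames   # Id,Name,Path
```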

philcerf commented 2 months ago

Hey.

Two things:

So that's what should IMO be documented/specified: that it doesn't change the order (apart from filtering out objects) as received from the pipeline, and that this is guaranteed behaviour.

sdwheeler commented 2 months ago

Select-Object doesn't cache objects before emitting them to the pipeline. That's not how the PowerShell pipeline works. See https://learn.microsoft.com/powershell/module/microsoft.powershell.core/about/about_pipelines#one-at-a-time-processing.

There are commands that must collect all the input from the pipeline before emitting output. For example, Sort-Object can't perform the sort until it has all the data.
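The difference between a streaming command and one that collects its input can be made visible by logging when each object is emitted and when it is received downstream. This is only a sketch: the `$log` list and the `Value` calculated property are illustrative names, and `::new()` assumes PowerShell 5.0 or later.

```powershell
# Record pipeline events in the order they happen.
$log = [System.Collections.Generic.List[string]]::new()

# Streaming: Select-Object hands each object on as soon as it receives it.
1..3 |
    ForEach-Object { $log.Add("emit $_"); $_ } |
    Select-Object -Property @{ Name = 'Value'; Expression = { $_ } } |
    ForEach-Object { $log.Add("got $($_.Value)") }
$streamed = $log -join ', '

$log.Clear()

# Collecting: Sort-Object must see all input before it can output anything.
3..1 |
    ForEach-Object { $log.Add("emit $_"); $_ } |
    Sort-Object |
    ForEach-Object { $log.Add("got $_") }
$sorted = $log -join ', '

$streamed   # emit 1, got 1, emit 2, got 2, emit 3, got 3
$sorted     # emit 3, emit 2, emit 1, got 1, got 2, got 3
```

In the first pipeline the "emit" and "got" entries interleave, which is why Select-Object has no opportunity to reorder what it receives; in the second, every "emit" precedes the first "got".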

philcerf commented 2 months ago

> Select-Object doesn't cache objects before emitting them to the pipeline.

Yes, I know that... but that's not what I mean.

Take e.g. the insertion order of dictionaries in Python. Only starting with (IIRC) Python 3.7 did it become a guaranteed feature of the language that dictionaries keep their elements in the order they were inserted. This had actually already been the case for some versions before that, but it wasn't yet defined as such.

Right now, Select-Object says nothing about whether it is defined to keep the order as it gets it from the pipe.
It merely technically happens to do so, but there's nothing that would prevent upstream from changing that if they think some other implementation would have benefits.

So right now it's not something one can really rely on.

michaeltlombardi commented 2 months ago

The Select-Object cmdlet is, functionally, a mapping function, like Where-Object is a filtering function. Across every implementation of these functions in languages I've used, the expectation is that mapping and filtering functions preserve the order of the input array, even if the data is munged (for mapping functions) or non-matching items in the input array are removed (for filtering functions).

Only when a function indicates that the return order is randomized have I seen that behavior, because it violates user expectations.

Both Select-Object and Where-Object conform to user expectations for array item ordering: they preserve the order of items in the input array for the output.
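As a rough illustration of that mapping/filtering behavior, the following sketch compares the order of `Id` values before and after each cmdlet. The sample objects stand in for the `Get-Process` output from the original question and are hypothetical; this demonstrates the observed behavior and is not a statement of a documented guarantee.

```powershell
# Hypothetical input objects in a known order (stand-ins for Get-Process output).
$processes = 5, 3, 1, 4, 2 | ForEach-Object {
    [pscustomobject]@{ Id = $_; Path = "C:\app$($_).exe"; Extra = 'ignored' }
}

# Mapping: fewer properties per object, same sequence of objects.
$copy = $processes | Select-Object -Property Id, Path
$mappedIds = $copy.Id -join ','

# Filtering: non-matching objects are dropped, the rest keep their relative order.
$odd = $processes | Where-Object { $_.Id % 2 -eq 1 }
$filteredIds = $odd.Id -join ','

$mappedIds     # 5,3,1,4,2
$filteredIds   # 5,3,1
```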

Specifying that these cmdlets conform to standard implementation expectations only raises questions about all cmdlets that process arrays of input, and would require updating the reference documentation of every such command for consistency to avoid begging questions about ordering.

We only document ordering behavior when it violates standard implementation expectations or naive user expectations.

The API contract for these cmdlets (processing an input array and returning an output array that preserves the input order) has been stable since the first release of the language, and no change to it has been proposed. Moreover, if one were proposed as an RFC, it would likely be rejected because of its wide-scale impact and the backward-incompatible behavior it would introduce.