PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License

I want to return an empty array from a function, but it returns a null value. #21547

Closed abgox closed 2 weeks ago

abgox commented 2 weeks ago

Prerequisites

Steps to reproduce



test3
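
A minimal sketch of the scenario (assuming test3 is a function defined to return @()):

function test3 {
    return @()
}

$result = test3
$null -eq $result   # -> True, instead of an empty array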

Expected behavior

- It should return an empty array.

Actual behavior

- It returns null.

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.26100
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

rhubarb-geek-nz commented 2 weeks ago
function test-function {
        return New-Object System.Collections.ArrayList
}

$arr = test-function

$arr.GetType()

gives

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     ArrayList                                System.Object
abgox commented 2 weeks ago

But why would @() be a problem?

rhubarb-geek-nz commented 2 weeks ago
  • But why would @() be a problem?

You have my sympathy; PowerShell can be infuriating with lists of zero or one items, where the list can magically evaporate.

My best explanation is that PowerShell always tries to simplify, often turning a list of one item into just that item. This can be problematic in that you can't just write code that deals with lists; you have to test for the cases of none, one and some. Also, in key places classic PowerShell differs from the behaviour of PowerShell Core, hence parameters like -AsArray, -NoEnumerate etc. trying to undo what PowerShell insists on doing.
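
For example, a quick sketch of the none, one and some cases:

$none = & { @() }
$one  = & { @(1) }
$some = & { @(1, 2) }

$null -eq $none     # True: the empty list has evaporated
$one -is [array]    # False: the single item is no longer in a list
$some -is [array]   # True: two or more items come back as an array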

mklement0 commented 2 weeks ago

There is no magic, only an initially perhaps surprising behavior that is fundamental to PowerShell:

In the PowerShell pipeline - which is invariably involved when producing output from a command (function, script, cmdlet, script block) - (most) enumerables (such as arrays, ArrayLists, ...) are auto-enumerated; that is, an enumerable's elements are sent one by one to the success output stream.

In other words: Unless you take extra steps (see below), the original enumerable is - predictably - lost, and in the success output stream you cannot tell the difference between outputting a single-element enumerable and the one element it contains, given that in both cases it is only the latter that is sent to the output stream.

An empty enumerable - such as in your case - sends "nothing" to the success output stream (pipeline), which is technically the [System.Management.Automation.Internal.AutomationNull]::Value singleton, which in effect behaves like $null in expression contexts and argument-based parameter binding (such as your case).

The success output stream is an open-ended stream of objects that itself has no notion of an array or a similar data structure: the objects in it can be - and often are - processed one by one, as they are received, in which case the question of how to collect them for later processing doesn't arise.

Collecting stream output of necessity comes into play when you assign to a variable (e.g., $arr = test-function) or make command output participate in a larger expression (e.g., 'foo' + (test-function)), including use of $(...) and @(...) (except with array literals). Collecting a single object in the stream causes it to be collected as-is. It is only if there are two or more objects in the stream that a list-like data type is needed for collection, in which case PowerShell invariably creates an [object[]]-typed array. For the reasons explained above, this array is unrelated to any originating enumerable type, which never participated as itself in the pipeline.
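
To illustrate the collection rules described above (a sketch, using hypothetical helper functions):

function test-none { @() }       # enumerates to nothing
function test-one  { @(42) }     # enumerates to its single element
function test-many { @(1, 2) }   # enumerates to two elements

$null -eq (test-none)            # -> True: AutomationNull acts like $null here
(test-one).GetType().Name        # -> Int32: the element itself, not an array
(test-many).GetType().FullName   # -> System.Object[]: a newly created array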

To send an instance of an enumerable type itself, as a whole to the success output stream, you must prevent auto-enumeration:

In other words: The following techniques all work to output an empty array as a whole from your function:

# Using New-Object
function test-function { New-Object object[] 0 }

# Using Write-Output -NoEnumerate
function test-function { Write-Output -NoEnumerate @() }

# Using a transitory single-element helper array wrapper
function test-function { , @() }
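
With any of these definitions, the caller now receives an actual empty array (a quick check):

$arr = test-function
$arr.GetType().FullName   # -> System.Object[]
$arr.Count                # -> 0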

It is worth noting that auto-enumeration is a core PowerShell feature that you should generally not deviate from, especially in public-facing functions / cmdlets / scripts.

On a higher level of abstraction, one of PowerShell's core strengths is its consistency, of which consistent behavior in output streams / in the pipeline is one aspect.

To put it in concrete terms: Users justifiably expect commands to output objects one by one rather than outputting list-like containers as a whole, especially given that the latter will not behave as expected in the pipeline; e.g.:

# Expected, auto-enumerating streaming behavior (element-by-element streaming).
# Where-Object's script block is invoked once for each element.
# -> 2, 3 
& { @(1, 2, 3) } | Where-Object { $_ -ge 2 }

# Unusual, array-as-a-whole output behavior.
# !! -> @(1, 2, 3) 
# !! Where-Object only receives *one* input object, which is the *array* as a whole, in which
# !! case -ge acts as an array filter that returns the subarray @(2, 3), which Where-Object interprets
# !! as $true, and therefore *passes the input object (array) through*.
& { Write-Output -NoEnumerate @(1, 2, 3) } | Where-Object { $_ -ge 2 }

So as not to confound user expectations, deviation from this behavior should require explicit user opt-in, such as via the -NoEnumerate and -AsArray switches that some built-in cmdlets (now) offer.
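
For example, in PowerShell 7 (a sketch of such opt-in switches):

'[1]' | ConvertFrom-Json -NoEnumerate   # opt-in: outputs the single-element array as a whole
1 | ConvertTo-Json -AsArray -Compress   # opt-in: wraps the single input object -> '[1]'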

The legacy PowerShell edition, Windows PowerShell, neglects to exhibit this pattern in a few cases (i.e., it outputs arrays as a whole by default or invariably); these cases have since been corrected in PowerShell 7.

A prominent example is ConvertFrom-Json, which only in PowerShell 7 exhibits the expected behavior - see https://github.com/PowerShell/PowerShell/issues/3424 for the backstory.
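
One way to see the edition difference (a sketch):

('[1,2]' | ConvertFrom-Json | Measure-Object).Count
# Windows PowerShell: 1 (the parsed array is sent as a whole)
# PowerShell 7:       2 (its elements are enumerated)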

Note that while PowerShell 7's built-in cmdlets now work consistently, from what I can tell, third-party code and even modules that ship with Windows may still exhibit the unexpected behavior; e.g., Get-WinUserLanguageList

If you encounter such a command and want to force enumeration, simply enclose it in (...) (which collects all output in memory first) or pipe to Write-Output (which preserves the streaming behavior).
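
For example, using Write-Output -NoEnumerate as a stand-in for such a command (a sketch):

# Both variants force the array-as-a-whole output to be enumerated again:
(& { Write-Output -NoEnumerate @(1, 2, 3) }) | Where-Object { $_ -ge 2 }               # -> 2, 3
& { Write-Output -NoEnumerate @(1, 2, 3) } | Write-Output | Where-Object { $_ -ge 2 }  # -> 2, 3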

rhubarb-geek-nz commented 2 weeks ago

Another trick to avoid PowerShell's delisting of single elements is to capture the OutVariable itself, which will contain all the elements of the output pipeline, so you can see it is the assignment doing the collection.

$date = Get-Date -OutVariable datevar
$date.GetType()
$datevar.GetType()
$datevar[0].GetType()

gives

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     DateTime                                 System.ValueType
True     True     ArrayList                                System.Object
True     True     DateTime                                 System.ValueType
mklement0 commented 2 weeks ago

@rhubarb-geek-nz, while that is technically true, I consider this asymmetry between direct variable assignment and -OutVariable a bug, not a feature, as discussed many years ago in the following issue (nowadays, I would use slightly different framing and language, but the gist of the issue still applies):

Consider the following pitfall:

$null = Get-Item -OutVariable v $PROFILE

# !! ->  "The property 'LastWriteTime' cannot be found on this object. Verify that the property exists and can be set."
$v.LastWriteTime = [datetime]::now

Clearly, the intent of Get-Item $PROFILE is to retrieve a single object; yet, $v is now an ArrayList instance, so that $v.LastWriteTime applies member-access enumeration, which is unsupported for setting properties.
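
By contrast, direct assignment collects the single output object as-is (a sketch, assuming $PROFILE exists and is writable):

$v = Get-Item $PROFILE
$v.GetType().Name                    # -> FileInfo, not ArrayList
$v.LastWriteTime = [datetime]::Now   # works: the property is set on the one object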

rhubarb-geek-nz commented 2 weeks ago

As a general rule, avoid assignment in PowerShell when you are dealing with multiple items. It is problematic managing code paths which sometimes return a single item or a list of items. Compare with an SQL query: a result set can return zero, one or multiple items with no drama, whereas PowerShell can give you a null, a single item or a collection.

It is all water under the bridge, but my recommendation remains: avoid the assignment operator when dealing with multiple items where the count may be 0, 1 or many. Use the output either in a pipeline or via -OutVariable for consistent results.
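
For example (a sketch, assuming a directory that may contain zero, one or many .ps1 files):

# Pipeline processing behaves the same whether there are 0, 1 or many results:
Get-ChildItem *.ps1 | ForEach-Object { $_.Name }

# -OutVariable always gives you a collection you can count:
$null = Get-ChildItem *.ps1 -OutVariable scripts
$scripts.Count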

rhubarb-geek-nz commented 2 weeks ago

  • Clearly, the intent of Get-Item $PROFILE is to retrieve a single object

However

PS> get-command get-item -syntax

Get-Item [-Path] <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]

Get-Item -LiteralPath <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]

Your argument goes to the -Path parameter, which can both take an array and expand wildcards:

PS> $FOO='*.ps1'
PS> get-item $FOO

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          29/04/2024    00:20            156 array.ps1
-a---          28/04/2024    20:31            120 empty.ps1
-a---          28/04/2024    21:57             91 error.ps1
-a---          28/04/2024    21:55            101 outvar.ps1

So your "clearly" is clearly not quite as clear as you suggest.

mklement0 commented 2 weeks ago

The "clearly" applied to the specific command, where a literal, single path was provided as the input.

The point is that any cmdlet is free to situationally "return" - i.e., emit to the success output stream - zero, one, or more output objects.

Earlier we've discussed the stream-collection behavior that applies in direct variable assignment, notably that a single output object is collected as-is.

The point of my previous comment was that even when a command situationally emits just a single object, -OutVariable still wraps the output in an ArrayList, which can lead to surprises such as the failed property assignment above.

Also, note that your framing wasn't correct:

  • avoid PowerShell's delisting of single elements is to capture the OutVariable itself

-OutVariable has no impact on auto-enumeration, which happens regardless, unless explicitly suppressed. It's simply that the -OutVariable feature unconditionally creates an ArrayList for the collected output, irrespective of the number of output objects. (Case in point: if you use New-Object System.Collections.ArrayList in combination with -OutVariable, you get a nested single-element ArrayList instance, whose first and only element is the empty instance created by New-Object.)

In concrete terms:
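
# A sketch of the New-Object case described above:
$null = New-Object System.Collections.ArrayList -OutVariable v

$v.GetType().Name      # -> ArrayList: the wrapper -OutVariable always creates
$v.Count               # -> 1
$v[0].GetType().Name   # -> ArrayList: the empty instance created by New-Object
$v[0].Count            # -> 0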

While you may choose to rely on this awkward inconsistency (which the documentation only hints at, without spelling out the ramifications) in order to always get an array-like result, I personally recommend avoiding it, both for the awkwardness of then having to suppress the success output ($null = ... -OutVariable) and the confusing discrepancy.

The short of it: prefer direct variable assignment, and wrap the command in @(...) if you need the result to always be an array.

rhubarb-geek-nz commented 2 weeks ago

A common pattern that I have (and the reason I have so much frustration with PowerShell's Schrödinger's OO model) is that being able to round-trip JSON data is of vital importance. If the original JSON was an array, it needs to stay an array even if it only has one contained object. Likewise, if an object contains a property that was an array of one object, that needs to stay an array after our processing. Yes, ConvertFrom-Json now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core. In order to do that, we have to have test cases where every array anywhere within an object can have zero, one or some items, so we know we are using the right flags at each processing step and work with all combinations of data.

mklement0 commented 2 weeks ago

  • ConvertFrom-Json now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core.

That is unfortunate, but an unavoidable consequence of things getting improved / fixed in PS Core.

An array stored in a property value should never pose any problem, however; e.g., the following round-trips properly, in both editions:

[pscustomobject] @{ ArrayProp = @(1) } | ConvertTo-Json | ConvertFrom-Json
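
A quick check that the property stays an array (a sketch):

$roundTripped = [pscustomobject] @{ ArrayProp = @(1) } | ConvertTo-Json | ConvertFrom-Json
$roundTripped.ArrayProp -is [array]   # -> True, in both editions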

Also note that a simple way to avoid auto-enumeration is to pass an enumerable as an argument to ConvertTo-Json:

ConvertTo-Json @(1) -Compress # -> '[1]'
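
Combined with ConvertFrom-Json -NoEnumerate, this also lets a single-element top-level array round-trip in PowerShell 7 (a sketch):

ConvertTo-Json @(1) -Compress | ConvertFrom-Json -NoEnumerate | ConvertTo-Json -Compress   # -> '[1]'
# Without -NoEnumerate, the single element would be unwrapped and the result would be '1'.
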
microsoft-github-policy-service[bot] commented 2 weeks ago

This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.

microsoft-github-policy-service[bot] commented 2 weeks ago

📣 Hey @abgox, how did we do? We would love to hear your feedback with the link below! 🗣️

🔗 https://aka.ms/PSRepoFeedback
