Closed: abgox closed this 2 weeks ago
function test-function {
return New-Object System.Collections.ArrayList
}
$arr = test-function
$arr.GetType()
gives
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True ArrayList System.Object
But why would @() be a problem?
You have my sympathy, PowerShell can be infuriating with lists of zero or one items where the list can magically evaporate.
My best explanation is that PowerShell always tries to simplify, often turning a list of one item into just that item. This is problematic in that you can't just write code that deals with lists; you have to test for the cases of none, one, and some. Also, in key places classic Windows PowerShell differs in behaviour from PowerShell Core. Hence parameters like -AsArray, -NoEnumerate, etc., which try to undo what PowerShell insists on doing.
There is no magic, only an initially perhaps surprising behavior that is fundamental to PowerShell:
In the PowerShell pipeline - which is invariably involved when producing output from a command (function, script, cmdlet, script block) - (most) enumerables (such as arrays, ArrayLists, ...) are auto-enumerated; that is, an enumerable's elements are sent one by one to the success output stream.
In other words: Unless you take extra steps (see below), the original enumerable is - predictably - lost, and in the success output stream you cannot tell the difference between outputting a single-element enumerable and the one element it contains, given that in both cases it is only the latter that is sent to the output stream.
An empty enumerable - such as in your case - sends "nothing" to the success output stream (pipeline), which is technically the [System.Management.Automation.Internal.AutomationNull]::Value singleton, which in effect behaves like $null in expression contexts and in argument-based parameter binding (such as your case).
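To make the "enumerable null" concrete, here is a small sketch (using the function from the question) that contrasts expression contexts with the pipeline:

```powershell
function test-function {
    return New-Object System.Collections.ArrayList   # empty; enumerates to nothing
}

# In an expression context, the result behaves like $null:
$null -eq (test-function)                    # True

# In the pipeline, AutomationNull contributes zero items:
(test-function) | ForEach-Object { 'ran' }   # (no output)

# ... whereas a true $null is passed through as a single input object:
$null | ForEach-Object { 'ran' }             # ran
```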
The success output stream is an open-ended stream of objects that itself has no notion of an array or a similar data structure: the objects in it can - and often are - processed one by one, as they are received, in which case the question of how to collect them for later processing doesn't arise.
Collecting stream output of necessity comes into play when you assign to a variable (e.g., $arr = test-function
), or make command output participate in a larger expression (e.g., 'foo' + (test-function)
), including use of $(...)
and @(...)
(except with array literals). Collecting a single object from the stream causes it to be collected as-is. It is only if there are two or more objects in the stream that a list-like data type is needed for collection, in which case PowerShell invariably creates an [object[]]-typed array. For the reasons explained above, this array is unrelated to any originating enumerable type, which never participated as itself in the pipeline.
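A short sketch of how the originating enumerable type is lost: a function that returns a populated ArrayList still yields an [object[]] on assignment, because only the elements travel through the stream:

```powershell
function test-function {
    $list = New-Object System.Collections.ArrayList
    $null = $list.Add(1)   # $null suppresses Add()'s int return value
    $null = $list.Add(2)
    return $list           # auto-enumerated: elements 1 and 2 are emitted
}

$arr = test-function
$arr.GetType().Name        # Object[]  (the ArrayList identity is gone)
```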
To send an instance of an enumerable type itself, as a whole, to the success output stream, you must prevent auto-enumeration:
- New-Object itself uses this technique, which is why @rhubarb-geek-nz's workaround is effective; New-Object's behavior is unusual among cmdlets (see below), but necessary in order to preserve the constructed instance as-is. For example, New-Object object[] 0 sends the resulting array as itself to the success output stream, whereas the otherwise equivalent expression [object[]]::new(0) is subject to auto-enumeration.
- Write-Output -NoEnumerate explicitly suppresses auto-enumeration.
- An often-seen shortcut is to use the unary form of ,, the array-constructor operator, to create a transient helper array that wraps the output enumerable in a single-element array; that wrapper's auto-enumeration then sends the wrapped enumerable itself to the success output stream.
In other words: The following techniques all work to output an empty array as a whole from your function:
# Using New-Object
function test-function { New-Object object[] 0 }
# Using Write-Output -NoEnumerate
function test-function { Write-Output -NoEnumerate @() }
# Using a transitory single-element helper array wrapper
function test-function { , @() }
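Any of the variants above can be verified like this; the collected result is now a bona fide (empty) array:

```powershell
function test-function { , @() }   # (or either of the other two variants)

$arr = test-function
$arr -is [array]        # True
$arr.Count              # 0
$arr.GetType().Name     # Object[]
```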
It is worth noting that auto-enumeration is a core PowerShell feature that you should generally not deviate from, especially in public-facing functions / cmdlets / scripts.
On a higher level of abstraction, one of PowerShell's core strengths is its consistency, of which consistent behavior in output streams / in the pipeline is one aspect.
To put it in concrete terms: Users justifiably expect commands to output objects one by one rather than outputting list-like containers as a whole, especially given that the latter will not behave as expected in the pipeline; e.g.:
# Expected, auto-enumerating streaming behavior (element-by-element streaming).
# Where-Object's script block is invoked once for each element.
# -> 2, 3
& { @(1, 2, 3) } | Where-Object { $_ -ge 2 }
# Unusual, array-as-a-whole output behavior.
# !! -> @(1, 2, 3)
# !! Where-Object only receives *one* input object, which is the *array* as a whole, in which
# !! case -ge acts as an array filter that returns the subarray @(2, 3), which Where-Object interprets
# !! as $true, and therefore *passes the input object (the array) through*.
& { Write-Output -NoEnumerate @(1, 2, 3) } | Where-Object { $_ -ge 2 }
So as not to confound user expectations, deviation from this behavior should make the target command require user opt-in, such as via the -NoEnumerate
and -AsArray
switches some built-in cmdlets (now) offer.
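For instance, a sketch that assumes PowerShell 7+, where these switches exist on the JSON cmdlets:

```powershell
# Opt in to array-as-a-whole output explicitly:
1 | ConvertTo-Json -AsArray -Compress        # '[1]'

# Opt in to receiving a JSON array as a whole, rather than element by element:
$a = '[1]' | ConvertFrom-Json -NoEnumerate
$a.Count                                     # 1
```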
The legacy PowerShell edition, Windows PowerShell, neglects to exhibit this pattern in a few cases (i.e., it outputs arrays as a whole by default or invariably); these cases have since been corrected in PowerShell 7.
A prominent example is ConvertFrom-Json
, which only in PowerShell 7 exhibits the expected behavior - see https://github.com/PowerShell/PowerShell/issues/3424 for the backstory.
Note that while PowerShell 7's built-in cmdlets now work consistently, from what I can tell, third-party code and even modules that ship with Windows may still exhibit the unexpected behavior; e.g., Get-WinUserLanguageList.
If you encounter such a command and want to force enumeration, simply enclose it in (...)
(which collects all output in memory first) or pipe to Write-Output
(which preserves the streaming behavior).
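A sketch of both remedies, using a hypothetical Get-Whole function as a stand-in for a command that (atypically) emits an array as a single object:

```powershell
function Get-Whole { Write-Output -NoEnumerate @(1, 2, 3) }

# Received as ONE pipeline input object:
(Get-Whole | Measure-Object).Count                  # 1

# (...) collects first, and the subsequent pipe enumerates:
((Get-Whole) | Measure-Object).Count                # 3

# Piping to Write-Output re-enumerates while preserving streaming:
(Get-Whole | Write-Output | Measure-Object).Count   # 3
```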
Another trick to avoid PowerShell's delisting of single elements is to capture the output with -OutVariable, which will contain all the elements of the output pipeline; this shows that it is the assignment that does the collecting.
$date = Get-Date -OutVariable datevar
$date.GetType()
$datevar.GetType()
$datevar[0].GetType()
gives
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True DateTime System.ValueType
True True ArrayList System.Object
True True DateTime System.ValueType
@rhubarb-geek-nz, while that is technically true, I consider this asymmetry between direct variable assignment and -OutVariable
a bug, not a feature, as discussed many years ago in the following issue (nowadays, I would use slightly different framing and language, but the gist of the issue still applies):
Consider the following pitfall:
$null = Get-Item -OutVariable v $PROFILE
# !! -> "The property 'LastWriteTime' cannot be found on this object. Verify that the property exists and can be set."
$v.LastWriteTime = [datetime]::now
Clearly, the intent of Get-Item $PROFILE
is to retrieve a single object; yet, $v
is now an ArrayList
instance, so that $v.LastWriteTime
applies member-access enumeration, which is unsupported for setting properties.
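By contrast, direct assignment collects the single output object as-is, so setting a property targets the object itself. A sketch (using $PSHOME instead of $PROFILE, since $PSHOME is guaranteed to exist on disk):

```powershell
$v = Get-Item $PSHOME
$v.GetType().Name    # DirectoryInfo, not ArrayList

# Setting a property such as LastWriteTime would now address the object
# directly, rather than relying on (set-unsupported) member-access enumeration.
```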
As a general rule, avoid assignment in PowerShell when you are dealing with multiple items. It is problematic managing code paths which sometimes return a single item and sometimes a list of items. Compare with an SQL query: a result set can return zero, one, or multiple items with no drama, whereas PowerShell can give you a null, a single item, or a collection.
It is all water under the bridge, but my recommendation remains: avoid the assignment operator when dealing with multiple items where the count may be 0, 1, or many. Use the output either in a pipeline or via -OutVariable for consistent results.
Clearly, the intent of
Get-Item $PROFILE
is to retrieve a single object
However
PS> get-command get-item -syntax
Get-Item [-Path] <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]
Get-Item -LiteralPath <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]
Your argument binds to the -Path parameter, which can both take an array and expand wildcards:
PS> $FOO='*.ps1'
PS> get-item $FOO
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 29/04/2024 00:20 156 array.ps1
-a--- 28/04/2024 20:31 120 empty.ps1
-a--- 28/04/2024 21:57 91 error.ps1
-a--- 28/04/2024 21:55 101 outvar.ps1
So your clearly is clearly not quite as clear as you suggest.
The "clearly" applied to the specific command, where a literal, single path was provided as the input.
The point is that any cmdlet is free to situationally "return" - i.e., emit to the success output stream - zero, one, or more output objects.
Earlier we've discussed the stream-collection behavior that applies in direct variable assignment, notably that a single output object is collected as-is.
The point of my previous comment was:
There is NO good reason for $v = ...
to collect the output objects differently than ... -OutVariable v
.
This difference can lead to bugs / unexpected behavior that may be hard to understand.
Also, note that your framing wasn't correct:
avoid PowerShell's delisting of single elements is to capture the OutVariable itself
-OutVariable
has no impact on auto-enumeration, which happens regardless, unless explicitly suppressed.
It's simply that the -OutVariable feature unconditionally creates an ArrayList for the collected output, irrespective of the number of output objects. (Case in point: if you use New-Object System.Collections.ArrayList in combination with -OutVariable, you get a nested single-element ArrayList instance, whose first and only element is the empty ArrayList instance created by New-Object.)
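The nesting is easy to demonstrate:

```powershell
$null = New-Object System.Collections.ArrayList -OutVariable ov

$ov.GetType().Name      # ArrayList  (created by -OutVariable)
$ov.Count               # 1
$ov[0].GetType().Name   # ArrayList  (the empty instance from New-Object)
$ov[0].Count            # 0
```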
In concrete terms:
- With zero output objects, direct assignment gives you [System.Management.Automation.Internal.AutomationNull]::Value (the "enumerable null", which behaves like $null in an expression context, and like an empty enumerable in the pipeline), whereas -OutVariable creates an empty ArrayList instance.
- With exactly one output object, direct assignment gives you that object as-is, whereas -OutVariable creates a single-element ArrayList instance.
- With two or more output objects, direct assignment creates an [object[]]-typed array, whereas -OutVariable creates a (multi-element, resizable) ArrayList instance.
While you may choose to rely on this awkward inconsistency (which the documentation only hints at, without spelling out the ramifications) in order to always get an array-like result, I personally recommend avoiding it, both for the awkwardness of then having to suppress the success output ($null = ... -OutVariable) and for the confusing discrepancy.
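The three cases can be demonstrated side by side (using Where-Object merely as a convenient way to produce 0, 1, or 2 output objects):

```powershell
# Zero output objects:
$direct = 'a' | Where-Object { $false } -OutVariable ov
$null -eq $direct       # True (AutomationNull)
$ov.GetType().Name      # ArrayList; $ov.Count is 0

# Exactly one output object:
$direct = 'a' | Where-Object { $true } -OutVariable ov
$direct.GetType().Name  # String (collected as-is)
$ov.Count               # 1 (an ArrayList)

# Two or more output objects:
$direct = 'a', 'b' | Where-Object { $true } -OutVariable ov
$direct.GetType().Name  # Object[]
$ov.GetType().Name      # ArrayList
```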
The short of it:
In order to emit enumerables as a whole from a PowerShell command, auto-enumeration must be suppressed (as an aside: in the Cmdlet.WriteObject()
SDK function, the logic is reversed), using the techniques previously discussed.
If you want to ensure that at most one object is captured in a variable, pipe to Select-Object -First 1
or - if you don't mind collecting all output first - use (...)[0]
(assuming Set-StrictMode
is at most at -Version 2
).
If you want to ensure that output is always captured in an array, use @(...), the array-subexpression operator ($v = @(...)), or (with subtly different behavior) a type-constrained variable: [array] $v = ...
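A quick sketch of these techniques:

```powershell
# At most one object, streamed:
$first = 1..5 | Select-Object -First 1
$first                      # 1

# Always an array, even for a single output object:
$a = @('only' | Where-Object { $true })
$a.GetType().Name           # Object[]
$a.Count                    # 1

# Type-constrained variable (subtly different: converts on every assignment):
[array] $typed = 'only'
$typed.GetType().Name       # Object[]
```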
However, thanks to PowerShell's unified handling of scalars and lists, provided via intrinsic members for scalars and member-access enumeration for enumerables, it is often not necessary to force creation of an array or, conversely, to explicitly enumerate the elements of an array for member access (read-only property access and method access).
And, yes, if you don't actually need to collect a(n intermediate) command's output, processing it in a streaming fashion in a pipeline is the best approach.
A common pattern that I have, which is why I have so much frustration with PowerShell's Schroedinger's OO model, is that being able to round-trip JSON data is of vital importance. If the original JSON was an array, it needs to stay an array even if it only has one contained object. Likewise, if an object contains a property that was an array of one object, that needs to stay an array after our processing. Yes, ConvertFrom-Json now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core. In order to do that we have to have test cases where every array anywhere within an object can have zero, one, or some items, so we know we are using the right flags at each processing step and work with all combinations of data.
ConvertFrom-JSON now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core.
That is unfortunate, but an unavoidable consequence of things getting improved / fixed in PS Core.
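One way to cope is an edition guard; a sketch (assuming a top-level JSON array that must survive the round trip):

```powershell
$json = '[1]'   # a top-level JSON array with a single element

if ($PSVersionTable.PSEdition -eq 'Core') {
    # PowerShell 7: suppress enumeration explicitly to keep the array.
    $data = $json | ConvertFrom-Json -NoEnumerate
}
else {
    # Windows PowerShell: ConvertFrom-Json outputs the array as a whole by default.
    $data = $json | ConvertFrom-Json
}

$data.Count   # 1 in both editions
</imports>
```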
An array stored in a property value should never pose any problem, however; e.g., the following round-trips properly, in both editions:
[pscustomobject] @{ ArrayProp = @(1) } | ConvertTo-Json | ConvertFrom-Json
Also note that a simple way to avoid auto-enumeration is to pass an enumerable as an argument to ConvertTo-Json
:
ConvertTo-Json @(1) -Compress # -> '[1]'
This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.
Prerequisites
Steps to reproduce
- I also tried the [array] cast type, which didn't work either.
- The Compare-Object error occurs because it does not return the expected array.
- I used -is to see if it was an array, and it returned false.
to see if it was an array, and returned false.Expected behavior
Actual behavior
Error details
No response
Environment data
Visuals