Integers passed to a method expected an array implicitly allocate an array of that length

colejohnson66 commented 2 weeks ago

Prerequisites

[X] Write a descriptive title.
[X] Make sure you are able to repro it on the latest released version
[X] Search the existing issues.
[X] Refer to the FAQ.
[X] Refer to Differences between Windows PowerShell 5.1 and PowerShell.

Steps to reproduce

If one invokes a method that takes an array, passing a numeric type should fail. Instead, an array of that length is implicitly allocated and passed in.

For this issue, Convert.ToHexString has three overloads: byte[], byte[] plus offset+length, and ReadOnlySpan<byte>. When called with a numeric type (such as a character), PowerShell implicitly casts the numeric value to an array instead of failing.

See: dotnet/runtime#101757

Expected behavior

PS> [Convert]::ToHexString([char]0x200F)
InvalidArgument: Cannot convert value ... to type "System.Byte[]".

Actual behavior

PS> [Convert]::ToHexString([char]0x200F)
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...

Specifically, 16414 zeros (or 0x200F*2) are printed.

Environment data

PS> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

237dmitry commented 2 weeks ago

The same result:

-join ([byte[]][char]0x200f)

mklement0 commented 2 weeks ago

PowerShell allows passing [char] to [byte]-typed parameters, and therefore also to [byte[]]-typed parameters.

In the case of [byte[]], the unexpected array-allocation behavior occurs, namely if the code point of the character is > 0xFF (255), i.e. doesn't fit into a single [byte].

The bug also surfaces with PowerShell-native commands - even with casts, as @237dmitry as discovered - so the simplest repro with a parameter is:

# OK -> 1, 255
# That is: One byte with value 255
& { param([byte[]] $bytes) $bytes.Length; Write-Host $bytes  } ([char] 255)

# !! BROKEN, due to the character's code point being > 255 (0xFF) 
# !! -> 256(!) bytes with value 0
# !! That is, values >= 256 are mistakenly interpreted as the size of an array of 0 bytes to allocated.
& { param([byte[]] $bytes) $bytes.Length; Write-Host $bytes  } ([char] 256)

By contrast, note that the behavior is correct in the following cases, resulting in a statement-terminating error with message "Value was either too large or too small for an unsigned byte" if [char] code point or number > 255 is passed:

with [char] for [byte]-typed parameters (non-arrays), and if you pass an array of [char] values to a [byte[]]-typed parameter
with any instance of a numeric type, both in the [byte] and [byte[]] cases

jhoneill commented 2 weeks ago

The same result:
-join ([byte[]][char]0x200f)

Well that's plain odd. Granted most assumptions about chars converting to 7 or 8 bits have been broken since unicode arrived, but even so it's madly inconsistent.

> ([char]8364)
€

>$x = ([char]8364)

> [int[]]$x
8364

> ([byte[]]$x).count   #  it's that number of zeros
8364

> ([byte[]]'€').count  # String literal fails 
InvalidArgument: Cannot convert value "€" to type "System.Byte[]".

([byte[]][char]'€').count  # String cast to a char (same as $x) gives that number of zeros
8364

> ( [byte[]]::new(8364) ).count   # which is a bit like new() for a an array of byte
8364

( [byte[]]::new([char]'€') ).Count  # and works with a char 
8364

> ( [byte[]]::new('€') ).Count   # but not with a string literal 
MethodException: Cannot convert argument "0", with value: "€", for ".ctor" to type "System.Int32":

 ([byte[]][char[]]'€')    # Single char gives that number of zeros, but array fails 
InvalidArgument: Cannot convert value "€" to type "System.Byte".

> ([byte[]]8364)    # and single int fails 
InvalidArgument: Cannot convert value "8364" to type "System.Byte[]".

SeeminglyScience commented 2 weeks ago

One of the conversion paths PowerShell utilizes is trying to pass the conversion subject as the first argument to a constructor of the target type.

So while it's definitely surprising, and one of the more wacky examples, it does fit within PowerShell's conversion logic. It ends up generating something sorta close to this:

[Convert]::HexString([byte[]]::new([int][char]0x200F))

mklement0 commented 2 weeks ago

Interesting, so[char] is allowed to bind to any numeric parameter, not just [byte].

I suspect the ship has sailed, but:

In general:

While supporting implicit [char]-to-[byte] conversion is understandable, I wonder what the benefit is of allowing something like the following, especially given that use of [char] requires a deliberate act in PowerShell:

# -> 8364
& { param([double] $foo) $foo } ([char] 0x20ac)

As for the specific case at hand:

You could argue that an array-typed parameter should never attempt a constructor call like that. In fact, it is sensibly not attempted when you pass a number directly:

# -> "Cannot convert value "8364" to type "System.Byte".
#      Error: "Value was either too large or too small for an unsigned byte.
& { param([byte[]] $foo) } 0x20ac

In other words: an array-typed parameter that isn't given an argument of the exact type should only ever:

Try to convert a scalar value to the target type and then create a single-element array from it.
Try to convert a differently typed array element-by-element to create an array of the target type.

An inability to convert the argument / at least one argument element should result in a (statement-terminating) error.

SeeminglyScience commented 2 weeks ago

Interesting, so[char] is allowed to bind to any numeric parameter, not just [byte].

It's more just that char > int is a valid conversion path. Just like how string can also convert to int, so this does the same:

[byte[]]'10000'

But direct answer, the conversion path for char > int is a very common way of getting the numeric value for a character. Might actually be my most used conversion interactively. Double is less clear of a case though for sure.

mklement0 commented 2 weeks ago

char > int is a very common way of getting the numeric value for a character.

Good point.

However, I think the point about the inappropriate array-constructor call in the case at hand is still worth addressing. There's no benefit to the current behavior, and it is only likely to cause confusion.

SeeminglyScience commented 2 weeks ago

However, I think the point about the inappropriate array-constructor call in the case at hand is still worth addressing. There's no benefit to the current behavior, and it is only likely to cause confusion.

So to be clear it's a generic conversion path, not just for array types. For instance [IO.FileInfo]'some path' only works because of the "try passing it as an arg to a constructor" conversion path.

I'll definitely grant you that in a vacuum there's no benefit to the specifics of how that conversion plays out, but that's an inevitability with how permissive the system is.

mklement0 commented 2 weeks ago

Understood about the generic conversion path, @SeeminglyScience, but I'm still baffled by the following discrepancy:

# Sensibly FAILS - 0x20ac is only interpreted as [byte] - no fallback to constructor.
[byte[]] 0x20ac

# !! Unexpectedly equivalent to [byte[]]::new(0x20ac)
[byte[]] [char] 0x20ac

Note that the latter seemingly works only with [char], whereas any numeric cast - e.g. [byte[]] [long] 0x20ac - sensibly fails too.

SeeminglyScience commented 2 weeks ago

Ahh okay so there's an existing exclusion in that rule for numeric types that gets circumvented when a non-numeric type could be cast as one. That's sort of the issue with hard coding exclusions (or adding to existing exclusions) the rules get really hard to reason about.

I'll open it up to engine to discuss, but either way I think the implementation would make it challenge without just hard coding a list of every type that is convertible to int.

SeeminglyScience commented 5 days ago

The Engine WG discussed this and agree that the behavior is confusing, but decided that adding additional hard coded exclusions here is not worth the extra potential confusion and/or breaking changes.

microsoft-github-policy-service[bot] commented 4 days ago

This issue has been marked as won't fix and has not had any activity for 1 day. It has been closed for housekeeping purposes.

microsoft-github-policy-service[bot] commented 4 days ago

📣 Hey @colejohnson66, how did we do? We would love to hear your feedback with the link below! 🗣️

🔗 https://aka.ms/PSRepoFeedback

Microsoft Forms

PowerShell / PowerShell