PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License

`.ResolveLinkTarget() | gci -Recurse` buggy when target contains `[]` #21568

Open Hashbrown777 opened 2 weeks ago

Hashbrown777 commented 2 weeks ago

Steps to reproduce

cat test.ps1

#!/bin/pwsh
Param($dir = 'A [B C-2]', $link = 'B')
mkdir $dir
touch "$dir/bob"
ln -s $dir $link

try {
         gi -LiteralPath $dir                            | %{ "$($_.GetType().Name)`t$($_.FullName)" }
         gi -LiteralPath $dir                            | gci -Recurse

         gi -LiteralPath $link                           | %{ "$($_.GetType().Name)`t$($_.FullName)" }
         gi -LiteralPath $link                           | gci -Recurse

        (gi -LiteralPath $link).ResolveLinkTarget($True) | %{ "$($_.GetType().Name)`t$($_.FullName)" }
        (gi -LiteralPath $link).ResolveLinkTarget($True) | gci

        (gi -LiteralPath $link).ResolveLinkTarget($True) | %{ "$($_.GetType().Name)`t$($_.FullName)" }
        (gi -LiteralPath $link).ResolveLinkTarget($True) | gci -Recurse
}
finally {
        rm -rf $dir $link
}

Expected behavior

PS> ./test.ps1 'A' 'B'
DirectoryInfo   /home/hashbrown/A
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A
{bob}
DirectoryInfo   /home/hashbrown/A
{bob}

PS> ./test.ps1 'A [B C-2]' 'B'
DirectoryInfo   /home/hashbrown/A [B C-2]
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A [B C-2]
{bob}
DirectoryInfo   /home/hashbrown/A [B C-2]
{bob}

PS> ./test.ps1 'A [C]' 'B'
DirectoryInfo   /home/hashbrown/A [C]
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A [C]
{bob}
DirectoryInfo   /home/hashbrown/A [C]
{bob}

Actual behavior

PS> ./test.ps1 'A' 'B'
DirectoryInfo   /home/hashbrown/A
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A
{bob}
DirectoryInfo   /home/hashbrown/A
{bob}

PS> ./test.ps1 'A [B C-2]' 'B'
DirectoryInfo   /home/hashbrown/A [B C-2]
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A [B C-2]
DirectoryInfo   /home/hashbrown/A [B C-2]
The specified wildcard character pattern is not valid: A [B C-2]

PS> ./test.ps1 'A [C]' 'B'
DirectoryInfo   /home/hashbrown/A [C]
{bob}
DirectoryInfo   /home/hashbrown/B
{bob}
DirectoryInfo   /home/hashbrown/A [C]
DirectoryInfo   /home/hashbrown/A [C]
<#FREEZE#>

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Fedora Remix for WSL
Platform                       Unix
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

No response

mklement0 commented 2 weeks ago

The root cause is that - unfortunately - pipeline input to provider cmdlets such as Get-ChildItem is bound (a) invariably as a string and (b) to the -Path rather than the -LiteralPath parameter.

This means that path strings as input - which includes implicitly stringified objects that do not have a .PSPath or .LiteralPath property - are invariably interpreted as wildcard patterns.
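To illustrate what "interpreted as wildcard patterns" means for a name such as `A [C]`, here is a minimal sketch that uses only the `-like` operator and the `WildcardPattern` class, with no file system access. The variable names are illustrative only.

```powershell
# A directory name that happens to be a valid wildcard pattern.
$name = 'A [C]'

# The brackets make PowerShell treat the string as a pattern.
[System.Management.Automation.WildcardPattern]::ContainsWildcardCharacters($name)  # True

# As a pattern, [C] matches the single character 'C', so the pattern
# matches 'A C' but NOT the literal name it was derived from.
$name -like $name    # False
'A C' -like $name    # True

# Escaping the wildcard metacharacters restores the literal meaning.
$escaped = [System.Management.Automation.WildcardPattern]::Escape($name)
$escaped             # A `[C`]
$name -like $escaped # True
```

Binding to -LiteralPath sidesteps the need for escaping altogether, which is why the workarounds below target that parameter.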

Workarounds:

'hi [123].txt' | Select-Object @{ Name='LiteralPath'; Expression={ $_ } } | Get-ChildItem
'hi [123].txt' | Get-ChildItem -LiteralPath { $_ }

Note that piping Get-Item / Get-ChildItem output directly to another such call does bind to -LiteralPath, courtesy of the .PSPath property that the provider cmdlets add to each output item: pipeline input binds by property name to the -PSPath parameter, which is simply an alias of -LiteralPath.
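Applied to the scenario in the original report, the delay-bind workaround might look like the sketch below. It assumes PowerShell 7.4 (so that the .NET methods `ResolveLinkTarget` and `Directory.CreateSymbolicLink` are available) and permission to create symbolic links. The scratch directory layout and variable names are hypothetical, chosen to mirror the repro script.

```powershell
# Hypothetical scratch layout: <temp>/<guid>/A [C]/bob, plus a symlink B -> 'A [C]'.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString('N'))
$dir  = Join-Path $root 'A [C]'
$link = Join-Path $root 'B'

try {
    # .NET APIs are used for setup so that no wildcard interpretation is involved.
    $null = [System.IO.Directory]::CreateDirectory($dir)
    [System.IO.File]::WriteAllText((Join-Path $dir 'bob'), '')
    $null = [System.IO.Directory]::CreateSymbolicLink($link, $dir)

    # The resolved target is a plain DirectoryInfo without a .PSPath property,
    # so bind its full path to -LiteralPath explicitly via a delay-bind script block.
    $target = (Get-Item -LiteralPath $link).ResolveLinkTarget($true)
    $names  = $target |
        Get-ChildItem -LiteralPath { $_.FullName } -Recurse |
        ForEach-Object Name
    $names
}
finally {
    Remove-Item -LiteralPath $root -Recurse -Force
}
```

When the pipeline is not essential, passing the path as a direct argument, `Get-ChildItem -LiteralPath $target.FullName -Recurse`, should be equivalent.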

Hashbrown777 commented 2 weeks ago

Why doesn't the output of ResolveLinkTarget also have PSPath populated given it purports to be an identical DirectoryInfo?

Delay-bind script blocks are something I thought up and wanted, but I didn't know the language itself already supported them. I thought only cmdlets that explicitly accept script blocks (and thus handle them themselves) allowed it (like Sort-Object -Property). Neat!

Hashbrown777 commented 2 weeks ago

Are you saying that the earlier cmdlets are looking ahead in the pipeline and modifying their output, instead of the later cmdlets accepting and handling different types of input? Like gci only accepts strings in, and without gi changing its output, the pipeline input is being cast? Why doesn't gci just accept FileSystemInfo as well as strings and handle those cases appropriately on input?

mklement0 commented 2 weeks ago

Why doesn't the output of ResolveLinkTarget also have PSPath populated given it purports to be an identical DirectoryInfo?

Because, unfortunately, it is only the DirectoryInfo and FileInfo instances emitted by provider cmdlets that have a .PSPath property (among other provider-related instance ETS properties).
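The difference can be checked directly. The following sketch compares a provider-emitted instance with one constructed via .NET, using `$HOME` merely as a convenient existing directory. The variable names are illustrative.

```powershell
# Instance emitted by a provider cmdlet: decorated with provider-related properties.
$viaProvider = Get-Item -LiteralPath $HOME

# Instance constructed directly via .NET: same type, but no such decoration.
$viaDotNet = [System.IO.DirectoryInfo]::new($HOME)

# Both are DirectoryInfo objects...
$viaProvider.GetType().FullName
$viaDotNet.GetType().FullName

# ...but only the provider-emitted one carries a .PSPath property.
$hasPSPathViaProvider = $null -ne $viaProvider.PSObject.Properties['PSPath']
$hasPSPathViaDotNet   = $null -ne $viaDotNet.PSObject.Properties['PSPath']
"Provider: $hasPSPathViaProvider; .NET: $hasPSPathViaDotNet"
```

The objects returned by `ResolveLinkTarget()` fall into the second category, which is why they are stringified and bound to -Path.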

This problem would go away if these provider-related ETS properties were defined at the type level, which would also speed things up; see the following issue and the one it links to, the discussion around which stalled a long time ago:

mklement0 commented 2 weeks ago

Why doesn't gci just accept FileSystemInfo

That is an excellent question, but you'll have to ask the original designers; I wish it did.

Are you saying that the earlier cmdlets are looking ahead in the pipeline and modifying their output instead of the later cmdlets accepting and handling different types of inputs?

No: no cmdlet looks ahead in the pipeline, nor is any cmdlet even capable of doing so (at least not without substantial manual effort).

Get-ChildItem's pipeline-binding parameters are as follows:

Name        Aliases      ParameterType   PipelineBinding
----        -------      -------------   ---------------
LiteralPath {PSPath, LP} System.String[] By property name
Path        {}           System.String[] By value, By property name
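A listing like the one above can be derived from a command's parameter metadata. The following is one possible sketch, not necessarily how the table was produced. The output property names `ByValue` and `ByPropertyName` are made up for illustration.

```powershell
# Inspect the parameter metadata of Get-ChildItem and keep only the
# parameters that can bind pipeline input in at least one parameter set.
$pipelineParams = (Get-Command Get-ChildItem).Parameters.Values | ForEach-Object {
    $param = $_

    # A parameter can carry one [Parameter()] attribute per parameter set.
    $paramAttrs = @($param.Attributes |
        Where-Object { $_ -is [System.Management.Automation.ParameterAttribute] })

    $byValue = [bool] ($paramAttrs | Where-Object ValueFromPipeline)
    $byName  = [bool] ($paramAttrs | Where-Object ValueFromPipelineByPropertyName)

    if ($byValue -or $byName) {
        [pscustomobject] @{
            Name           = $param.Name
            Aliases        = $param.Aliases
            ParameterType  = $param.ParameterType
            ByValue        = $byValue          # hypothetical column name
            ByPropertyName = $byName           # hypothetical column name
        }
    }
}

$pipelineParams | Format-Table
```

The key point is that -Path is the only one of the two that binds by value, so any input object lacking a matching property ends up stringified and bound there.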

Hashbrown777 commented 2 weeks ago

#4347 needs to be done. This revelation has a lot of implications for basically all of my scripts. I thought the whole point of PowerShell was that using the pipeline was the safest thing to do: you're piping objects, not just strings. Not only converting these objects to strings, but then using those strings as glob patterns instead of paths, was incredibly shortsighted, and needing to work around it using script blocks destroys the elegance that pwsh has :(

How many other cmdlet use cases have caveats akin to this? It would be impossible to know. I feel like the auto-casting in the language itself was a bad idea; it needed to be obvious to the programmer. Not necessarily mandating manual casting, but maybe when pwsh was starting out it needed an "autocasting permitted" operator.

gci.Resolve() | gci
#Error: Sorry, I only accept strings!

[](gci.Resolve()) | gci
#empty cast declaration, the programmer has opted into auto-casting

gci.Resolve() [|] gci
#some sort of syntax to allow this via the pipeline not only maintaining visual elegance but performance in terms of not blocking the whole generation of the former gci

..alas

mklement0 commented 2 weeks ago

Re potential challenges around Get-ChildItem, ... directly accepting FileInfo / DirectoryInfo input:

Apart from that, I think the existing pipeline-based argument binding is sufficient, except that better discoverability is called for: which parameters are pipeline-binding and, if so, how; see: