Closed Hashbrown777 closed 2 days ago
$_.Name.Substring($_.BaseName.length)
is a succinct way to reliably get the accurate Extension
Where is the rule defined that a directory cannot have a file extension?
A directory is still a file, and on POSIX systems the file system does not know about extensions at all; a file extension is purely down to an application's interpretation. The Windows file system has been aware of file extensions to a point since 8.3 names, but with NTFS and the ability to have multiple periods in a name, it is not really the role of the file system to make any inferences.
On a Linux system you will find many directories ending with ".d" under /etc and I would consider that a file extension.
So extension is really in the eye of the beholder.
PS> get-childitem .
Directory: /home/bythesea
UnixMode User Group LastWriteTime Size Name
-------- ---- ----- ------------- ---- ----
drwxr-xr-x bythesea users 04/30/2024 06:17 4096 foo.bar
PS> get-childItem . | Select-Object -Property Extension,BaseName
Extension BaseName
--------- --------
.bar foo.bar
On a Linux system you will find many directories ending with ".d" under /etc and I would consider that a file extension.
Yes, and in those cases the basename would exclude the ".d"; you cannot have it both ways. Also, having ".steve 123_456[yoyo]" as an extension is a far cry from ".d", but instead of asking for a reform of how PowerShell interprets extensions, I just want it to be consistent: if it says directories' basenames are always the same as the name, then the extensions must always be empty.
Yes, name != basename + extension
PS /etc> get-childitem rc* | Select-Object Name,BaseName,Extension
Name BaseName Extension
---- -------- ---------
rc0.d rc0.d .d
rc1.d rc1.d .d
rc2.d rc2.d .d
rc3.d rc3.d .d
rc4.d rc4.d .d
rc5.d rc5.d .d
rc6.d rc6.d .d
rcS.d rcS.d .d
rc.local rc .local
Yes, name != basename + extension
And where is that defined? (genuinely, it's not there) I'm pretty sure in every language basename is defined in terms of full path sans the parent directories and any extension. https://linux.die.net/man/1/basename https://www.php.net/manual/en/function.basename.php
Posting more examples of what I'm describing as a bug isn't helping prove anything?
Most languages either (A) don't provide a basename, allowing the application to be the beholder as it were, or (B) allow said application to provide the suffix explicitly. PowerShell has taken it upon itself to interpret the extension, and if the programmer decides to use this value, the API needs to be self-consistent.
Yes, name != basename + extension
And where is that defined?
Sorry, I was just confirming what I was seeing, that the basename still had the .d on the end
I was just confirming what I was seeing
Ah, I thought that was an offered explanation; "Yes, [because] name isnt.." not "Yes, [I see in pwsh] name isnt.."
But seriously, it's bizarre I can't find doco on .Basename..
This occurs on both windows and linux, starting at v5 all the way through to now.
This may make it hard to make any change other than document the behaviour. We don't have any idea how much existing code is relying on the existing behaviour.
existing code is relying on the existing behaviour.
Don't let pwsh be another cmd.exe; we'll see what the devs say. No one would be relying on this, they'd be compensating for it. That's kind of why we have a v7 in the first place: people who write code stuck in time stay on 5.
$ gi ./foo.bar/ | ft Name, BaseName, Extension
Name BaseName Extension
---- -------- ---------
foo.bar foo.bar
$ gi ./foo.bar | ft Name, BaseName, Extension
Name BaseName Extension
---- -------- ---------
foo.bar foo.bar .bar
That's an interesting observation, and unfortunately not leverageable as a workaround (like for the example screenshot) :(
$ gi ./foo.bar/ | ft Name, BaseName, Extension
That's an interesting observation, and unfortunately not leverageable as a workaround
The Microsoft build tools keep the trailing slash on directory names, so you don't need to append it when constructing full paths, eg
<FilesToDelete Include="$(PublishDir)$(AssemblyName).deps.json" />
<FilesToDelete Include="$(PublishDir)$(AssemblyName).pdb" />
It has a couple advantages
However you do need to look for both '/' and '\'
I'm unsure how that's relevant to an example that just wants to treat directories and files the same and have the output be predictable, nothing's trying to append paths or fetching items directly using known paths.
For instance, trying to "clone" a directory by making symlinks whilst injecting into the new names:
gci 'version1/' | %{ ln -s $_ "export/$($_.BaseName).version1$($_.Extension)" }
Knowing that gi $directPathWithSlash changes $_.Extension won't help, because "$_" is the result of a search, not of picking up a specific path. Using $_.Name.Substring($_.BaseName.length) in place of $_.Extension does function though.
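For anyone wanting to reuse that workaround, here is a minimal sketch of it wrapped in a function. Split-ItemName and its Stem/Suffix property names are hypothetical (they are not part of PowerShell); the only thing taken from the thread is the Name.Substring(BaseName.Length) expression, which makes stem + suffix equal the name for files and directories alike.

```powershell
# Hypothetical helper (not part of PowerShell): splits an item's name so that
# Stem + Suffix always equals Name, for files and directories alike.
function Split-ItemName {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [System.IO.FileSystemInfo] $Item
    )
    process {
        # .BaseName is PowerShell's ETS property; for directories it equals .Name,
        # so the suffix computed below is empty for them.
        $stem = $Item.BaseName
        [pscustomobject]@{
            Name   = $Item.Name
            Stem   = $stem
            Suffix = $Item.Name.Substring($stem.Length)
        }
    }
}

# Demo in a scratch directory: one directory and one file, both ending in ".bar".
$scratch = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path (Join-Path $scratch 'dir.bar')
$null = New-Item -ItemType File -Path (Join-Path $scratch 'file.bar')

$result = @(Get-ChildItem -LiteralPath $scratch | Split-ItemName)
$result | Format-Table

Remove-Item -LiteralPath $scratch -Recurse -Force
```

With this, the symlink example could use the Stem and Suffix properties instead of .BaseName and .Extension and get predictable names for both kinds of item.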
But seriously, it's bizarre I can't find doco on .Basename..
BaseName is not a property on the DirectoryInfo/FileInfo types in .NET but an ETS member added by PowerShell. You can see that on PowerShell 7 it's a ScriptProperty that is simply an alias for Name for DirectoryInfo and a more complex script property for FileInfo
PS /home/jborean> Get-Item $pwd | Get-Member -name BaseName
TypeName: System.IO.DirectoryInfo
Name MemberType Definition
---- ---------- ----------
BaseName ScriptProperty System.Object BaseName {get=$this.Name;}
PS /home/jborean> Get-Item $PSHome/pwsh | Get-Member -Name BaseName
TypeName: System.IO.FileInfo
Name MemberType Definition
---- ---------- ----------
BaseName ScriptProperty System.Object BaseName {get=if ($this.Extension.Length -gt 0){$this.Name.Remove($this.Name.Length - $this.Exte…
You can also use Get-TypeData to see that the BaseName property is set on the type and not just manually added to the instance by Get-Item
PS /home/jborean> (Get-TypeData System.IO.FileInfo).Members.BaseName
GetScriptBlock SetScriptBlock IsHidde
n
-------------- -------------- -------
if ($this.Extension.Length -gt 0){$this.Name.Remove($this.Name.Length - $this.Extension.Length)}else{$this.Name} False
PS /home/jborean> (Get-TypeData System.IO.DirectoryInfo).Members.BaseName
GetScriptBlock SetScriptBlock IsHidden Name
-------------- -------------- -------- ----
$this.Name False BaseName
Let me attempt a summary:
There is an intentional but baffling distinction between files (System.IO.FileInfo) and directories (System.IO.DirectoryInfo) built into PowerShell's .BaseName ETS property: for directories, .BaseName is the same as .Name, i.e. it includes any potential name extension.
.BaseName in this case arguably never made sense: -not $_.Parent.BaseName.EndsWith(".Autorest", "CurrentCultureIgnoreCase"); all other uses seem to relate to files.
The .NET type-native .Extension property (sensibly) does not make this distinction.
\ or /, as discovered by @237dmitry, is arguably a .NET bug that PowerShell merely surfaces.
To add to @jborean93's comment re discovery of ETS members: Get-Member -View Extended shows all ETS members associated with a given instance, both instance-level ETS members (created ad hoc) and type-level ones (created via .types.ps1xml files or calls to Update-TypeData) (you won't be able to tell from Get-Member's output whether the members are instance- or type-level).
Do you know why non-alphanumeric characters, such as spaces and brackets, are permitted to form the .Extension?
This won't help my issue, but it seems equally baffling to me.
Do you know why non-alphanumeric characters, such as spaces and brackets, are permitted to form the .Extension?
I suggest that the definition of file extension is really simple and is just what follows the period in a file name. Even that definition is ambiguous if there are multiple periods. Given there are no restrictions on what may be part of the stem, likewise there does not need to be any restriction on the extension.
A common example in the Microsoft world is using tildes at the end of filenames that are temporary.
I think that's absurd, though. Consider Mr. Rhubarb.docx and a 'valid' extension being . Rhubarb.docx, or basically any hidden, extensionless file on Linux literally not even having a name at all, with the whole filename being the extension (e.g. .bashrc, whereas bashrc is the name and it has no extension, it's just hidden).
[regex]'(?<=.)(\.[a-zA-Z0-9_]+|)~?$'
is, I think, what I have in my head (pseudocode; I mean, Unicode will kill that if there are any such extensions out there), but I'm wondering whether [\s()[\]] and other special characters are being used anywhere.
The concept of a valid extension does not exist. There are valid characters in a filename, and there is the concept that whatever follows the [last] period is the extension, and that is it. Then there is also the historical definition of an 8.3 filename.
Sure there are common extensions, and there are extension mappings listed in the registry. Applications can register what they want.
Have a look at
Get-ChildItem HKCU:Software\Classes | Select-Object PSChildName
The general idea behind extensions is it helps you know how to handle files, whether you can or not. If you don't recognize an extension that is absolutely fine, it means you don't know how to handle the file.
On UNIX, case is also important to certain applications. For instance, C++ compilers treat lower-case ".c" as a C file and ".C" as a C++ file. But that interpretation is down to the applications; there is no governing body allocating valid file extensions or deciding how to interpret them.
Historically, Apple (of course Apple) had a TYPE/CREATOR registry. The original Macintosh had no concept of file extensions, and the type of a file was held in the directory entry for the file. E.g. TEXT was a text file, PICT was an image, APPL was an application program, etc. The equivalent of the Windows extension mapping was why the Finder was called the Finder: it found the appropriate application for a file based on TYPE and CREATOR. You were supposed to apply to Apple for approval and to register your type and creator.
Go to a Windows command prompt and type
DIR *.*
then do the same in PowerShell.
In the original command prompt, *.* will list all files, whether they have a period in the name or not, because that is how it worked on CP/M.
@rhubarb-geek-nz that has nothing to do with extensions...
Sorry, I am lost now. I don't know what you are wanting to achieve. If you are wanting to find the last period in a name, then all you need is System.String.LastIndexOf rather than a regular expression.
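To make the LastIndexOf suggestion concrete, a minimal sketch follows. Get-NameSuffix is a hypothetical name, and the function only looks at the name string itself, so it does not reproduce the platform-specific handling that the .NET .Extension property may apply.

```powershell
# Hypothetical helper: returns everything from the last '.' onward,
# or an empty string if the name contains no '.' at all.
function Get-NameSuffix {
    param([Parameter(Mandatory)][string] $Name)

    $i = $Name.LastIndexOf([char] '.')
    if ($i -lt 0) { '' } else { $Name.Substring($i) }
}

# A few names from this thread, shown in brackets so empty results are visible.
'foo', 'foo.bar', 'foo. bar.docx', '.bashrc', 'archive.tar.gz' |
    ForEach-Object { '{0} -> [{1}]' -f $_, (Get-NameSuffix $_) }
```

Note that under this rule '.bashrc' is all suffix and 'archive.tar.gz' yields only '.gz'; anything smarter than that is policy the caller would have to layer on top.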
but I'm wondering whether [\s()[]] and other special characters are being used anywhere
I don't think mentioning how extensions can be differentiated on letter-case or recognised at all is helpful in this context because those are already catered for in "my expected extension"™ (where we accept those characters and don't care how they're used).
The concept of valid extension does not exist.
I'm saying I don't think there's ever been a use case for wanting spaces, periods, and brackets in the extension, and would like to know if there exists precedent for this.
cmd equating (?<!^)\.\* to (\..\*)? is interesting, considering it does match a file called abcd. def() using dir abcd.*
the concept that whatever follows the [last] period is the extension, and that is it
I mean, to keep this in the realm of extensions, there are .tar.gz and .rar.01 et cetera, but it's arguable that that is a use case handled by the application, which must not merely recognise those but interpret the raw names themselves, and it's not expected that regular API users would want them lumped together. I view spaces and such in the same light.
System.String.LastIndexOf rather than a regular expression,
The regular expression handles this fine, but I'm just using it as a way to communicate rather than listing conditions in English, which would be cumbersome; implementation isn't important.
I'm saying I don't think there's ever been a use case for wanting spaces, periods, and brackets in the extension, and would like to know if there exists precedent for this.
Think mechanism, not policy. The definition of a file extension as everything after the [last] period has worked for around 50 years. If you want to do something more esoteric, then absolutely fine, but put that in a different piece of code. Leave the existing mechanism that works as it is.
The .NET implementation of the .Extension property is indeed very simple: everything starting with the last . in a name is considered the extension, whatever characters (by definition other than .) follow it.
Examples:
([System.IO.FileInfo[]] ('foo', 'foo.bar', 'foo. bar.docx', 'foo. bar', 'foo. ', 'foo.')).Extension |
% { "[$_]" }
Output:
[]
[.bar]
[.docx]
[. bar]
[. ] # on Unix only: on Windows: []
[.] # on Unix only: on Windows: []
Note the platform differences with respect to 'foo.' and 'foo. ': on Windows, to avoid creating invalid filenames, the latter names are reflected as just ...\foo in the .FullName property, which the .Extension property operates on (though, curiously, the .Name property reflects the name as given).
.foo is "all-extension", and in PowerShell translates to an empty .BaseName
# -> '[][.foo]'
[System.IO.FileInfo[]] '.foo'| % { '[{0}][{1}]' -f $_.BaseName, $_.Extension }
Note that the problem of an empty base name doesn't arise in .NET, as .BaseName is purely a PowerShell (ETS) property.
As for cmd.exe's dir *.* behavior:
While PowerShell's own wildcard patterns indeed only return items whose name contains at least one . (which applies to the -Path and -Include / -Exclude parameters), the -Filter parameter uses the legacy / system-native wildcard matching; in other words: Get-ChildItem -Filter *.* exhibits the same matching behavior as cmd /c dir *.*
That actually answered a different question I had, to boot: 'how can I pass a prospective path to the FS and get it validated/corrected without just trying it and catching an exception?'. I'll have a look at FileInfo casting
@Hashbrown777, note that a pitfall with casting (which simply translates into a constructor call behind the scenes) is that relative paths are then resolved against the process working directory, which usually differs from PowerShell's; that is, a fully robust cast would have to use [System.IO.FileInfo] (Join-Path (Get-Location -PSProvider FileSystem).ProviderPath 'foo.txt') in order to be correctly resolved against PowerShell's current file-system provider location; if you're willing to assume that PowerShell's current location is a file-system location and that location isn't based on a PowerShell-only drive, [System.IO.FileInfo] "$PWD/foo.txt" will do.
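A small sketch that makes the pitfall visible by printing both resolutions side by side. The file name 'foo.txt' is just a placeholder and does not need to exist; the variable names are mine. Whether the two results differ depends on whether the session has changed location since the process started.

```powershell
$relative = 'foo.txt'   # placeholder name; the file does not need to exist

# Plain cast: .NET resolves the relative path against the process working directory.
$viaCast = ([System.IO.FileInfo] $relative).FullName

# Robust variant from the comment above: resolve against PowerShell's
# current file-system provider location first, then cast.
$psDir   = (Get-Location -PSProvider FileSystem).ProviderPath
$viaJoin = ([System.IO.FileInfo] (Join-Path $psDir $relative)).FullName

"process working directory : $([System.Environment]::CurrentDirectory)"
"PowerShell location       : $psDir"
"plain cast resolves to    : $viaCast"
"Join-Path + cast gives    : $viaJoin"
```

Running this after a Set-Location to some other directory is the easiest way to see the two results diverge.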
I'll have a look at FileInfo casting
Fortunately the rules of filenames are very simple.
On Windows
PS> [System.IO.Path]::PathSeparator
;
PS> [System.IO.Path]::DirectorySeparatorChar
\
PS> [System.IO.Path]::GetInvalidFileNameChars() | Where-Object { $_ -gt 32 }
"
<
>
|
:
*
?
\
/
And on UNIX
PS> [System.IO.Path]::PathSeparator
:
PS> [System.IO.Path]::DirectorySeparatorChar
/
PS> [System.IO.Path]::GetInvalidFileNameChars() | Where-Object { $_ -gt 32 }
/
And to avoid the mentioned scenario of trailing spaces use System.String.Trim()
Notice the path separator is not an invalid filename character.
But only the file system can tell you if a particular volume/drive/directory is case sensitive or not.
Seeing the code of a Chinese PowerShell project was an eye-opener, where not only the comments were in Chinese, but so were the file names and even function names, and it worked.
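As a rough illustration of the character rules listed above, a purely formal check of a single name component could look like the following. Test-LeafName is a hypothetical name; the check consults only GetInvalidFileNameChars() for the current platform and never touches the file system, so it says nothing about reserved names, length limits, trailing spaces or periods, or case sensitivity.

```powershell
# Hypothetical helper: $true if $Name is non-empty and contains none of the
# characters that .NET reports as invalid in a file name on this platform.
function Test-LeafName {
    param([Parameter(Mandatory)][AllowEmptyString()][string] $Name)

    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    ($Name.Length -gt 0) -and ($Name.IndexOfAny($invalid) -lt 0)
}

Test-LeafName 'foo.bar'        # no invalid characters on either platform
Test-LeafName 'dir/file.txt'   # '/' is invalid in a file name on both Windows and Unix
```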
For mere formal path validation, there's also Test-Path -IsValid, but there are two caveats:
- It is currently broken on Windows, but will be fixed - see #21112
- If the path uses a non-existent drive spec, it must be prefixed with FileSystem:: in order to ensure interpretation as a file-system path; e.g., if drive q: doesn't exist, Test-Path -IsValid q:/foo is $false (even on Unix), even though it is a formally valid path; use Test-Path -IsValid FileSystem::q:/foo
Finally, note that Convert-Path can be used to convert a path based on a PowerShell-only drive to the underlying, native file-system path; e.g.: Convert-Path Temp:\
I note that on Linux, Get-ChildItem . -Filter '*.*' implements the Windows file system filtering convention, which is different from, say, ls *.* in bash.
My theory is that on Windows it is implemented by FindFirstFileW, so the operating system does the filtering, but POSIX opendir/readdir/closedir don't do any filtering, so it is implemented in .NET.
[System.IO.Directory]::GetFiles('.','*.*')
this includes files without the period
Good point, @rhubarb-geek-nz - I had wrongly assumed that a platform-native system call would be used on Unix-like platforms.
Yes, PowerShell defers to .NET (FindFirstFileW is only used directly by PowerShell in the context of examining reparse points) and PowerShell explicitly requests the Win32 behavior with its legacy quirks even on Unix-like platforms (which I presume .NET offers as self-implemented emulation):
The .NET APIs themselves default to the Windows behavior, albeit inconsistently; see:
Specifically, the .MatchType property of System.IO.EnumerationOptions defaults to System.IO.MatchType.Simple (no legacy quirks; *.* only matches names that contain .), but the .Enumerate*() file/directory-enumeration methods that take no System.IO.EnumerationOptions argument default to System.IO.MatchType.Win32
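The difference between the two match types can be observed directly. The sketch below assumes PowerShell 7+ (System.IO.EnumerationOptions is not available in Windows PowerShell 5.1) and uses a throwaway scratch directory with two hypothetical file names, one with a period and one without.

```powershell
# Scratch directory with one name that contains '.' and one that does not.
$scratch = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path $scratch
$null = New-Item -ItemType File -Path (Join-Path $scratch 'noext')
$null = New-Item -ItemType File -Path (Join-Path $scratch 'has.ext')

# A freshly constructed EnumerationOptions uses MatchType.Simple.
$simple = [System.IO.EnumerationOptions]::new()

# Explicitly request the legacy Win32 matching rules.
$win32 = [System.IO.EnumerationOptions]::new()
$win32.MatchType = [System.IO.MatchType]::Win32

$namesSimple = @([System.IO.Directory]::GetFiles($scratch, '*.*', $simple) |
    ForEach-Object { [System.IO.Path]::GetFileName($_) })
$namesWin32 = @([System.IO.Directory]::GetFiles($scratch, '*.*', $win32) |
    ForEach-Object { [System.IO.Path]::GetFileName($_) } | Sort-Object)

"Simple: $($namesSimple -join ', ')"
"Win32 : $($namesWin32 -join ', ')"

Remove-Item -LiteralPath $scratch -Recurse -Force
```

With Simple matching only the name containing a period should be listed, while Win32 matching should list both files.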
It seems that the original issue is that DirectoryInfo has its Extension property populated, but this comes from the .NET runtime, so if that gets addressed, then PS would reflect that change.
It seems that the original issue is that DirectoryInfo has its Extension property populated
That's not an issue: it is by - to me sensible - design, and I don't think it will change, nor - in my estimation - should it.
As such, I think the Resolution-External label is inappropriate.
The real issue is the - to me dubious - PowerShell behavior of selectively ignoring the name extension in directory names in the - PowerShell-only - BaseName property.
Not only does it introduce the asymmetry between PowerShell and .NET discussed in the initial post, I'm not aware of an intrinsic justification for it.
Based on the analysis above, I'd say that fixing the conceptually flawed PowerShell behavior is still a (desirable) option.
The real issue is the - to me dubious - PowerShell behavior of selectively ignoring the name extension in directory names in the - PowerShell-only - BaseName property.
- Not only does it introduce the asymmetry between PowerShell and .NET discussed in the initial post, I'm not aware of an intrinsic justification for it.
- Based on the analysis above, I'd say that fixing the conceptually flawed PowerShell behavior is still a (desirable) option.
Thanks for calling that out. I would agree that it is inconsistent and a question of whether it's really a bucket 3 or not (and I do see you've done some initial research on this, thanks!). I've updated the title of this issue to reflect the core problem. Will tag for WG to discuss.
WG discussed this. Although we agree that the design is not ideal, it was intentional when it was written and there are likely customers depending on this behavior. If we look at how Unix systems define basename, then the error in behavior is for FileInfo, which should include the extension as that is part of the filename, and instead there should have been something like basenamewithoutextension. As such, we accept that this is by-design and would recommend a doc bug to clarify the difference in behavior for FileInfo vs DirectoryInfo.
@SteveL-MSFT, while I can appreciate the concern about breaking things, note that the basename Unix utility is not relevant to this discussion, because it is the equivalent of Split-Path -Leaf and therefore unrelated to extensions.
Just to clarify (which may help with documenting):
I don't think there's an error in FileInfo: its .Extension logic is very simple and easy to conceptualize: everything starting with the last . in a file or directory name is the extension, which means that a name such as .bashrc is "all extension" (but that case is easy to detect by testing .Name and .Extension for equality).
Given that .NET has no .BaseName property to complement .Extension, there is no inconsistency there.
The problematic inconsistency was introduced by PowerShell when it chose to treat file and directory names differently, so that .BaseName + .Extension equaling .Name doesn't hold for directory names.
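The "all extension" test mentioned above (comparing .Name and .Extension for equality) can be sketched as follows. The three names are arbitrary examples and the files do not need to exist, since casting a string to FileInfo only constructs the object; the AllExtension property name is my own.

```powershell
# Flag names that consist of nothing but an "extension", such as .bashrc.
$rows = @(
    [System.IO.FileInfo[]] ('.bashrc', 'notes.txt', 'README') | ForEach-Object {
        [pscustomobject]@{
            Name         = $_.Name
            Extension    = $_.Extension
            # -ceq: case-sensitive comparison of the two strings
            AllExtension = ($_.Name -ceq $_.Extension)
        }
    }
)
$rows | Format-Table
```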
This issue has been marked as by-design and has not had any activity for 1 day. It has been closed for housekeeping purposes.
Prerequisites
Steps to reproduce
This occurs on both windows and linux, starting at v5 all the way through to now.
<#System.IO.FileInfo#> | %{ $_.BaseName + $_.Extension }
should always be equivalent to <#System.IO.FileInfo#>.Name
For gci -File this holds true. For gci -Directory, although pwsh correctly has .BaseName always match .Name (folders cannot have extensions..), .Extension incorrectly matches .Name -replace '^.*?(?=\.[^.]*$|$)','' instead of always returning ""
Expected behavior
Actual behavior
Error details
No response
Environment data
Visuals