Closed lbruun closed 4 years ago
@lbruun Thanks for the feedback. We will get the documentation updated. Please file a feature request in the source code repository at: https://github.com/PowerShell/PowerShell/issues/new/choose
@lbruun I did some research into these cmdlets and the ZIP specification. The cmdlets are using the .NET ZipArchive class. So any change would have to happen there.
Compress-Archive stores the file names using UTF-8 encoding, and Expand-Archive extracts those files with the proper characters. 7-Zip stores the file name using Code Page 437 encoding, which encodes the "ü" character as 0x81. Expand-Archive just writes the value stored. The problem is that there is no official standard for encoding characters in filenames. See Section D.4 in https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT.
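To make the byte values mentioned above concrete, here is a minimal sketch that encodes "ü" both ways and then misreads the Code Page 437 byte as UTF-8. It assumes code page 437 is available in the session (it is in Windows PowerShell; other hosts may need the code pages encoding provider registered). The variable names are purely illustrative.

```powershell
# "ü" is U+00FC. Build it from its code point so the script file's own
# encoding cannot affect the result.
$u = [string][char]0x00FC

$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($u)
$cp437Bytes = [System.Text.Encoding]::GetEncoding(437).GetBytes($u)

# Print each byte sequence as hex.
'UTF-8 : ' + (($utf8Bytes  | ForEach-Object { $_.ToString('X2') }) -join ' ')   # UTF-8 : C3 BC
'CP437 : ' + (($cp437Bytes | ForEach-Object { $_.ToString('X2') }) -join ' ')   # CP437 : 81

# A lone 0x81 byte is not valid UTF-8, so a UTF-8 decoder substitutes
# U+FFFD (the replacement character) instead of "ü".
$misread = [System.Text.Encoding]::UTF8.GetString($cp437Bytes)
'Misread as UTF-8: U+{0:X4}' -f [int]$misread[0]                                # Misread as UTF-8: U+FFFD
```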
Thanks @sdwheeler for very good comments. Appreciate the reaction time here.
Yes, I (now) understand the shortcomings of the ZIP spec.
Yes, Compress-Archive is consistent with Expand-Archive, as both use UTF-8. So Expand-Archive can predictably unpack what was created with Compress-Archive. Check!
However, the point here is that Expand-Archive cannot predictably unzip a file created with native Windows (the File Explorer "Compressed Folders" feature), and that is of course not what a user would expect. It should be documented.
I've opened Feature Request 11901 on the matter. It proposes what I believe would be the most natural way to let users run Expand-Archive on any ZIP file, no matter where it was created.
Btw: I think you are slightly wrong when you say that 7-Zip encodes file names as Code Page 437. More accurately, it encodes file names using the host's OEM code page, which may or may not be 437. On my system it is Code Page 850. The Windows Compressed Folders feature, as far as I can tell, does the same. So it is not really 437 that is at play.
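As a small illustration of why the choice of OEM code page matters, the sketch below decodes the same two bytes under Code Page 437 and Code Page 850. The byte 0x81 happens to be "ü" in both, while 0x9B is "¢" in 437 and "ø" in 850. The byte values are chosen only for demonstration, and the sketch assumes both code pages are available in the session.

```powershell
# Two raw bytes as they might appear in a ZIP entry name written with an OEM code page.
$bytes = [byte[]](0x81, 0x9B)

$as437 = [System.Text.Encoding]::GetEncoding(437).GetString($bytes)
$as850 = [System.Text.Encoding]::GetEncoding(850).GetString($bytes)

# Show the decoded code points rather than the glyphs, so console fonts
# and output encodings do not obscure the difference.
'CP437: ' + (($as437.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' ')   # CP437: U+00FC U+00A2
'CP850: ' + (($as850.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' ')   # CP850: U+00FC U+00F8
```

A name containing only "ü" would therefore look identical under both code pages, which can hide the difference until a character such as "ø" shows up.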
Therefore, my workaround at the moment is to use:
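# Windows PowerShell 5.1 does not load the ZipFile type by default.
Add-Type -AssemblyName System.IO.Compression.FileSystem
$enc = [System.Text.Encoding]::GetEncoding((Get-WinSystemLocale).TextInfo.OEMCodePage)
[System.IO.Compression.ZipFile]::ExtractToDirectory("myarchive.zip", ".", $enc)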
Because of the shortcomings of the ZIP spec, there's no way to tell which file name encoding an archive uses. In my case I know it wasn't created by PowerShell itself, so I think (Get-WinSystemLocale).TextInfo.OEMCodePage is the best guess, or at least a much better guess than 437.
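For anyone who wants to see the effect without a real archive at hand, here is a hedged, in-memory sketch. It writes one entry name with an OEM code page and reads it back with two different encodings. Code page 850 is only an assumption standing in for the host's OEM code page, and the in-memory stream and variable names are hypothetical stand-ins for an archive produced by another tool.

```powershell
Add-Type -AssemblyName System.IO.Compression

$oem  = [System.Text.Encoding]::GetEncoding(850)      # assumed OEM code page
$name = 'Plankalk' + [char]0x00FC + 'l.dat'           # "Plankalkül.dat"
$mode = [System.IO.Compression.ZipArchiveMode]

# Write an archive whose entry name is stored as OEM bytes rather than UTF-8.
$stream = [System.IO.MemoryStream]::new()
$zip = [System.IO.Compression.ZipArchive]::new($stream, $mode::Create, $true, $oem)
$null = $zip.CreateEntry($name)
$zip.Dispose()

# Read it back with the matching encoding: the name survives.
$stream.Position = 0
$zip = [System.IO.Compression.ZipArchive]::new($stream, $mode::Read, $true, $oem)
$decoded = $zip.Entries[0].FullName
$zip.Dispose()

# Read it back as UTF-8: the 0x81 byte is not valid UTF-8, so the name is damaged.
$stream.Position = 0
$zip = [System.IO.Compression.ZipArchive]::new($stream, $mode::Read, $true, [System.Text.Encoding]::UTF8)
$garbled = $zip.Entries[0].FullName
$zip.Dispose()

'Matching encoding round-trips: ' + ($decoded -eq $name)
'UTF-8 read round-trips:        ' + ($garbled -eq $name)
```

The same encoding argument is what the ExtractToDirectory call above relies on. The sketch only shows that the result depends on guessing the writer's code page correctly.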
@lbruun Yes, we can add a note to the documentation about the behavior of Expand-Archive. That's a good point about your code page. I said code page 437 because that's what is listed in the ZIP APPNOTE.TXT, but it seems to be implementation dependent. It would make sense to me to use the host's OEM code page.
Please document how this cmdlet handles a ZIP produced with the Windows Explorer "Compressed Folders" feature when entry names in the ZIP contain non-ASCII characters, for example an entry named "Plankalkül.dat". The cmdlet cannot correctly expand such an archive. Bottom line: in this respect, the Expand-Archive cmdlet doesn't seem to be compatible with the majority of ZIP implementations out there, including "Compressed Folders", the 7-Zip file manager, etc. Perhaps an additional switch on the cmdlet would solve the problem?