Closed WilliamXieMSFT closed 5 years ago
The change is covered in the release notes and there is a detailed article about encoding.
Thanks for the links, Sean! My gripe was that there's both UTF8 and UTF8NoBOM, which feels like UTF8NoBOM is redundant.
UTF8 and UTF8NoBOM are different. UTF8 has a byte-order-mark (BOM) at the beginning of the file. UTF8NoBOM does not. The BOM is not always compatible across applications and platforms.
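To see which variant a given file actually uses, you can inspect its first three bytes: the UTF-8 BOM is the byte sequence EF BB BF. Below is a minimal sketch; the function name `Test-Utf8Bom` is made up for illustration and is not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true when the file
# starts with the UTF-8 byte order mark EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve to a full path first: .NET methods use the process working
    # directory, which can differ from PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Reads the whole file for simplicity; fine for small files.
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF)
}
```

Calling `Test-Utf8Bom -Path .\some-file.txt` (the file name is a placeholder) should work the same way in powershell.exe and pwsh.exe, since it only relies on .NET file APIs.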
In fact, UTF8BOM is not recognized by the Out-File cmdlet. I'm using PowerShell version 5.1.18362.145 and the output is:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "UTF8BOM" does not belong to the set "unknown;string;unicode;bigendianunicode;utf8;utf7;utf32;ascii;default;oem" specified by the ValidateSet attribute.
Supply an argument that is in the set and then try the command again.
At line:1 char:20
+ Out-File -Encoding UTF8BOM Teste.txt
+ ~~~~~~~
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand
I don't understand why this question was dismissed/closed. I don't see any comments actually explaining or addressing the issue. The documentation mentioned 3 distinct options (it was for PowerShell 6, I believe), and it still does for PowerShell versions 7.0, 7.1 and 7.2 (emphasis mine):
utf8: Encodes in UTF-8 format.
utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM)
utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
None of those document versions say whether the first option ("utf8") encodes with or without a Byte Order Mark (or whether this behavior depends on the platform or anything else). The documents linked by @sdwheeler describe that the default encoding has changed (to "UTF8NoBOM") and how encoding works in PowerShell in general. Neither of the two says whether the "utf8" encoding in PowerShell uses a BOM. The second document mentions "utf8" as an encoding without BOM, but in the VSCode context, not PowerShell. This document: https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.1#character-encoding-in-windows-powershell mentions the "UTF8" encoding as one with BOM, but it explicitly states that this applies to PowerShell 5.1, so it does not list "utf8BOM" or "utf8NoBOM". In that light, it's hard to assess whether it applies to PowerShell 7+ in any way.
All in all, there seems to be no consistent and clear document for PowerShell 7+ addressing the issue raised by @WilliamXieMSFT. If I'm mistaken, please share the links or appropriate quotes from already linked documents.
Edit: Just after posting, I found a relevant passage in the document I linked, in the section for PowerShell Core below the one I linked before. It does explain how the BOM is handled for all 3 options:
utf8: Encodes in UTF-8 format (no BOM).
utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM)
utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
I fail to understand why the documentation for Out-File (and others, e.g. Export-Csv) can't also be clear on that, i.e. that the option utf8 does not use a BOM in PowerShell Core (and did use a BOM in Windows PowerShell), especially since this option apparently underwent an important change.
Hello. I'm trying to convert a file from UTF-8 to UTF-8 with BOM, but the PowerShell code is not working. I get an error that says:
"Unable to match the identifier name utf8NoBOM to a valid enumerator name. Specify one of the following enumerator names and try again: Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, UTF32, Ascii, Default, Oem, BigEndianUTF32"
This is the powershell code:
$a = "C:/Folder1/TEST_ro.txt"
$b = "C:/Folder1/TEST_ro-2.txt"
(Get-Content -path $a) | Set-Content -Encoding UTF8BOM -Path $b
@me-suzy PowerShell 6+ supports the following encodings:
Windows PowerShell 5.1 (and earlier) supports:
Note that this does not include utf8NoBOM.
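One way to sidestep the differing `-Encoding` names is to write the file through .NET, which is available to both Windows PowerShell and PowerShell 6+. The sketch below is only an illustration: the function name `Convert-ToUtf8Bom` and its parameter names are hypothetical. It assumes the source file is UTF-8 (with or without BOM) and small enough to read into memory.

```powershell
# Hypothetical helper: rewrites a UTF-8 text file as UTF-8 with BOM.
function Convert-ToUtf8Bom {
    param(
        [Parameter(Mandatory)][string]$Source,
        [Parameter(Mandatory)][string]$Destination
    )

    # Turn PowerShell paths into full file system paths; this also works
    # for a destination file that does not exist yet.
    $pathApi = $ExecutionContext.SessionState.Path
    $src = $pathApi.GetUnresolvedProviderPathFromPSPath($Source)
    $dst = $pathApi.GetUnresolvedProviderPathFromPSPath($Destination)

    # UTF8Encoding($true) emits the BOM (EF BB BF); $false would omit it.
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true

    # ReadAllText honors a BOM if present and otherwise assumes UTF-8.
    $text = [System.IO.File]::ReadAllText($src)
    [System.IO.File]::WriteAllText($dst, $text, $utf8WithBom)
}
```

With the variables from the question, the call would be `Convert-ToUtf8Bom -Source $a -Destination $b`. Reading through .NET instead of `Get-Content` is a deliberate choice here: as far as I know, Windows PowerShell 5.1 reads BOM-less files with the system ANSI code page by default, which could garble non-ASCII characters before they are ever written back.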
OK, I searched the internet and found 2 solutions. Both are very easy: one uses a regex, the other a Python script, just using Notepad++.
To fully answer the question (by directly testing the following commands in powershell.exe and pwsh.exe on my system):
$data=@{}
$data.Foo='Foo'
$data.Bar='Bar'
$data | ConvertTo-Json | Out-File 'c:\temp\data.json' -Encoding utf8
In Windows PowerShell (version 5.1) this writes the file with a BOM. In PowerShell Core (version 7.2.1) this writes the file without a BOM.
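If the same script has to produce a BOM-less file in both editions, one option is to bypass `Out-File` and write through .NET with an explicit encoding object. This is a sketch under assumptions, not the documented recommendation: the output path below is a temp-folder placeholder rather than the `c:\temp\data.json` used above.

```powershell
$data = @{ Foo = 'Foo'; Bar = 'Bar' }
$json = $data | ConvertTo-Json

# Placeholder output location (assumption); adjust to your own path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'data.json'

# $false = do not emit a BOM, regardless of the PowerShell version.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Unlike Out-File, WriteAllText does not append a trailing newline.
[System.IO.File]::WriteAllText($path, $json, $utf8NoBom)

# Print the first bytes in hex. A file with a BOM would start with EF BB BF;
# this one should start with 7B, the '{' that opens the JSON object.
$first = [System.IO.File]::ReadAllBytes($path) | Select-Object -First 3
($first | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
```

The trade-off is that the file is written in one call from a single string, so this fits small payloads like the JSON above better than large pipelines.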
In the current documentation for the Set-Content and Add-Content commands, it is still not clear what exactly "utf8" means (with or without BOM).
utf8: Encodes in UTF-8 format. utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM) utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
Based on the comments here and my own testing, "utf8" is equivalent to "utf8NoBOM". A short note about that behavior would be really helpful in those documents, especially because the behavior changed between PowerShell 5.1 and PowerShell 6.
UTF8: Encodes in UTF-8 format. UTF8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM) UTF8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
For UTF8, I assume this is UTF8NoBOM? Can there be some clarifying text around this?
Would it be possible to add a helpful note around the change for defaults? PS5.1 (ASCII) to PS6 (UTF8NoBOM)?