Open richlander opened 6 months ago
I think it's a bit up to the end-user. In our company, we use the standard that all text files in our repositories are UTF-8, no-BOM, LF, with a final newline at the end. I personally think that's a good standard.
UTF-8, no-BOM, LF, with a final newline at the end
Are you saying that your files start with the linefeed character? Can you elaborate on that?
Apologies for the confusion. I meant that our files use a linefeed character as line terminator.
I think that it is good to use utf-8-bom as the default in template code files for C#, VB and F#. The reasoning behind this is that Visual Studio (17.10.1) might otherwise use the "wrong" encoding (Windows-1252, for example). I think that the default behaviour in VS should be changed to use utf-8 if the BOM is missing. But as long as this is not the case, having the BOM is good for the following reasons:
1/ When opening a template code file that does not have a BOM, Visual Studio does not default to utf8. This will cause Visual Studio to raise the following error if characters that cannot be saved using the current code page are added: https://github.com/dotnet/test-templates/issues/358
2/ More importantly, your program may behave differently on different systems if the file is not saved as utf8 or utf8-bom. https://github.com/dotnet/test-templates/issues/358
Also see this comment: https://github.com/dotnet/format/issues/1893#issuecomment-1946428275
In general, I think that using utf-8-bom for template code files is the best option considering Visual Studio's current encoding behaviour.
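To make the second point concrete, here is a minimal sketch in Python (not taken from the linked issues) of what happens when the same BOM-less UTF-8 bytes are decoded with two different encodings. The C# line in `source_text` is a hypothetical example, and the Windows-1252 fallback is an assumption based on the behaviour described above.

```python
# Hypothetical C# source line containing non-ASCII characters.
source_text = 'Console.WriteLine("å ä ö");'

# The bytes as they would sit on disk: UTF-8, no BOM.
raw = source_text.encode("utf-8")

# A reader that assumes UTF-8 recovers the original text.
as_utf8 = raw.decode("utf-8")

# A reader that falls back to Windows-1252 sees each two-byte
# UTF-8 sequence as two separate characters (mojibake).
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The string literal that ends up in the compiled program differs between the two readings, which is one way the same file can produce different output on different systems.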
From the Unicode spec.
By default, Visual Studio chooses the "wrong" encoding when opening template files stored without a BOM. This leads to several problems (https://github.com/dotnet/sdk/issues/39187#issuecomment-2147146329).
I think that the BOM helps Visual Studio to "guess" the correct encoding. Using the BOM as a signature seems ok according to the specification:
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"
If Visual Studio changes to always default to UTF8, omitting the BOM would be fine. But until then, keeping the BOM is the best option.
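As a rough illustration of the BOM acting as a signature, the sketch below checks for the three UTF-8 BOM bytes and otherwise falls back to a legacy code page. This is a simplified assumption about how an editor might guess, not Visual Studio's actual detection logic; `guess_encoding` and its `fallback` parameter are hypothetical names.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return 'utf-8-sig' when the data starts with the UTF-8 BOM,
    otherwise the fallback code page (an assumed legacy default)."""
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        return "utf-8-sig"
    return fallback


with_bom = codecs.BOM_UTF8 + "å".encode("utf-8")
without_bom = "å".encode("utf-8")

# With the signature present, the text decodes correctly.
print(guess_encoding(with_bom), with_bom.decode(guess_encoding(with_bom)))

# Without it, the fallback misreads the two UTF-8 bytes.
print(guess_encoding(without_bom), without_bom.decode(guess_encoding(without_bom)))
```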
From the Unicode spec.
A bit off-topic, but keep in mind that an image is not strictly readable. I spent a few minutes baffled as to why you only commented: "From the unicode spec." and nothing else. I only later realised that you'd attached an image containing the text. I'm not (substantially) vision impaired, but my default e-mail set-up is plain-text and doesn't render images. Some people will not have the option to read images.
@bjornen77 Yeah, I think that this is the way to go. The templates should probably be most accessible to newcomers who expect a tutorial written for Visual Studio to work. For me, fixing a repo that was generated with the wrong BOM usage is just a single command anyway.
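The command itself is not given above. As one possible approach, a small Python helper along these lines could strip the BOM from a generated file; `strip_utf8_bom` is a hypothetical name, and this is only a sketch of the idea rather than the command referred to.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without its first three bytes (EF BB BF).
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Applied to every source file in a freshly generated project (for example via `Path.rglob("*.cs")`), this would normalize the repository to BOM-less UTF-8.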
It is unclear to me that there is any value in including these 3 bytes.
I wrote a quick program to demonstrate this:
What it produces:
What I see with cat:
See the leading space?
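For readers without the screenshot: a tool that decodes the file as plain UTF-8 without special-casing the BOM is left with a U+FEFF character at the start, which can show up as a stray leading character. A minimal sketch in Python (not the original program) of those 3 bytes and their effect:

```python
import codecs

content = "hello\n"

# File bytes as written with a UTF-8 BOM in front.
with_bom = codecs.BOM_UTF8 + content.encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as a U+FEFF character.
plain = with_bom.decode("utf-8")

# The 'utf-8-sig' codec treats the BOM as a signature and drops it.
stripped = with_bom.decode("utf-8-sig")

print(list(with_bom[:3]))         # [239, 187, 191], i.e. EF BB BF
print(repr(plain[0]))             # '\ufeff'
print(len(plain), len(stripped))  # 7 6
```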
C# files have the same problem.
I also see the following in vim, which I use frequently for small edits.
It would be great to define guidance on whether we should include BOMs in any UTF8 files (C#, csproj, ...) by default. Hopefully not.