Open codybaxter opened 8 years ago
This rule is problematic in today's world of programming, where different developers are increasingly likely to have different default system encodings. Often, a document starts out containing only characters which are encoded the same way in multiple encodings (e.g. much English-language text is the same in Windows-1252 and UTF-8). While the document contains valid UTF-8 content, nothing stops Visual Studio (or some other editor) from opening the file as Windows-1252.
The suggested rule would catch cases where the document content contains byte sequences that are not valid UTF-8, but it would generally fail to instruct editors to open most documents as UTF-8. It would also fail to recognize cases where an editor saved a file in another encoding and the resulting bytes happen to form a valid UTF-8 sequence that no longer preserves the original meaning of the characters.
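To make the three cases above concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how any analyzer implements the check). The helper `is_valid_utf8` and the sample strings are hypothetical. It shows ASCII-only content passing the check regardless of the intended encoding, a Windows-1252 file being caught, and a Windows-1252 file slipping through with a changed meaning.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Case 1: ASCII-only text is byte-identical in Windows-1252 and UTF-8,
# so a validity check carries no information about the intended encoding.
ascii_only = "int x = 1;".encode("cp1252")

# Case 2: a lone 0xE9 byte (Windows-1252 for an accented e) is not valid UTF-8,
# so this file would be caught.
lone_accent = "caf\u00e9".encode("cp1252")  # b"caf\xe9"

# Case 3: the Windows-1252 bytes for U+00C3 U+00A9 are 0xC3 0xA9, which is
# also the valid UTF-8 encoding of U+00E9. The check passes, the meaning changes.
lookalike_text = "\u00c3\u00a9"
lookalike = lookalike_text.encode("cp1252")  # b"\xc3\xa9"

results = {
    "ascii_only": is_valid_utf8(ascii_only),
    "lone_accent": is_valid_utf8(lone_accent),
    "lookalike": is_valid_utf8(lookalike),
}
```

Only the second case is detectable from the bytes alone, which is the limitation described above.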
The primary reason why it would make sense to leave out a BOM would be cases where editors, compilers, and/or libraries fail to correctly handle a BOM (notably the IO framework for Java passes the BOM through as a character). This is not the case in the world of C# code.
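The "BOM passed through as a character" behaviour is easy to reproduce outside Java. The sketch below uses Python purely as an illustration and says nothing about the Java or .NET APIs themselves: Python's plain `utf-8` codec keeps the BOM as a leading U+FEFF, while the `utf-8-sig` codec strips it. The sample source text is made up.

```python
import codecs

# A small "file" that starts with the UTF-8 BOM (bytes EF BB BF).
data = codecs.BOM_UTF8 + "class C {}".encode("utf-8")

# A decoder that is not BOM-aware hands U+FEFF to the caller as text.
passed_through = data.decode("utf-8")

# A BOM-aware decoder consumes the BOM and returns only the content.
stripped = data.decode("utf-8-sig")

print(repr(passed_through[0]))
print(stripped)
```

A tool that reads files the first way sees an unexpected character at the start of the content, which is the kind of failure that motivates leaving the BOM out.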
I would vote :-1: on the ability to enforce policies that remove a BOM, as it directly works against the more important goal of supporting developers working in a variety of local cultures.
Should this be reconsidered? Examples of changes since the last comments:
Therefore, it seems that a rule ensuring UTF-8 files exclude the BOM would be beneficial, especially where a project uses or creates files that are also used by (for example) Java projects.
It would be nice to have a rule that is similar to SA1412 but checks for file encoding of UTF-8 without BOM.
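As a rough idea of what such a check involves, here is a minimal sketch of a standalone script (not an analyzer rule, and not based on how SA1412 is implemented). The function names `has_utf8_bom` and `files_with_bom`, the `*.cs` pattern, and the sample files are all assumptions made for illustration.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def files_with_bom(root: Path, pattern: str = "*.cs") -> list:
    """List the names of files under root that match pattern and carry a BOM."""
    return sorted(p.name for p in root.rglob(pattern) if has_utf8_bom(p))


# Demo on two throwaway files: one saved with a BOM, one without.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "WithBom.cs").write_bytes(codecs.BOM_UTF8 + b"class A {}")
    (root / "NoBom.cs").write_bytes(b"class B {}")
    flagged = files_with_bom(root)

print(flagged)
```

This only detects the BOM. As noted earlier in the thread, it cannot confirm that a BOM-less file was actually meant to be UTF-8.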