Open steamonimo opened 8 years ago
Is there an easy way to tell whether a file's encoding is UTF-8 or UTF-16?
The purpose of the BOM is to tell you which encoding was used. The idea is to check the first bytes of the file for the sequence EF BB BF, FF FE or FE FF. If one of them is present, you can assume the encoding is UTF-8, UTF-16LE or UTF-16BE, respectively. If no byte order mark is present, you cannot be sure what the encoding is.
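A minimal sketch of that check in Go, using only the standard library. The name `detectBOM` and its return values are hypothetical and chosen for illustration; they are not part of any existing package.

```go
package main

import "bytes"

// Byte order marks as listed in this thread.
var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16LE = []byte{0xFF, 0xFE}
	bomUTF16BE = []byte{0xFE, 0xFF}
)

// detectBOM is a hypothetical helper. It inspects the first bytes of data
// and returns the encoding implied by the BOM plus the BOM length in bytes.
// It returns ("", 0) when no BOM is found, in which case the encoding
// remains unknown.
func detectBOM(data []byte) (encoding string, bomLen int) {
	switch {
	case bytes.HasPrefix(data, bomUTF8):
		return "UTF-8", len(bomUTF8)
	case bytes.HasPrefix(data, bomUTF16LE):
		return "UTF-16LE", len(bomUTF16LE)
	case bytes.HasPrefix(data, bomUTF16BE):
		return "UTF-16BE", len(bomUTF16BE)
	}
	return "", 0
}
```

A caller would typically read the file into a byte slice, pass it to `detectBOM`, and skip `bomLen` bytes before decoding the rest.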
Many ini files on Windows use UTF-16LE (little endian) instead of UTF-8. Occasionally there are even UTF-16BE (big endian) files. Windows API calls like GetPrivateProfileString handle all of these formats automatically. All three varieties, UTF-8, UTF-16LE and UTF-16BE, will have a byte order mark (BOM) at the beginning. The BOM tells you how to convert the byte content of the file to UTF-8 (a Go string):
UTF-8: EF BB BF (first three bytes of the file)
UTF-16LE: FF FE (first two bytes of the file)
UTF-16BE: FE FF (first two bytes of the file)
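The conversion described above could look roughly like the following sketch, which uses only the Go standard library. The function name `toUTF8String` is hypothetical, and treating input without a BOM as UTF-8 is an assumption made here for simplicity, not something the BOM can guarantee.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"unicode/utf16"
)

// toUTF8String is a hypothetical helper. It strips a leading BOM and
// returns the content as a Go (UTF-8) string.
// Assumption: input without a BOM is treated as UTF-8 already.
func toUTF8String(data []byte) string {
	var order binary.ByteOrder
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		// UTF-8: drop the three BOM bytes, the rest is used as-is.
		return string(data[3:])
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		order = binary.LittleEndian // UTF-16LE
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		order = binary.BigEndian // UTF-16BE
	default:
		return string(data)
	}

	// UTF-16: drop the two BOM bytes and read 16-bit code units in the
	// byte order the BOM indicated. A trailing odd byte is ignored.
	data = data[2:]
	units := make([]uint16, 0, len(data)/2)
	for i := 0; i+1 < len(data); i += 2 {
		units = append(units, order.Uint16(data[i:]))
	}
	// utf16.Decode combines surrogate pairs into runes.
	return string(utf16.Decode(units))
}
```

The result can then be handed to whatever parser expects UTF-8 input. Files without a BOM may still be in some other encoding, so the fallback branch is only a guess.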
For testing, you can use "Save As" in Windows Notepad, which offers the UTF-8, Unicode (UTF-16LE) and Unicode big endian (UTF-16BE) formats.