go-ini / ini

Package ini provides INI file read and write functionality in Go
https://ini.unknwon.io
Apache License 2.0

Please add support for UTF16LE and UTF16BE #66

Open steamonimo opened 8 years ago

steamonimo commented 8 years ago

Many INI files on Windows use UTF-16LE (little endian) instead of UTF-8, and occasionally there are even UTF-16BE (big endian) files. Windows API calls such as GetPrivateProfileString handle all of these formats automatically. Files in UTF-8, UTF-16LE and UTF-16BE typically begin with a byte order mark (BOM), which tells you how to convert the file's bytes to UTF-8 (a Go string):

UTF-8: EF BB BF (first three bytes of the file)
UTF-16LE: FF FE (first two bytes of the file)
UTF-16BE: FE FF (first two bytes of the file)
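A minimal Go sketch of checking for the three BOM signatures listed above. The `Encoding` type and the `detectBOM` function are hypothetical names chosen for illustration; they are not part of go-ini's API. The function also returns the BOM length so a caller can skip those bytes before parsing.

```go
package main

import (
	"bytes"
	"fmt"
)

// Encoding identifies the text encoding signalled by a byte order mark.
// (Hypothetical type, for illustration only.)
type Encoding int

const (
	Unknown Encoding = iota // no recognised BOM
	UTF8
	UTF16LE
	UTF16BE
)

func (e Encoding) String() string {
	switch e {
	case UTF8:
		return "UTF-8"
	case UTF16LE:
		return "UTF-16LE"
	case UTF16BE:
		return "UTF-16BE"
	default:
		return "unknown"
	}
}

// detectBOM inspects the first bytes of data and reports the encoding
// indicated by a BOM, plus the BOM's length in bytes (0 if none).
func detectBOM(data []byte) (Encoding, int) {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return UTF8, 3
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return UTF16LE, 2
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return UTF16BE, 2
	default:
		return Unknown, 0
	}
}

func main() {
	samples := [][]byte{
		{0xEF, 0xBB, 0xBF, 'k', '=', 'v'}, // UTF-8 with BOM
		{0xFF, 0xFE, 'k', 0x00},           // UTF-16LE with BOM
		{0xFE, 0xFF, 0x00, 'k'},           // UTF-16BE with BOM
		[]byte("k=v"),                     // no BOM
	}
	for _, s := range samples {
		enc, n := detectBOM(s)
		fmt.Printf("%s (BOM length %d)\n", enc, n)
	}
}
```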

For testing, you can use "Save As" in Windows Notepad, which offers the UTF-8, Unicode (UTF-16LE) and Unicode Big Endian (UTF-16BE) formats.

unknwon commented 7 years ago

Is there an easy way to tell whether a file's encoding is UTF-8 or UTF-16?

Other refs:

steamonimo commented 7 years ago

The purpose of the BOM is to tell you which encoding is used. The idea is to scan the first bytes of the file for the sequence EF BB BF, FF FE or FE FF. If one of them is present, you can assume the encoding is UTF-8, UTF-16LE or UTF-16BE, respectively. Without any of these byte order marks, you cannot be sure what the encoding is.
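A minimal sketch of that approach in Go, using only the standard library's `encoding/binary` and `unicode/utf16` packages. `toUTF8` is a hypothetical helper, not part of go-ini. It strips the BOM, decodes UTF-16 into a Go string, and treats BOM-less input as UTF-8; that fallback is an assumption, since without a BOM the encoding cannot be known for certain.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// toUTF8 converts raw file bytes into a Go (UTF-8) string, choosing the
// source encoding from the BOM. Input without a BOM is assumed to be UTF-8.
// Hypothetical helper, for illustration only.
func toUTF8(data []byte) (string, error) {
	var order binary.ByteOrder
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return string(data[3:]), nil // UTF-8: just drop the BOM
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		order = binary.LittleEndian
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		order = binary.BigEndian
	default:
		return string(data), nil // no BOM: assume UTF-8
	}

	body := data[2:] // skip the two-byte UTF-16 BOM
	if len(body)%2 != 0 {
		return "", errors.New("UTF-16 data has an odd number of bytes")
	}
	units := make([]uint16, len(body)/2)
	for i := range units {
		units[i] = order.Uint16(body[2*i:])
	}
	// utf16.Decode combines surrogate pairs into single runes.
	return string(utf16.Decode(units)), nil
}

func main() {
	// "a=é" encoded as UTF-16LE and UTF-16BE, each with a BOM.
	le := []byte{0xFF, 0xFE, 'a', 0x00, '=', 0x00, 0xE9, 0x00}
	be := []byte{0xFE, 0xFF, 0x00, 'a', 0x00, '=', 0x00, 0xE9}

	for _, in := range [][]byte{le, be} {
		s, err := toUTF8(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q\n", s)
	}
}
```

The resulting text could then be handed to `ini.Load` as `[]byte(s)`, since that function accepts raw byte slices as a data source. As an existing alternative, the `golang.org/x/text/encoding/unicode` package provides a `BOMOverride` transformer that performs the same BOM-driven switch.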