arcnmx / serde-ini

Windows INI file {de,}serialization
MIT License

Files containing byte order mark (BOM) are not supported #14

Open darfink opened 4 years ago

darfink commented 4 years ago

Although BOMs are redundant for UTF-8, they are still commonly encountered. I personally tried to parse a UTF-8 file containing a BOM, which led to time-consuming troubleshooting. It would be great if, whenever a BOM is present, it were discarded instead of causing an error.

EDIT: Then again, this library may choose to ignore it, since it doesn't export any direct methods for file I/O.
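For input that is already in memory as a string, discarding the BOM can be done on the caller's side before parsing. The following is only a minimal sketch using the standard library; `strip_utf8_bom` is a hypothetical helper name and is not part of serde-ini.

```rust
/// Hypothetical helper (not part of serde-ini): returns `input` without a
/// leading U+FEFF byte order mark. Input without a BOM is returned unchanged.
fn strip_utf8_bom(input: &str) -> &str {
    // In a decoded UTF-8 string, the BOM bytes EF BB BF are the single
    // character U+FEFF, so a prefix check on that character is enough.
    input.strip_prefix('\u{feff}').unwrap_or(input)
}
```

The returned slice borrows from the original string, so nothing is copied; it can then be handed to whichever string-based parser is in use.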

arcnmx commented 4 years ago

Yeah, I'm not entirely sure if this is the right abstraction level, or if it should be the responsibility of a Read wrapper to handle skipping past BOMs instead (or be extra fancy and also transparently convert non-UTF-8 input). It's not a big deal to just add a check for it though...
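To make the `Read` wrapper idea concrete, here is a rough sketch built only on `std::io`. The name `skip_utf8_bom` is made up for illustration, it is not something serde-ini provides, and it only handles the UTF-8 BOM (no transcoding of other encodings).

```rust
use std::io::{self, Chain, Cursor, Read};

/// The three bytes of a UTF-8 byte order mark.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of serde-ini): wraps `reader` so that a
/// leading UTF-8 BOM is dropped and everything else passes through untouched.
fn skip_utf8_bom<R: Read>(mut reader: R) -> io::Result<Chain<Cursor<Vec<u8>>, R>> {
    let mut head = Vec::with_capacity(UTF8_BOM.len());
    // Pull at most three bytes off the front; `take` + `read_to_end`
    // copes with short reads and with inputs shorter than three bytes.
    reader
        .by_ref()
        .take(UTF8_BOM.len() as u64)
        .read_to_end(&mut head)?;
    if head == UTF8_BOM {
        // It was a BOM: discard it.
        head.clear();
    }
    // Otherwise replay the peeked bytes ahead of the rest of the stream.
    Ok(Cursor::new(head).chain(reader))
}
```

The result is itself a `Read`, so it could be passed to any reader-based deserialization entry point in place of the raw file handle.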

From what I can gather, winapi's GetPrivateProfileString also chokes on a UTF-8 BOM, so this isn't inconsistent. It might support UTF-16 (UCS-2?) BOMs though? Or only GetPrivateProfileStringW might, I'm not sure...
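For reference, telling the UTF-8 and UTF-16 BOMs apart only needs a look at the first few bytes. This is an illustrative sketch, not a claim about what GetPrivateProfileString does internally; the `Bom` enum and `sniff_bom` function are hypothetical names.

```rust
/// Hypothetical type for illustration: the BOM kinds discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Hypothetical helper: reports which BOM, if any, `bytes` starts with.
/// UTF-32 is deliberately ignored here; note that a UTF-32LE BOM
/// (FF FE 00 00) also starts with the UTF-16LE pattern.
fn sniff_bom(bytes: &[u8]) -> Option<Bom> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else {
        None
    }
}
```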

darfink commented 4 years ago

Interesting! Well, depending on whether it's the library's responsibility or not, you can consider this issue open or closed. Meanwhile, I decided to use encoding_rs_io to handle files with UTF-8 BOMs.