BillyTom opened this issue 10 years ago
In a similar fashion, I am looking for support to generate the BOM characters when exporting. Those three bytes seem to be the only way to tell MS Excel which encoding the file uses. I'll raise it as a separate issue when I have more details, but I'm noting it here so it does not get lost. For exporting, setFromCharset() could be used to define what the (optional) BOM looks like, and it should not need to be paired with setToCharset() if no conversion is needed.
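For illustration only, here is a minimal Python sketch of what such an export option would produce: the three BOM bytes followed by UTF-8 encoded CSV. This is not goodby/csv's API (which is PHP), and `export_csv_with_bom` is a hypothetical helper name.

```python
import codecs
import csv
import io


def export_csv_with_bom(rows: list[list[str]]) -> bytes:
    """Hypothetical helper: serialize rows as CSV and prepend the UTF-8 BOM."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma separator, "\r\n" line endings
    writer.writerows(rows)
    # Prepend the three BOM bytes EF BB BF; per this thread, Excel relies on
    # them to recognise the file as UTF-8.
    return codecs.BOM_UTF8 + buffer.getvalue().encode("utf-8")


data = export_csv_with_bom([["Straße", "Größe"], ["äöü", "1"]])
print(data[:3].hex())  # efbbbf
```

Python's built-in `"utf-8-sig"` codec does the same thing in one step: `buffer.getvalue().encode("utf-8-sig")` also prepends EF BB BF.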
The CSV file I am importing is encoded in UTF-8 with a byte order mark, so it starts with the bytes "EF BB BF", which decode to "" (the invisible character U+FEFF). (see http://de.wikipedia.org/wiki/Byte_Order_Mark)
These are non-printing characters and generally don't show up in the output. However, they do matter when you compare strings.
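A minimal Python sketch of the effect, assuming a hypothetical two-column file whose first header cell is `Name`: decoding as plain UTF-8 keeps the BOM as U+FEFF at the start of the first cell, so the cell is one character longer than it looks and an equality check fails.

```python
import csv
import io

# Hypothetical file content: UTF-8 BOM followed by a one-line CSV.
raw = b"\xef\xbb\xbfName,City\r\n"

text = raw.decode("utf-8")  # plain UTF-8 decoding keeps the BOM as U+FEFF
first_cell = next(csv.reader(io.StringIO(text)))[0]

print(repr(first_cell))      # '\ufeffName'
print(len(first_cell))       # 5, not 4
print(first_cell == "Name")  # False
```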
For example, my first column in the first row looks like this:
As you can see, the character count is off because of the invisible BOM characters. Other columns are not affected; only the very first column in the very first row shows this behaviour.
I've tried several different config options (->setToCharset('UTF-8') etc.) to get rid of those unwanted characters, but none of them worked.
My CSV file contains several special characters such as äöü and ß, which are all displayed correctly, so I am positive that the input is decoded correctly.
It is not a big deal to manually remove those unwanted characters in the interpreter, but I was wondering if this was a bug in goodby/csv.
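The manual workaround amounts to removing a leading U+FEFF from the first cell of the first row only. Below is a minimal Python sketch of that idea. It is not goodby/csv code (that would live in a PHP interpreter callback), and `strip_leading_bom` is a hypothetical name.

```python
import csv
import io
from typing import Iterable, Iterator

BOM = "\ufeff"


def strip_leading_bom(rows: Iterable[list[str]]) -> Iterator[list[str]]:
    """Hypothetical helper: yield rows unchanged, except that a leading U+FEFF
    is removed from the first cell of the first row (the only place a BOM occurs)."""
    first = True
    for row in rows:
        if first and row and row[0].startswith(BOM):
            row = [row[0][len(BOM):], *row[1:]]
        first = False
        yield row


raw = b"\xef\xbb\xbfName,City\r\nJ\xc3\xbcrgen,M\xc3\xbcnchen\r\n"
rows = list(strip_leading_bom(csv.reader(io.StringIO(raw.decode("utf-8")))))
print(rows[0][0] == "Name")  # True
```

In Python specifically, decoding with `"utf-8-sig"` instead of `"utf-8"` drops the BOM before parsing, so no post-processing is needed. Non-ASCII content such as ü is unaffected either way.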