Byte Order Mark characters included in output?

goodby / csv

Goodby CSV is a memory-efficient, flexible, and extensible open-source CSV import/export library for PHP 5.3.

1. Memory Management Free: The library is designed not to exhaust memory. It does not accumulate all rows in memory; the importer reads the CSV file and executes a callback function line by line.
2. Multibyte support: The library supports multibyte input/output, for example SJIS-win, EUC-JP and UTF-8.
3. Ready to Use for Enterprise Applications: Goodby CSV is fully unit-tested. The library is stable and ready to be used in large projects like enterprise applications.

MIT License


The CSV file I am importing is encoded in UTF-8 with a byte order mark (BOM), so it starts with the bytes "EF BB BF", which show up as "ï»¿" when decoded as Latin-1. (see http://de.wikipedia.org/wiki/Byte_Order_Mark)

These are non-printing characters and generally don't show up in the output. However, they do make a difference when you compare strings.

For example, my first column in the first row looks like this:

array(12) {
  [0]=>
  string(16) "location-ID"
  [1]=>
  string(5) "value"
  [2]=>
    ...

As you can see, the reported string length is larger than the visible text because of the non-printing characters. Other columns are not affected; only the very first column in the very first row shows this behaviour.
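To illustrate why the extra bytes matter, here is a minimal sketch that does not use goodby/csv at all. The variable names and the header value are only illustrative; it assumes the first field arrives with the three BOM bytes still attached.

```php
<?php
// The UTF-8 byte order mark as raw bytes.
$bom = "\xEF\xBB\xBF";

// Hypothetical first header field, as it would look with the BOM still attached.
$firstField = $bom . 'location-ID';

// The strings look identical when printed, but the comparison fails.
var_dump($firstField === 'location-ID'); // bool(false)

// strlen() counts bytes, so the BOM adds three to the length.
var_dump(strlen($firstField) - strlen('location-ID')); // int(3)
```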

I've tried several different config options (->setToCharset('UTF-8') etc.) to get rid of those unwanted characters, but none of them worked.

My csv-file contains several special characters like äöü or ß which are all displayed correctly, so I am positive that the input is decoded correctly.

It is not a big deal to manually remove those unwanted characters in the interpreter, but I was wondering whether this is a bug in goodby/csv.
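For reference, the manual workaround could look roughly like the sketch below. The helper `stripUtf8Bom` and the `$rows` array are hypothetical and not part of goodby/csv; the array merely stands in for the rows an interpreter callback would receive one at a time.

```php
<?php
/**
 * Hypothetical helper (not part of goodby/csv):
 * removes a leading UTF-8 byte order mark from a string, if present.
 */
function stripUtf8Bom($value)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($value, $bom, 3) === 0) {
        // Cast guards against substr() returning false on older PHP versions.
        return (string) substr($value, 3);
    }
    return $value;
}

// Hypothetical sample rows; only the first field of the first row carries the BOM.
$rows = array(
    array("\xEF\xBB\xBFlocation-ID", 'value'),
    array('42', 'äöü'),
);

foreach ($rows as $i => $row) {
    if ($i === 0) {
        $row[0] = stripUtf8Bom($row[0]);
    }
    $rows[$i] = $row;
}

var_dump($rows[0][0] === 'location-ID'); // bool(true)
```

Only the first field of the first row is touched, since that is the only place a BOM can appear; all other values pass through unchanged.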


Byte Order Mark characters included in output? #33