Quoted first header in a stream with a UTF-8 BOM is not handled nicely

airbreather / Cursively

A CSV reader for .NET. Fast, RFC 4180 compliant, and fault tolerant. UTF-8 only.

MIT License


Quoted first header in a stream with a UTF-8 BOM is not handled nicely #14

Open airbreather opened 5 years ago

airbreather commented 5 years ago

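The visitor can optionally ignore a UTF-8 BOM on the first header field if one is present. However, if that field starts with a double quote, the tokenizer fails to treat it as quoted. The tokenizer sees the BOM, not the double quote, as the field's first character, so it has already parsed the field as unquoted before the visitor gets a chance to strip the BOM.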
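For illustration, Python's standard csv module shows the same failure mode when the BOM is removed from the first field only after tokenizing. This mirrors the visitor-level option. It is an analog, not Cursively's C# API. The helper name read_then_strip_bom and the sample bytes are hypothetical.

```python
import csv
import io

# Hypothetical sample: a UTF-8 BOM (EF BB BF) followed by a quoted first header field.
DATA = b'\xef\xbb\xbf"a,b",c\r\n1,2\r\n'


def read_then_strip_bom(data: bytes) -> list[list[str]]:
    """Tokenize first, then drop a BOM from the first header field.

    Analogous to stripping the BOM at the field/visitor level: the tokenizer
    has already seen U+FEFF (not '"') as the first character, so it parsed
    the field as unquoted and split it on the embedded comma.
    """
    text = data.decode("utf-8")  # plain "utf-8" keeps U+FEFF in the text
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if rows and rows[0] and rows[0][0].startswith("\ufeff"):
        rows[0][0] = rows[0][0][1:]  # too late: the quoting is already lost
    return rows


rows = read_then_strip_bom(DATA)
print(rows[0])  # ['"a', 'b"', 'c']
```

The header that should have been the single field a,b comes back as two fields with stray quote characters.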

The good news is that, with the new model I'm building for #12, the CsvInput implementations themselves could optionally skip a UTF-8 BOM if one is present, before the tokenizer ever sees it.
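As a rough Python analog of skipping the BOM at the input level, the "utf-8-sig" codec drops a leading BOM while decoding, before the csv tokenizer runs. This is only a sketch of the idea, not how Cursively's CsvInput types are implemented. The helper name read_with_input_level_bom_skip and the sample bytes are hypothetical.

```python
import csv
import io

# Hypothetical sample: a UTF-8 BOM (EF BB BF) followed by a quoted first header field.
DATA = b'\xef\xbb\xbf"a,b",c\r\n1,2\r\n'


def read_with_input_level_bom_skip(data: bytes) -> list[list[str]]:
    """Drop the BOM while decoding the input, before tokenizing.

    "utf-8-sig" skips a leading BOM if present and behaves like plain
    "utf-8" otherwise, so the tokenizer's first character is the opening quote.
    """
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig", newline="")
    return list(csv.reader(text))


rows = read_with_input_level_bom_skip(DATA)
print(rows[0])  # ['a,b', 'c']
```

Because the BOM never reaches the tokenizer, quoted and unquoted first fields are handled the same way, with or without a BOM.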

airbreather commented 5 years ago

At the time of this comment, the top Google result for the term "FEFF" on its own is a blog post by someone hitting this exact problem with Ruby's CSV parser. Cursively is in good company.