Data-Liberation-Front / csvlint.rb

The gem behind http://csvlint.io
MIT License

UTF-8 BOM results in whitespace error #97

Open spikeheap opened 9 years ago

spikeheap commented 9 years ago

When a UTF-8 file starts with a Byte Order Mark (BOM), Ruby passes the BOM through as part of the content by default, and csvlint then reports whitespace errors.

Here's an example: http://csvlint.io/validation/543526f36373760fc6020000.

The BOM only needs to be filtered from the first line in the file, e.g.:

row.delete!("\xEF\xBB\xBF")
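
A slightly more defensive sketch of the same idea, which only touches a BOM at the very start of the first line and compares bytes so the string's encoding doesn't matter. `strip_leading_bom` and `UTF8_BOM` are hypothetical names used for illustration, not part of csvlint's API:

```ruby
# Hypothetical helper, not part of csvlint: removes a UTF-8 BOM
# only when it is the very first thing in the given line.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze # the three BOM bytes, as a binary string

def strip_leading_bom(line)
  # Compare on raw bytes so the check works for any source encoding.
  return line unless line.b.start_with?(UTF8_BOM)

  # Drop the first three bytes; byteslice keeps the original encoding.
  line.byteslice(UTF8_BOM.bytesize, line.bytesize - UTF8_BOM.bytesize)
end

# Only the first line of a file can carry the BOM.
lines = ["\uFEFFid,name\r\n", "1,Alice\r\n"]
cleaned = lines.each_with_index.map do |line, index|
  index.zero? ? strip_leading_bom(line) : line
end
```

Unlike `delete!`, this leaves any later occurrence of the same byte sequence in the data alone.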

ntkog commented 9 years ago

Another workaround is to strip any BOM sequence before the data reaches csvlint. You can do that with the strip-bom module:

Example:

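var fs = require('fs');
var stripBom = require('strip-bom');

// `file` (path to the CSV) and `csvlint` are not defined in this snippet;
// they are assumed to come from the surrounding code.
var rs = fs.createReadStream(file);
var csvlintInstance = csvlint();

rs
  .pipe(stripBom.stream()) // remove a leading BOM before validation
  .pipe(csvlintInstance)
  .on('error', function (errArr) {
    console.log(errArr);
  })
...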
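
On the Ruby side, a comparable workaround is to let Ruby consume the BOM when the file is opened, using the `BOM|UTF-8` encoding in the mode string. This is only a sketch of that approach: `read_without_bom` is a hypothetical helper name, the sample data is made up, and it doesn't show how the result would be handed to csvlint.

```ruby
require "tempfile"

# Hypothetical helper: read a file as UTF-8, letting Ruby skip a leading BOM.
def read_without_bom(path)
  File.open(path, "r:BOM|UTF-8") { |f| f.read }
end

# Made-up sample: a small CSV written with a UTF-8 BOM in front.
sample = Tempfile.new(["example", ".csv"])
sample.binmode
sample.write("\xEF\xBB\xBFid,name\r\n1,Alice\r\n".b)
sample.close

content = read_without_bom(sample.path)
sample.unlink

puts content.lines.first.inspect
```

Files without a BOM are read unchanged with the same mode string, so it is safe to use for both cases.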

By the way, if you convert a .csv file to UTF-8 directly from Windows Notepad, in most cases you'll get a file with a BOM. You can check it with:

file example.csv
example.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators
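
If `file` isn't available, the same check can be sketched in Ruby by looking at the first three bytes. `utf8_bom?` is a hypothetical helper name and the sample files are made up for the demonstration:

```ruby
require "tempfile"

UTF8_BOM_BYTES = "\xEF\xBB\xBF".b.freeze # EF BB BF as a binary string

# Hypothetical helper: true when the file starts with a UTF-8 BOM.
def utf8_bom?(path)
  # binread returns nil for an empty file, which compares as false.
  File.binread(path, 3) == UTF8_BOM_BYTES
end

# Made-up sample file, similar to what Notepad would save.
with_bom = Tempfile.new(["with_bom", ".csv"])
with_bom.binmode
with_bom.write("\xEF\xBB\xBFid,name\r\n".b)
with_bom.close

puts utf8_bom?(with_bom.path)
with_bom.unlink
```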