adaltas / node-csv

Full-featured CSV parser with a simple API, tested against large datasets.
https://csv.js.org
MIT License

When `bom` and `skipRecordsWithError` are both set, the `skip` event is not raised for skipped records #411

Closed. omerlh closed this 5 months ago.

omerlh commented 11 months ago

Describe the bug

Setting both options results in the CSV file being parsed correctly, but no event is fired for skipped lines.

To Reproduce

import { createReadStream } from 'node:fs';
import { finished } from 'node:stream/promises';
import { parse } from 'csv-parse';

const parser = parse({
  bom: true,
  skipRecordsWithError: true,
});

const outStream = createReadStream('tests/closed-box/fixtures/data-with-bom.csv').pipe(parser);

outStream.on('skip', (e) => {
  console.error(e);
});

await finished(outStream);

I would expect console.error to be called, but it isn't. When removing the `bom` option and using something like strip-bom-stream instead, the event is fired correctly.

Additional context: data-with-bom.csv

alexkorsun commented 5 months ago

Any update on this issue?

wdavidw commented 5 months ago

Please look at this commit. I have tried to reproduce your error, but it seems to run fine, catching the error passed in the "skip" event.

alexkorsun commented 5 months ago

Here is code that reproduces the issue on my side.

import path from 'path';
import { pipeline } from 'stream/promises';
import { parse as parseCSV } from 'csv-parse';
import { Writable } from 'stream';
import { createReadStream } from 'fs';

async function testRecordsSkip() {
  const errors: any[] = [];
  const records: any[] = [];

  // Sink that collects every record the parser emits
  const sink = new Writable({
    objectMode: true,
    write: (record, _encoding, callback) => {
      records.push(record);
      callback();
    },
  });

  const csvSource = createReadStream(path.join(__dirname, 'test_with_bom.csv'));
  const parser = parseCSV({
    skip_records_with_error: true,
    bom: true,
  });
  parser.on('skip', function (err) {
    errors.push(err);
  });

  await pipeline(csvSource, parser, sink);

  console.log({
    records,
    errors,
  });
}

testRecordsSkip().catch(console.error);

The output is

{
  records: [
    [ 'id', 'first_name', 'last_name', 'email', 'modified_at' ],
    [ '1', 'Ring', 'Grinyov', 'rgrinyov0@weebly.com', '2022-02-14' ],
    [ '3', 'Cammi', 'Bendix', 'cbendix2@tuttocitta.it', '2022-02-14' ]
  ],
  errors: []
}

Test file is there test_with_bom.csv

wdavidw commented 5 months ago

I tried again with your code and the output is still what we expect:

{
  records: [
    [ 'id', 'first_name', 'last_name', 'email', 'modified_at' ],
    [ '1', 'Ring', 'Grinyov', 'rgrinyov0@weebly.com', '2022-02-14' ],
    [ '3', 'Cammi', 'Bendix', 'cbendix2@tuttocitta.it', '2022-02-14' ]
  ],
  errors: [
    CsvError: Invalid Record Length: expect 5, got 6 on line 3
        at Object.__onRecord (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:364:11)
        at Object.parse (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:247:40)
        at Parser._transform (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/index.js:31:26)
        at Transform._write (node:internal/streams/transform:175:8)
        at writeOrBuffer (node:internal/streams/writable:392:12)
        at _write (node:internal/streams/writable:333:10)
        at Writable.write (node:internal/streams/writable:337:10)
        at ReadStream.ondata (node:internal/streams/readable:766:22)
        at ReadStream.emit (node:events:514:28)
        at addChunk (node:internal/streams/readable:324:12) {
      code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
      bytes: 141,
      comment_lines: 0,
      empty_lines: 0,
      invalid_field_length: 0,
      lines: 3,
      records: 2,
      columns: false,
      error: undefined,
      header: false,
      index: 6,
      raw: undefined,
      column: 6,
      quoting: false,
      record: [Array]
    }
  ]
}

Note, tested with Node.js versions "v16.13.0", "v18.17.1", "v20.5.1" and "v21.5.0".

alexkorsun commented 5 months ago

> I tried again with your code and the output is still what we expect: […]
>
> Note, tested with Node.js versions "v16.13.0", "v18.17.1", "v20.5.1" and "v21.5.0".

I replaced your file with the one I shared and it doesn't work, but when I test with the file from the link you shared, it works.

When I check the files, I see the following, which suggests that your file doesn't contain a BOM:

alex@undefined:~/WebstormProjects/test$ cat src/411.csv | file -
/dev/stdin: ASCII text
alex@undefined:~/WebstormProjects/test$ cat src/test_with_bom.csv | file -
/dev/stdin: Unicode text, UTF-8 (with BOM) text

alexkorsun commented 5 months ago

Here is a script that should reproduce the issue on your side:

import assert from 'node:assert';
import { Writable, Readable } from 'node:stream';
import { finished } from 'node:stream/promises';
import { parse } from 'csv-parse';

async function a() {
  const errors: any[] = [];
  const parser = parse({
    bom: true,
    skipRecordsWithError: true,
  });
  // Create a stream and consume its source
  const sink = new Writable({
    objectMode: true,
    write: (_, __, callback) => callback(),
  });

  const csv = `\ufeffkey,value
1,value-1
2,value-2
3,"value-3
one more line
and one more line"`;

  const csvSource = Readable.from([csv]);
  const outStream = csvSource.pipe(parser).pipe(sink);
  // Catch records with errors
  parser.on('skip', (e) => {
    errors.push(e);
  });
  // Wait for stream to be consumed
  await finished(outStream);
  // Catch error from skip event
  assert.equal(errors.length, 1);
}

a().catch(console.error);

wdavidw commented 5 months ago

My bad, I forgot that the issue was associated with the `bom` option. Version "5.5.6" of csv-parse fixes the issue.