adaltas / node-csv-parse

CSV parsing implementing the Node.js `stream.Transform` API
https://csv.js.org/parse/

"," inside a Cell #192

Closed axi92 closed 5 years ago

axi92 commented 6 years ago

I have this set of data:

"Labirinto, "a" Schoenbrunn",48.182497,16.309359,2,E
""Mann" und Frau",48.178537,16.264381,4,R

There are two problems: the `"` characters inside the quoted cell, and the `,` inside the quoted cell.

The first one works with `quote: null`. I also tried setting `delimiter: null`, but that did not work:

quote: null,
delimiter: null,
cast: function (value, information) {
    console.log(value);
    console.log(information);
    if (information.column == 0)
        return value;
}

Output:

"Labirinto
{ quoting: false,
  lines: 0,
  count: 0,
  index: 0,
  header: false,
  column: 0 }
 "a" Schoenbrunn"
{ quoting: false,
  lines: 0,
  count: 0,
  index: 1,
  header: false,
  column: 1 }
48.182497
{ quoting: false,
  lines: 0,
  count: 0,
  index: 2,
  header: false,
  column: 2 }
16.309359
{ quoting: false,
  lines: 0,
  count: 0,
  index: 3,
  header: false,
  column: 3 }
2
{ quoting: false,
  lines: 0,
  count: 0,
  index: 4,
  header: false,
  column: 4 }
E
{ quoting: false,
  lines: 1,
  count: 0,
  index: 5,
  header: false,
  column: 5 }
"Mann" und Frau"
{ quoting: false,
  lines: 1,
  count: 1,
  index: 0,
  header: false,
  column: 0 }
48.178537
{ quoting: false,
  lines: 1,
  count: 1,
  index: 1,
  header: false,
  column: 1 }
16.264381
{ quoting: false,
  lines: 1,
  count: 1,
  index: 2,
  header: false,
  column: 2 }
4
{ quoting: false,
  lines: 1,
  count: 1,
  index: 3,
  header: false,
  column: 3 }
R
{ quoting: false,
  lines: 2,
  count: 1,
  index: 4,
  header: false,
  column: 4 }
Error: Number of columns is inconsistent on line 2
    at Parser.error (D:\Git\iitc_multiexport_parser\node_modules\csv-parse\lib\index.js:688:9)
    at Parser.__push (D:\Git\iitc_multiexport_parser\node_modules\csv-parse\lib\index.js:333:18)
    at Parser.__write (D:\Git\iitc_multiexport_parser\node_modules\csv-parse\lib\index.js:621:22)
    at Immediate.setImmediate (D:\Git\iitc_multiexport_parser\node_modules\csv-parse\lib\index.js:263:16)
    at runCallback (timers.js:810:20)
    at tryOnImmediate (timers.js:768:5)
    at processImmediate [as _immediateCallback] (timers.js:745:5)

wdavidw commented 6 years ago

Well, you could set the `relax_column_count` option... In French we would say "reculer pour mieux sauter", which translates roughly to "take a step back to better jump into more mess". I don't know what to say. Of course, a solution would be to implement a rule such as "treat a quote as an opening quote only right after a delimiter, and as a closing quote only right before a delimiter", but it would be very hard to implement and very dirty, all for theoretically invalid CSV data (I write "theoretically" because CSV lacks a complete specification).

In your case, since you seem to be able to rely on a rule like "only my first column can contain quotes", a quick and dirty fix would be to reconsolidate your data after parsing:

parse = require 'csv-parse'
parse """
"Labirinto, "a" Schoenbrunn",48.182497,16.309359,2,E
""Mann" und Frau",48.178537,16.264381,4,R
""",
  quote: null
  delimiter: ','
  relax_column_count: true
, (err, records) ->
  throw err if err
  # The last 4 fields are reliable; everything before them belongs to the
  # first cell, so re-join those pieces with the delimiter that split them.
  records = for record in records
    [record.slice(0, record.length - 4).join(','), record.slice(record.length - 4)...]
  console.log records

Prints:

[
  [ '"Labirinto, "a" Schoenbrunn"', '48.182497', '16.309359', '2', 'E' ],
  [ '"Mann" und Frau"', '48.178537', '16.264381', '4', 'R' ]
 ]
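
For comparison, the stricter quote rule mentioned above (an opening quote counts only at the start of a field, a closing quote only right before a delimiter or the end of the line) can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `splitLenient` is a hypothetical helper, not part of csv-parse, it works on a single line, and it does not handle `""` escapes, multi-line fields, or a cell whose text itself contains `",`.

```javascript
// Hypothetical helper (NOT a csv-parse API): split one line with a lenient quote rule.
function splitLenient(line, delimiter = ',') {
  const fields = [];
  let start = 0;        // index where the current field begins
  let inQuotes = false; // true while inside a quoted field
  for (let i = 0; i <= line.length; i++) {
    const atEnd = i === line.length;
    const ch = line[i];
    if (!atEnd && ch === '"') {
      if (!inQuotes && i === start) {
        // Opening quote: only accepted as the first character of a field.
        inQuotes = true;
      } else if (inQuotes && (i + 1 === line.length || line[i + 1] === delimiter)) {
        // Closing quote: only accepted right before a delimiter or the end of the line.
        inQuotes = false;
      }
      // Any other quote is treated as ordinary cell content.
      continue;
    }
    if (atEnd || (ch === delimiter && !inQuotes)) {
      let field = line.slice(start, i);
      // Strip the enclosing quotes, keep the inner ones untouched.
      if (field.length >= 2 && field[0] === '"' && field[field.length - 1] === '"') {
        field = field.slice(1, -1);
      }
      fields.push(field);
      start = i + 1;
    }
  }
  return fields;
}

const lines = [
  '"Labirinto, "a" Schoenbrunn",48.182497,16.309359,2,E',
  '""Mann" und Frau",48.178537,16.264381,4,R',
];
for (const line of lines) {
  console.log(splitLenient(line));
}
```

With the two sample lines, each record comes back with five fields, and the first field keeps both its comma and its inner quotes. The rule stays ambiguous by nature, which is why it is a poor fit for a general-purpose parser: any cell containing a quote directly followed by a comma would be cut short.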

wdavidw commented 5 years ago

Closing due to lack of activity.