adaltas / node-csv

Full featured CSV parser with simple api and tested against large datasets.
https://csv.js.org
MIT License
4.05k stars, 267 forks

Escape comma inside a column #346

Open jkevinturado opened 2 years ago

jkevinturado commented 2 years ago

Hi,

One field in my raw data contains a comma. When I parse the data, that comma splits the field into a new column. How do I escape a comma inside a column to prevent this?

Here is an example of the raw CSV data:

"test@email.org, test2@mail.com",email.org,none,bad

Now the data returned to me is

[test@email.org,test2@mail.com,email.org,none,bad]

What I'm expecting is

["test@email.org,test2@mail.com",email.org,none,bad]

I tried to escape the comma with escape: '\\,' but it doesn't help.

wdavidw commented 2 years ago

My understanding is: "test@email.org, test2@mail.com",email.org,none,bad is the source and you expect ["test@email.org,test2@mail.com",email.org,none,bad], right?

This seems like the default behavior to me. I checked and confirmed it with our online tool.
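
To illustrate why a quoted field keeps its comma, here is a minimal sketch of quote-aware splitting for a single line. It is not csv-parse's implementation: `splitCsvLine` is a hypothetical helper written only for this illustration, and it assumes a comma delimiter, a double-quote quote character, and fields that do not span lines.

```javascript
// Illustrative only: a tiny quote-aware splitter for one CSV line.
// `splitCsvLine` is a hypothetical helper, NOT part of csv-parse.
function splitCsvLine(line, delimiter = ',', quote = '"') {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === quote && line[i + 1] === quote) {
        field += quote; // doubled quote inside a quoted field: literal quote
        i++;
      } else if (ch === quote) {
        inQuotes = false; // closing quote
      } else {
        field += ch; // a delimiter is an ordinary character while quoting
      }
    } else if (ch === quote && field === '') {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === delimiter) {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

const line = '"test@email.org, test2@mail.com",email.org,none,bad';
console.log(line.split(','));    // naive split: 5 pieces, the quoted field is cut in two
console.log(splitCsvLine(line)); // quote-aware: 4 fields, the comma stays inside the first
```

The surrounding quotes are consumed as syntax, so the first field comes back as one value with its comma intact and no escape option is involved.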

image image

jkevinturado commented 2 years ago

Yes, that is correct. Is this the default config?

jkevinturado commented 2 years ago

Hi,

I am escaping the double quotes. I need to because some of the fields contain double quotes, for example:

image

So this is my config btw:

image

jkevinturado commented 2 years ago

Hi, I'm using the default config.

Now the error is: Invalid Closing Quote: got "d" at line 1 instead of delimiter, record delimiter, trimable character (if activated) or comment

Because I have this double quote inside a field: image
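
For context, a bare double quote inside a quoted field is read as the closing quote, which is what the error above reports. If the code that produces the file is under your control, the usual convention is to double each inner quote, and csv-parse's default escape character is also the double quote. The sketch below shows that on the writing side. `quoteField` and `toCsvLine` are hypothetical helpers, and the sample row is made up because the original screenshot is not available.

```javascript
// Hypothetical helpers for the producer side; they are not part of csv-parse.
function quoteField(value, delimiter = ',') {
  const text = String(value);
  const needsQuotes =
    text.includes('"') || text.includes(delimiter) || /[\r\n]/.test(text);
  if (!needsQuotes) return text;
  // Double every inner quote, then wrap the whole field in quotes.
  return '"' + text.replace(/"/g, '""') + '"';
}

function toCsvLine(values) {
  return values.map(value => quoteField(value)).join(',');
}

// Made-up sample data: one field with inner quotes, one with a comma.
const row = ['He said "done" today', 'a, b', 'plain'];
console.log(toCsvLine(row));
// "He said ""done"" today","a, b",plain
```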

wdavidw commented 2 years ago

The screenshot I shared uses the default config.

wdavidw commented 2 years ago

Please share a minimal and complete example for additional help.

jkevinturado commented 2 years ago

Hi,

I was using the default config. I'm okay with the first issue, the comma inside a field. Now I have double quotes inside a field and it says image

Here is the input. Please focus on the 2nd line, which has double quotes inside the field:

image

wdavidw commented 2 years ago

Honestly, I don't think there is much we can do to handle such a scenario. The first quote places the parser in quoting mode, and the second quote is not escaped, which makes the field invalid.
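
When the file cannot be fixed at its source, one possible workaround is to repair each line before parsing by doubling the quotes that do not sit at a field boundary. The sketch below is a heuristic, not a feature of csv-parse. `escapeInnerQuotes` is a hypothetical helper, and it assumes a comma delimiter, fields that do not span lines, and input in which no quote is already escaped. An inner quote that is immediately followed by the delimiter stays ambiguous and will be taken for a closing quote.

```javascript
// Heuristic pre-processing sketch; `escapeInnerQuotes` is a hypothetical helper.
function escapeInnerQuotes(line, delimiter = ',') {
  let out = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch !== '"') {
      out += ch;
      continue;
    }
    const atFieldStart = i === 0 || line[i - 1] === delimiter;
    const atFieldEnd = i === line.length - 1 || line[i + 1] === delimiter;
    if (!inQuotes) {
      if (atFieldStart) inQuotes = true; // opening quote
      out += ch;                         // quotes outside quoted fields are left alone
    } else if (atFieldEnd) {
      inQuotes = false;                  // closing quote
      out += ch;
    } else {
      out += '""';                       // stray quote inside a quoted field: double it
    }
  }
  return out;
}

// Made-up input with unescaped quotes inside a quoted field.
const broken = '1,"He said "done" today",x';
console.log(escapeInnerQuotes(broken));
// 1,"He said ""done"" today",x
```

The repaired line uses doubled quotes, so it can then be parsed with the default options.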

jkevinturado commented 2 years ago

I can actually escape the comma using the escape option. Can I escape multiple characters?

I've tried escape: ',"' but it didn't work.

wdavidw commented 2 years ago

Yes, escape may be composed of multiple characters: https://github.com/adaltas/node-csv/blob/master/packages/csv-parse/test/option.escape.coffee#L69-L79

jkevinturado commented 2 years ago

how?

wdavidw commented 2 years ago

Look at the tests with the link above, I based my answer on them.

jkevinturado commented 2 years ago

It didn't work. I think the link you provided works only if the characters are contiguous.

I have to escape both the comma and the double quote, like this: escape: ',"'. Unfortunately, this doesn't work either.

patelkhushal18 commented 1 week ago

For me, I had to escape both , and \n. I ended up removing commas and \n inside matching double quotes before parsing:

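// Assumption: the synchronous API is intended, since the result is used directly.
const csvParse = require('csv-parse/sync');

// Remove commas and newlines inside double-quoted fields.
// `data` is the raw CSV text as a string.
const cleanedData = data.replace(/"([^"]*?)"/g, match => match.replace(/[,\n]/g, ''));

const records = csvParse.parse(cleanedData, {
  columns: true,      // Treat the first row as headers
  relax_quotes: true, // Tolerate quotes that appear inside unquoted fields
});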