BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks

Respect escaped commas #78

Open AlmightyLks opened 7 months ago

AlmightyLks commented 7 months ago

Would love to see this plugin respect escaped quotes instead of falsely formatting them.


At the same time, I've been working on another use case that deals with commas and needs to leave escaped commas alone. I've used the following regex, and it might be applicable here 😄

(?<!\\),
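To make the idea concrete, here is a minimal Python sketch of how that regex could be used to split a line. The negative lookbehind matches only commas that are not directly preceded by a backslash. The helper name `split_escaped` and the sample line are made up for illustration, and the sketch does not handle an escaped backslash in front of a comma (`\\,`).

```python
import re

# Matches a comma only when the character before it is not a backslash.
UNESCAPED_COMMA = re.compile(r'(?<!\\),')

def split_escaped(line):
    """Split on unescaped commas, then drop the escaping backslash.

    Hypothetical helper; assumes a backslash is only ever used to escape a comma.
    """
    return [field.replace('\\,', ',') for field in UNESCAPED_COMMA.split(line)]

# Made-up sample line using backslash-escaped commas.
fields = split_escaped(r'2907,27-10-1984,F,Berg\, van der')
print(fields)  # ['2907', '27-10-1984', 'F', 'Berg, van der']
```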

BdR76 commented 7 months ago

Thanks for posting the suggestion. The plug-in does support escaping quote characters by doubling them (two consecutive double-quote characters inside a quoted value); see the example data below:

LogId,LogDate,LogTime,Type,Description
17584,28-11-2023,00:11:18.170,Error,Internal server error (500)
17585,28-11-2023,00:11:18.056,Warning,LoadDataSource process not ready
17586,28-11-2023,00:39:42.373,Error,"File ""c:\temp\labext_mcl_hb_1.csv"" not found"
17587,28-11-2023,00:51:02.831,Error,"File ""c:\temp\labext_mcl_hb_2.csv"" not found"
17588,28-11-2023,01:19:16.629,Warning,LoadDataSource process not ready
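As a quick cross-check outside the plug-in, Python's standard `csv` module reads doubled quotes the same way by default. This is only a sketch using two lines from the sample above; it says nothing about how CSVLint itself is implemented.

```python
import csv

# Two lines from the sample data; raw string so the backslashes in the path stay literal.
lines = [
    'LogId,LogDate,LogTime,Type,Description',
    r'17586,28-11-2023,00:39:42.373,Error,"File ""c:\temp\labext_mcl_hb_1.csv"" not found"',
]

# The default dialect has doublequote=True, so "" inside a quoted field becomes one ".
rows = list(csv.reader(lines))
description = rows[1][4]
print(description)  # File "c:\temp\labext_mcl_hb_1.csv" not found
```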

Any value that contains the separator character (comma, semicolon, etc.) can be escaped by enclosing the whole value in quotes; see the following example data:

PatientId,BirthDate,Sex,Lastname
1086,30-09-2002,M,Meijer
1248,19-04-1992,M,"Dijk, van"
2459,18-09-2000,M,Bakker
2499,11-05-2005,F,Visser
2907,27-10-1984,F,"Berg, van der"
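Most CSV writers apply this quoting automatically. As an illustrative sketch (again using Python's standard `csv` module, not the plug-in), the default `QUOTE_MINIMAL` setting quotes only the values that contain the separator:

```python
import csv
import io

buffer = io.StringIO()
# csv.writer defaults to QUOTE_MINIMAL: quote a field only when it needs it.
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['PatientId', 'BirthDate', 'Sex', 'Lastname'])
writer.writerow(['1248', '19-04-1992', 'M', 'Dijk, van'])
writer.writerow(['2459', '18-09-2000', 'M', 'Bakker'])

output = buffer.getvalue()
print(output)
# PatientId,BirthDate,Sex,Lastname
# 1248,19-04-1992,M,"Dijk, van"
# 2459,18-09-2000,M,Bakker
```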

As for using a backslash to escape certain characters, that is common in programming languages such as C++, Java and Python, but afaik it isn't used in practice in csv data files; at least I haven't seen it. Supporting it would take quite some effort, because both the lexer (syntax highlighting) and the parser/validator of the plugin would have to be changed.
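For comparison, and only as a rough sketch of what such a dialect looks like to a parser, Python's `csv` module can read backslash-escaped separators when given `escapechar` together with `quoting=csv.QUOTE_NONE`. The function name `backslash_to_quoted` and the sample text are hypothetical. Note that in this mode any backslash is treated as an escape, so literal backslashes (for example in Windows paths) would need separate handling.

```python
import csv
import io

def backslash_to_quoted(text):
    """Re-write backslash-escaped csv text using quotes around values instead.

    Hypothetical helper; assumes a backslash is only used to escape the separator.
    """
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE, escapechar='\\')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')  # QUOTE_MINIMAL by default
    writer.writerows(reader)
    return out.getvalue()

# Made-up input in the backslash-escaped style.
legacy = 'PatientId,Lastname\n1248,Dijk\\, van\n'
converted = backslash_to_quoted(legacy)
print(converted)
# PatientId,Lastname
# 1248,"Dijk, van"
```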

So is this an actual use case? Is there a system or data supplier that formats the data using a backslash to escape commas?

AlmightyLks commented 7 months ago

Sorry for the late reply 😄 I understand your stance, and I can't find any mention of it in the CSV RFC (RFC 4180) either, so I obviously have the weaker case here.

Our decades-old desktop software seems to rely on that detail, and it doesn't seem to cause problems programmatically. However, when trying to view and/or edit these files with proper highlighting and formatting in CSVLint, I notice that roughly 1 in 20 lines falls out of format because of it. And I don't think we can teach our ~brittle piece of~ software to use quotes for escaping, if only for backwards-compatibility reasons. That's where my use case comes from.

We can close the issue if you, understandably, don't want to pick it up.