BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks source link

Make Reformat keep the quotes as-is like in the original file #57

Open artyb55 opened 1 year ago

artyb55 commented 1 year ago

Hi, Thank you for working on this plugin for Notepad++.

I'm trying to edit CSV files that aren’t compliant with RFC 4180 CSV rules, and I can’t change their format. They have some rows of differing lengths, and some fields are enclosed in quotes even though they don’t strictly have to be according to the standard. The files also have 4 header rows. I’ve been searching for an editor that will allow me to move some text to the right (by adding extra delimiters) so that info in the header is kept when I delete columns, and then align the delimiters so I can delete some columns, WITHOUT changing the formatting of the CSV file. I don’t know that behaviour could be added as options to this plugin?

I’ve tried LOTS of editors and only found one commercial piece of software that does what I need. I've been surprised how difficult it's been to find a way of doing this-spreadsheets mangle the quotes, and I can't find a text editor that will properly align the delimiters and keep the quotes.

What I'm doing is importing the file (which doesn't natively have a .csv extension). I'm then using the manual 'Detect Columns' to specify the delimiter. Ideally the plugin would be smart enough to not detect delimiters within quotes.

My next step is to use the 'Reformat' and 'Align vertically' options, at which point all the quote characters are removed from the file. None of the 'Re-apply quotes' options put them back everywhere they were. An option to not remove any quotes would be great. Datalogger files often have more than one line of headers as they store other info on the configuration. These files have 4 lines of headers, and the 'Align vertically' doesn't add enough space to all rows that they are properly aligned. #46 may relate to this (Autodetection of columns not working for more-than-one-row header).

Here's a screenshot showing the removal of quotes and vertical alignment issues: 2023-03-31 10_23_32-CSV_Editors_CSV_Lint_inNotepad++_AlignmentIssues

What I'd hoped to do as the final steps was to use Notepad++'s multi-row editing to delete columns (although I see there is an enhancement to allow that in CSV-lint #54).

Last step was to be to remove the 'Align vertical', and save the modified file.

I've attached a sample file, and what I'd like it to look like after editing (these both have the file extension changed to .csv so I can upload them here.

BW, A

CSV_Lint_inNotepad++_ExampleDataFile__2023_02_28AB1.csv

CSV_Lint_inNotepad++_ExampleDataFile_Edited__2023_03_30AB1.csv

Friedi commented 1 year ago

I see your data is too inconsistent (header is multiple length but also not equal columns; and also you have strings containing a delimiter, makes the approach I propose down more difficult) there is a option to skip x line from the header but only for coloring not for manipulation of columns I think.

I would suggest you use excel or a similar spreadsheet tool (maybe libre office) for that structural manipulation which are much better for that purpose (this plugin is a simple visualization help). After that you can save as csv again

If your header wasn't that special you could do this with notepad++ it has a column editing mode. (either you exclude/skip the header or you equalize the columns number) You only need this plugin's reformat option "align vertically"(but be aware that this is not recommended as there is a bug changing the text encoding #52 - I only use it for better visibility (not to save the data) OR replace your delimiter with a delimiter+x spaces (if you do not have strings containing a delimiter, but you do...)

go to the column where you want to add or delete a delimiter-column, press + and mark the lines with cursor down or up OR press + and left click to mark the columns+lines with the mouse. once marked you see a large cursor line and you can add or delete columns/chars. you could also record this as a macro and apply it to a short cut.

BdR76 commented 1 year ago

Thanks for posting your question and the example files. The plugin is already smart enough to detect delimiters within quotes and handle them properly. But the problem with your file is the weird header lines. The first line has 8 column, the rest have 43 column. Without that first line it's a normal CSV file. So what you could do is:

1) Cut ctrl-X the entire first line 2) then auto-detect columns, this should work now (press validate gives no errors) 3) select Reformat and Align vertically 4) Do the edits using the column-edit mode like @Friedi suggested 5) then Reformat and Trim all values 6) Paste ctrl+V the first line back at the top

Then the only remaining problem is the quotes. However in your case it's only the first column so you could add the quotes back using manually using the native Notepad++ column editing feature (=shift + alt + arrow up/down and then type something)

But an enhancement to the plugin could be; keep the quotes as-is like in the original data. This could be done but it requires a big rewrite in the way the plugin parses the data.

BdR76 commented 1 year ago

Btw the development version of the plugin has support for skipping lines, i.e. treat the first line as not part of the data, see issue #46 and also I would appreciate an upvote on this Microsoft feedback

Anyway, this is not yet in the currently released version, but if you download the "beta" build DLL you can select "Manually detect columns" and then select Skip lines=1 and in the previous post you can then skip steps 1, 2 and 6.