BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks source link

Unwanted outcome to Reformat - deleted spaces #51

Closed AFalut closed 1 year ago

AFalut commented 1 year ago

When using Reformat to add column separators to a text file such as Example.txt, spaces adjacent to a separator are deleted even if “Trim all values” was not selected, and no matter the “Re-apply quotes” option selected (please note that the “Value with spaces” option seems to apply quotes to the full line, rather than quotes framing the values containing spaces). For Example.txt and the schema.ini metadata below, this results in Example_reformatted_actual.txt. We would like instead to reformat while keeping all spaces, including those adjacent to a separator (since they are significant in our case), as in Example_reformatted_target.txt.

[Example.txt]
Format=FixedLength
ColNameHeader=False
Col1=FIELD1 Integer Width 4
Col2=FIELD2 Integer Width 4
Col3=FIELD3 Integer Width 3
Col4=FIELD4 Text Width 4
Col5=FIELD5 Text Width 1
Col6=FIELD6 Text Width 2
Col7=FIELD7 Text Width 4

Example.txt Example_reformatted_actual.txt Example_reformatted_target.txt

BdR76 commented 1 year ago

Thanks for the detailed feedback and example data files. Is this for an actual use-case you've encountered? The zeroes and ones looks like just a technical test file.

The plugin assumes that values should always be trimmed when it's a fixed width file (Format=FixedLength). Because, by definition, it will always contain extra spaces. See the code on these lines.

There's a TrimValues setting but it's only applied to csv data. By default it's set to true and in practice that almost always what you'd want anyway. When set to false, the downside is that it won't interpret the column datatypes correctly. For example values like "123 " and " 45 " (without quotes but with spaces) will be interperted as text, not integers.

I'll see what I can do in the next verion. I think I'll remove the Trim-checkbox on the reformat dialog and merge that functionality with the general TrimValues setting.

BdR76 commented 1 year ago

I've changed the Reformat Dialog to always use the Trim Values setting, also for fixed width files (see commit here). Indirectly this also applies to the Sort, Add column dialogs. It will be available in the next release, and you can preview it in the updated DLL here

AFalut commented 1 year ago

First of all, thank you for the very fast answers and commit! After unchecking “Trim values” in the new version of the “Reformat data” dialog box, we’re now getting the results we were hoping for. To further elaborate on our use case, we are part of the Department of data administration of the French National archives, and we need to verify that archival data contained in unseparated text files is compliant with information we have on their structure, chiefly data order and lengths (i.e. column widths after reformatting). As the plug-in is proving quite valuable for this use, we'll be keeping a close eye on the project. The files I attached are fictional examples, but similar to the contents of our actual files, though they are usually more complex.

BdR76 commented 1 year ago

The latest release v0.4.6.3 includes this fix, and it will be automatically available in the next update of Notepad++