BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks source link

Validate data : not a valid integer value #72

Closed PenguinMaster closed 1 year ago

PenguinMaster commented 1 year ago

One of my columns is a varchar(5) that is most of the time only numbers, and then in some cases (250 out of 35k) 2AXXX or 2BXXX

Problem : ** error line 698: Column 6 value "2A041" not a valid integer value

For some reason the program absolutely wants this column to be an integer, but I need the quote marks...

Could someone help me please ?

BdR76 commented 1 year ago

This is because by default the automatic column detection allows for 1% errors. Meaning that if one column contains values of different data types (text, integer, datetime etc) but one of those datatypes appears less than 1%, then basically the plugin will guess only based on the other 99%.

To fix your case, you can open the CSV Lint window (go to Plugins > CSV Lint > CSV Lint window) and look at the metadata text box in the bottom left, there should be a line like this:

Col6=MyColumnName Integer Width 5

(the name "MyColumnName" will be something else) and you can manually edit this line, remove Integer and type Text instead, so:

Col6=MyColumnName Text Width 5

Leave the rest of the lines unchanged and then press the blue disk icon to save the metadata. Then press the "Validate data" button again and it should work.

Alternatively you can go to the CSV Lint settings and change ErrorTolerance from 1 to 0, press OK to save settings, and close and open your csv file. That way the automatic detection will see column 6 as Text not Integer. The downside to changing this setting is that the automatic detection will not detect potential error values where there's less than 1% of values with a different datatype in any csv file,

PenguinMaster commented 1 year ago

Thanks !