BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars, 8 forks

..is not a valid enumeration member, detecting enumeration should ignore certain cases #89

Closed: flex567 closed this issue 4 months ago

flex567 commented 4 months ago

After I press Validate, I get: ** error line 6: Column 5 value "Tall car with medium long brown hood. Maybe English, special about it was it had windows like the bottom of Coca-Cola. Nice car, not very cute. Taller then my." is not a valid enumeration member

I'm not sure what this error is about.

BdR76 commented 4 months ago

Support for enumeration was added in one of the latest versions, meaning the plug-in will try to detect coded columns that only contain values like Yes, No or Mild, Medium, Severe etc.

However, I suspect that column 5 in your dataset contains text comments and is mostly empty, for example with only 2 or 3 non-empty values out of 100 records. In that case the CSV Lint plug-in will inadvertently flag it as an enumeration column.

Currently the plug-in counts the unique values in the column, and if there are fewer than 15 (the UniqueValuesMax setting) the column is flagged as an enumeration column. However, I think it should also require that at least one value occurs more than once.

In other words, when it finds for example 7 unique values for a column, but each value is found only once, then it should not be interpreted as a coded value but instead as plain text.
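
As an illustration only, the heuristic described above could be sketched in Python roughly as follows. This is not the plug-in's actual code, which is not shown in this thread: the function name `looks_like_enumeration`, the `require_repeat` flag, and the sample data are all hypothetical, and skipping empty cells is an assumption. Only the threshold of 15 (UniqueValuesMax) and the "at least one value repeats" rule come from the description.

```python
from collections import Counter

# Mirrors the UniqueValuesMax setting mentioned above (assumed semantics:
# a column with fewer unique values than this is an enumeration candidate).
UNIQUE_VALUES_MAX = 15


def looks_like_enumeration(values, unique_max=UNIQUE_VALUES_MAX, require_repeat=True):
    """Hypothetical helper: guess whether a column holds coded values."""
    # Count non-empty values only (assumption: empty cells are ignored).
    counts = Counter(v for v in values if v.strip())
    if not counts or len(counts) >= unique_max:
        return False
    # Proposed extra rule: at least one value must occur more than once.
    if require_repeat and max(counts.values()) < 2:
        return False
    return True


# A mostly empty free-text column: 3 distinct comments in 100 records.
comments = [""] * 97 + ["Tall car", "Nice car", "Old car"]
# A genuinely coded column.
coded = ["Yes", "No", "Yes", "", "No"]

print(looks_like_enumeration(comments, require_repeat=False))  # True (current behaviour as described)
print(looks_like_enumeration(comments))                        # False (with the proposed rule)
print(looks_like_enumeration(coded))                           # True
```

With the extra rule, the sparse comment column falls back to plain text, while a column such as Yes/No is still detected as coded.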

I think the enumeration detection can be improved; I'll look into this.

flex567 commented 4 months ago

I can't replicate the issue and I don't see this error anymore.