BdR76 / CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
GNU General Public License v3.0
151 stars 8 forks source link
csv csv-to-sql data-cleaning data-quality data-wrangling datasets fixed-width metadata mysql notepad-plus-plus plugin postgresql quality sorting sql syntax-highlighting tabular-data validate validation validator

CSV Lint - Notepad++ plugin

Release version GitHub all releases GitHub latest release Donate

CSV Lint plug-in for Notepad++ adds syntax highlighting to comma-separated values (csv) and fixed width data files. It automatically detects column datatypes, can detect data formatting errors and adjust datetime and decimal formats.

preview screenshot

You can use CSV Lint to analyse, validate, convert and reformat text data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather it's a quality control tool to examine, verify or polish up a dataset before further processing.

With CSV Lint you can take a dataset and:

CSV Lint doesn't require an internet connection and doesn't use any cloud service. All data processing is done offline on the pc that runs Notepad++. The plug-in is stable and works for most use-cases, if you encounter any bugs or unexpected behaviour feel free to report any issues here. CSVLint is based on a prototype project Dataset MultiTool

If you find the CSV Lint plug-in useful you can buy me a coffee!
paypal

How to install

The CSV Lint plugin is available in Notepad++ in the Plugins Admin menu.

Note: the plug-in is available in the Plugins Admin menu starting with Notepad++ v8.1.9.1. If you have a Notepad++ version older than v8.1.9.1 or want to install the plug-in manually:

How to use it

  1. Open your dataset in Notepad++
  2. Open the "CSV Lint window" from the plug-in menu or toolbar
  3. Press "Detect columns" to automatically detect format
  4. Optionally, manually enter or adjust metadata
  5. Press "Validate data" to detect any data errors

If there are no errors in the data, you can click "Reformat data" for data reformatting options, or select "Convert to SQL" menu item to generate an SQL insert script.

Also see this quick tour video, which shows how the plug-in works.

Watch video, CSV Lint plug-in features oveview Watch video, CSV File: View, Add Column Colors and Sort Data using Notepad++

Schema.ini

The metadata uses the standard schema.ini format, see documentation here

When you open a csv file the plug-in try to determine the column meta data. It will first look for a schema.ini file in the same folder as the data file, and check to see if it contains a section with the filename. If the file or section doesn't exist, it will scan the data and try to infer the columns and datatypes. You can manually change the meta data and press the blue disk icon to save it to a schema.ini file in the same folder as the data file for later use.

See schema.ini example below:

[mydata.csv]
Format=TabDelimited
DateTimeFormat=dd-mm-yyyy
DecimalSymbol=.
NumberDigits=2
Col1=OrderId Integer Width 8
Col2=Price Float Width 7
Col3=PartName Text Width 50
Col4=OrderDate DateTime Width 10

Format can be TabDelimited for tabs, CSVDelimited for commas, for any other delimiter use for example Format=Delimited(;). Use FixedLength for fixed width text files and set the Width for each column.

DateTimeFormat is not case sensitive and uses dd/mm/yyyy or yyyy-mm-dd hh:nn:ss etc.

DecimalSymbol can be either . or , and CSV Lint will assume the thousand separators symbol is the opposite of the DecimalSymbol. Define the maximum decimals digits for example NumberDigits=2 for values like "1.23" or "-45.67" etc.

Validating

The plug-in can be used as a CSV validator, press "Validate data" to check the input data for technical errors, based on the metadata in the textbox on the left. The line numbers of any errors will be logged in the textbox on the right. It will check the input data for the following errors:

Roadmap/goals

The CSV Lint plugin is work-in-progress, here is list of features I want to add (strikethrough is done)

Trouble shooting / Known issues

Acknowledgements

With thanks to:

The CSV Lint plug-in couldn't have been created without their source examples, suggestions and valuable feedback. For a similar plug-in for Visual Studio Code check out Rainbow CSV.

Disclaimer

This software is free-to-use and it is provided as-is without warranty of any kind.
Always back-up your data files to prevent data loss.
All test files, examples and screenshots provided in this github repository contain fictitious and randomly generated data, any resemblance to real-life cases is the result of chance.

BdR©2019-2024 Free to use - send questions or comments: Bas de Reuver - bdr1976@gmail.com