Closed SoftTools59654 closed 6 months ago
It sounds like you're asking about fixing certain data errors in a specific data file, is that correct?
How to fix it depends on your use-case. Should it ignore the bad column values, replace them with something else, or remove the entire row? Which values count as too low or too high? Should different conditions apply to different columns? That would require too much customisation and goes beyond adding a new option in this plug-in.
I think the best approach is to write a script (Python or other) to do what you want to do with your data file.
The CSV Lint plug-in can get you started with such a script. Open the csv file in Notepad++, go to the menu Plugins > CSV Lint > Generate Metadata and select Python script. This will generate a Python script to read the csv file and write to another csv file. However, you still need to develop and expand that script for your specific data processing/filtering and output file requirements.
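To give an idea of what such filtering could look like, here is a minimal sketch using only Python's standard csv module. It is not the script that CSV Lint generates. The function name split_rows and the rule "a row is bad when its field count differs from the header" are assumptions for illustration; your own rules will probably be different.

```python
import csv


def split_rows(src_path, good_path, bad_path, delimiter=","):
    """Copy rows whose field count matches the header to good_path,
    and every other row to bad_path. Returns (good_count, bad_count)."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(good_path, "w", newline="", encoding="utf-8") as good, \
         open(bad_path, "w", newline="", encoding="utf-8") as bad:
        reader = csv.reader(src, delimiter=delimiter)
        good_writer = csv.writer(good, delimiter=delimiter)
        bad_writer = csv.writer(bad, delimiter=delimiter)

        # Assumption: the first line is a header and defines the column count.
        header = next(reader, None)
        if header is None:
            return 0, 0
        good_writer.writerow(header)
        expected = len(header)

        n_good = n_bad = 0
        for row in reader:
            if len(row) == expected:
                good_writer.writerow(row)
                n_good += 1
            else:
                # Prefix the source line number so the row can be found later.
                bad_writer.writerow([reader.line_num, *row])
                n_bad += 1
        return n_good, n_bad
```

Calling split_rows("input.csv", "good.csv", "bad.csv") reads the file one row at a time, so memory use stays low. The rejected rows end up in a separate file where you can review or repair them.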
Btw I can't help you develop a Python script, but you can look up a lot of things on Stack Overflow or ask ChatGPT.
Btw for large (>1GB) files, you could look at the pandas library in Python. The read_csv function has a chunksize parameter for processing such large files. I'm not familiar with it, but there is example code here
csv repair and troubleshooting
Is it possible to add a tool that checks the syntax of csv files and fixes those that are problematic?
In most cases, the number is either too low or too high, or it is missing from that line entirely. In data with millions of records, it is impossible to check this manually and convert the file to a standard csv file. This matters because defective and faulty files cannot be converted to other formats, such as JSON.