codeinthehole / csvfilter

Command-line tool for manipulating CSV data
http://codeinthehole.com/writing/csvfilter-a-python-command-line-tool-for-manipulating-csv-data/
MIT License

Error with NULL Bytes? #8

Open edent opened 9 years ago

edent commented 9 years ago

I'm trying to process a large CSV file with:

cat in.csv | csvfilter -f 0 > out.csv

Halfway through the file, it fails with this error:

Traceback (most recent call last):
  File "/usr/local/bin/csvfilter", line 66, in <module>
    main(options, args)
  File "/usr/local/bin/csvfilter", line 13, in main
    pump(processor, infile, writer)
  File "/usr/local/bin/csvfilter", line 36, in pump
    for output in processor.process(infile):
  File "/usr/local/lib/python2.7/dist-packages/csvfilter/__init__.py", line 23, in process
    for row in reader:
_csv.Error: line contains NULL byte

(The line in question contains a very specific DNS record)

Is there a way to get csvfilter to ignore errors like this - or should I pre-emptively correct my .csv files?

codeinthehole commented 8 years ago

It might be possible - that null byte thing seems to be a common issue with the csv module. Don't suppose you can supply a sample file that causes this error?
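As a rough illustration of how the null bytes could be dropped before the csv module sees them, here is a minimal Python 3 sketch. It is not csvfilter's actual code: `strip_nul` and `read_rows` are hypothetical names, and it assumes the input is an iterable of text lines, such as an open file or stdin.

```python
import csv
import io


def strip_nul(lines):
    """Yield each input line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\0", "")


def read_rows(infile):
    """Parse CSV rows from infile after filtering out NUL characters.

    csv.reader accepts any iterable of strings, so the generator can be
    passed in place of the file object. (Hypothetical helper, not part of
    csvfilter.)
    """
    return list(csv.reader(strip_nul(infile)))


# In-memory stand-in for a file containing 1,2[null],3
sample = io.StringIO("1,2\0,3\n")
rows = read_rows(sample)
print(rows)  # [['1', '2', '3']]
```

Because the filter is a generator, it processes one line at a time and should not add much memory overhead on a large file. Whether silently dropping the bytes is acceptable depends on the data.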

edent commented 8 years ago

Here's a CSV with 1,2[null],3

test.csv.txt

You can try this yourself with:

echo -e "1,2\0,3" > test.csv

Then

cat test.csv | csvfilter -f 0 > out3.csv

Traceback (most recent call last):
  File "/usr/local/bin/csvfilter", line 66, in <module>
    main(options, args)
  File "/usr/local/bin/csvfilter", line 13, in main
    pump(processor, infile, writer)
  File "/usr/local/bin/csvfilter", line 36, in pump
    for output in processor.process(infile):
  File "/usr/local/lib/python2.7/dist-packages/csvfilter/__init__.py", line 23, in process
    for row in reader:
_csv.Error: line contains NULL byte
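For the other option raised in the original question, correcting the .csv file before it reaches csvfilter, a small Python 3 script along these lines might do. This is only a sketch: `clean_file` and the `clean.csv` output name are made up for illustration, and it assumes that removing the null bytes outright (rather than replacing them with something) is what you want.

```python
import os
import tempfile


def clean_file(src_path, dst_path, chunk_size=1 << 16):
    """Copy src_path to dst_path, dropping every NUL byte.

    Works in binary mode and in fixed-size chunks, so the whole file
    never has to fit in memory. (Hypothetical helper.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk.replace(b"\x00", b""))


# Recreate the sample file from this thread in a temporary directory.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "test.csv")
dst = os.path.join(tmpdir, "clean.csv")
with open(src, "wb") as f:
    f.write(b"1,2\x00,3\n")

clean_file(src, dst)

with open(dst, "rb") as f:
    cleaned = f.read()
print(cleaned)  # b'1,2,3\n'
```

The cleaned file can then be piped through csvfilter in the same way as before.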