Open ismith opened 5 years ago
(Once I figured out that I needed to specify --delimiter=,
, though, this tool saved me a ton of time, so thank you!)
Thanks for the feedback - I added some information about default values (and mentioned the default delimiter).
Glad my script was helpful for you!
Just thought about your idea about adding a user friendly message, if the delimiter is wrong and the index given is therefore out of bounds:
What do you think?
Sometimes I only have one column, so this is fine and should not result in an error. Any messages written in this case would be included in my csv file when writing to stdout!
A check of the column index to be less than len(row) (line 74) might solve the problem of variable row length (some rows do have x columns, some not), but might be a problem in the case you describe: wrong delimiter, script only sees 1 column, but you want to replace column 2.
print("spam", file=sys.stderr)
would avoid that (writing to the file)! But yes, I agree that you don't want to error just on len(row) == 1. Making this part of the IndexError handling limits it to the case where you've specified a column that is out of bounds.
That said - right now, the behavior if you specify an index that the row doesn't have is a crash (due to unhandled error). So maybe the right thing is just to do something like
try:
[code that raises error, this is line 74 in csv_anonymizer.py]
except IndexError:
msg = "You wanted to anonymize field {}, but row {} has only {} fields. Possibly your data is missing fields, or '{}' is the wrong delimiter?"
raise Exception(msg.format(columnIndex, rowIndex, len(row), delimiter))
(My python is a bit rusty, I'm not 100% sure of the syntax, but this should be close.)
Note that this would require passing delimiter as a param to anonymize_rows
, and you'd want to replace for row in rows:
with something like for rowIndex, row in enumerate(rows):
(See: https://treyhunner.com/2016/04/how-to-loop-with-indexes-in-python/#What_if_we_need_indexes?)
I assumed a csv tool would default to ',' as the delimiter, but this defaults to ';'. This is mentioned in the terminal help, but not the readme, and the failure mode is an IndexError, which is confusing:
(Because csv_anonymizer assumes ';', and your csv uses ',', csv_anonymizer is likely to treat a row as a single large field, and thus fail if given a columnIndex > 0.)
In addition to mentioning --delimiter in the readme, perhaps a try/except around that error, with a user-friendly warning if len(row) == 1?