draperjames / qtpandas

Qt Meets Pandas
MIT License

Fix UnicodeError in superReadCSV(...) #47

Closed: mherrmann closed this 5 years ago

mherrmann commented 5 years ago

The CSVImportDialog sometimes produced a UnicodeDecodeError when using pandas 0.23.1. There were several reasons for this:

qtpandas assumed that Python's set(...) preserves insertion order, but it does not. In particular, superReadCSV used a set to keep track of the codecs to try when reading a file. Because the first_codec parameter was added to this set first, qtpandas assumed it would also come back first when iterating over the set. In my tests this was not always the case, precisely because set iteration order is not guaranteed. As a result, superReadCSV did not obey the first_codec parameter and sometimes tried other codecs before the requested one.
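The ordering problem can be avoided by keeping the codecs in a list and using a set only to skip duplicates. The following is a minimal sketch of that idea, not the actual qtpandas change; the helper name `ordered_codecs` and its parameters are hypothetical.

```python
def ordered_codecs(first_codec, fallback_codecs):
    """Return the codecs to try, with first_codec guaranteed to come first.

    Hypothetical helper: a list fixes the order, while the set is used
    only for fast duplicate detection, never for iteration.
    """
    seen = set()
    ordered = []
    for codec in [first_codec] + list(fallback_codecs):
        if codec not in seen:
            seen.add(codec)
            ordered.append(codec)
    return ordered
```

For example, `ordered_codecs("latin1", ["utf8", "latin1", "cp1252"])` yields `["latin1", "utf8", "cp1252"]` on every run, regardless of how the set happens to hash its members.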

Second, in pandas 0.23.1, read_csv(...) can raise a UnicodeError, but qtpandas only caught UnicodeDecodeError. Since UnicodeDecodeError is a subclass of UnicodeError, I updated the code to catch the more general exception, which covers both cases.
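A fallback loop that catches the broader exception might look like the sketch below. It is an illustration under stated assumptions, not the code from this PR: `read_with_fallback` is a hypothetical name, and the reader is passed in as a callable taking a codec name so the loop does not depend on pandas itself.

```python
def read_with_fallback(read, codecs_to_try):
    """Try read(codec) for each codec in order; return (codec, result).

    Hypothetical helper. Catching UnicodeError also catches
    UnicodeDecodeError, which is a subclass of UnicodeError.
    """
    last_error = None
    for codec in codecs_to_try:
        try:
            return codec, read(codec)
        except UnicodeError as exc:
            # Remember the failure and move on to the next codec.
            last_error = exc
    if last_error is None:
        raise ValueError("no codecs to try")
    # Every codec failed: re-raise the last decoding error.
    raise last_error
```

With pandas, the reader could be `lambda codec: pandas.read_csv(path, encoding=codec)`, combined with an ordered codec list so that first_codec is really tried first.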