alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License
1.24k stars 70 forks

Fix invalid escape sequences in `clevercsv.detect_type` on newer Python versions #90

Closed JakobGM closed 1 year ago

JakobGM commented 1 year ago

Hi there!

Newer Python versions complain when an ordinary string literal contains an invalid escape sequence such as "\.", and the import fails outright when that warning is raised as an error. This PR prefixes every regex pattern string in clevercsv.detect_type with r"" so that Python parses them as raw strings. I could have changed only the strings that contain invalid escape sequences, but since that is most of them, I did it for all. I suggest using the prefix consistently for all regex strings so that you don't have to think about it.
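For illustration, here is a minimal sketch of the difference between the two spellings. The pattern is a simplified stand-in and not one of the patterns in clevercsv.detect_type, and the helper names `compiles_cleanly` and `evaluate` are made up for this example. It assumes the failure comes from the invalid-escape warning being promoted to an error, which is what the snippet simulates with the `warnings` module.

```python
import re
import warnings

# The snippets under test are stored as raw strings so that this example
# itself contains no invalid escape sequences.
NON_RAW_SOURCE = r'pattern = "^\d+\.\d+$"'   # ordinary string literal
RAW_SOURCE = r'pattern = r"^\d+\.\d+$"'      # raw string literal


def compiles_cleanly(source: str) -> bool:
    """Return True if `source` compiles when every warning is an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<example>", "exec")
        except (SyntaxError, Warning):
            # The invalid-escape warning surfaces here once promoted to an error.
            return False
    return True


def evaluate(source: str) -> str:
    """Run `source` with warnings silenced and return the resulting pattern."""
    namespace = {}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        exec(compile(source, "<example>", "exec"), namespace)
    return namespace["pattern"]


if __name__ == "__main__":
    print("non-raw compiles cleanly:", compiles_cleanly(NON_RAW_SOURCE))
    print("raw compiles cleanly:    ", compiles_cleanly(RAW_SOURCE))
    # Both spellings currently produce the same pattern text.
    print("same pattern:", evaluate(NON_RAW_SOURCE) == evaluate(RAW_SOURCE))
    print("matches '3.14':", bool(re.match(evaluate(RAW_SOURCE), "3.14")))
```

Both literals evaluate to the same pattern text, so adding the prefix should not change what the regexes match; it only removes the reliance on Python passing unknown escapes through unchanged.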

Here is an example traceback for an uncaught exception that this PR circumvents:

  File "/__w/xxx/xxx/.venv/lib/python3.11/site-packages/clevercsv/detect_type.py", line 88
    "number_1": "^(?=[+-\.\d])[+-]?(?:0|[1-9]\d*)?(((?P<dot>((?<=\d)\.|\.(?=\d)))?(?(dot)(?P<yes_dot>\d*(\d*[eE][+-]?\d+)?)|(?P<no_dot>((?<=\d)[eE][+-]?\d+)?)))|((?P<comma>,)?(?(comma)(?P<yes_comma>\d+(\d+[eE][+-]?\d+)?)|(?P<no_comma>((?<=\d)[eE][+-]?\d+)?))))$",
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: invalid escape sequence '\.'
Error: Process completed with exit code 1.

Thanks in advance,

GjjvdBurg commented 1 year ago

Thanks Jakob!