Zac-HD / shed

`shed` canonicalises Python code. Shed your legacy, stop bikeshedding, and move on. Black++
https://pypi.org/project/shed/
GNU Affero General Public License v3.0
343 stars 23 forks source link

Complicated regex somehow breaks shed --refactor #92

Closed DRMacIver closed 1 year ago

DRMacIver commented 1 year ago

Haven't looked into what's going on beyond to minimize the bug, but the following code:

import re

def tokenize_csv(csv_string: str, delimiter: str = ",", quote_char: str = '"') -> list:
    token_strings = re.split(fr"\{delimiter}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))?", csv_string)
    tokens = [re.sub(fr"{quote_char}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))|\\", "", token) for token in token_strings]
    return tokens

Results in the following error:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 1:1.
tokenizer error: unterminated string literal

import re
^
    Please report this to https://github.com/Zac-HD/shed/issues
Zac-HD commented 1 year ago

Reproduced with libcst andrf"\\", so moved upstream to https://github.com/Instagram/LibCST/issues/917.

For local mitigation let's handle this as suggested in #93; we can have shed explicitly check if it's a libcst bug and report that if so.