Open danpolanco opened 5 months ago
I believe the correct way to do this in newer versions of Python is with raw strings:
re.findall(r'CO-CDPHE-([0-9a-zA-Z_\-\.]+)', fasta_header)
The change is minor, as a raw string is just denoted by adding an r to the front of a string (i.e. r"string"). I'm not sure this is the correct change and welcome discussion / more research.
Also see The Backslash Plague.
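
For discussion, here is a minimal sketch of the difference. It assumes the current line in the script uses a normal (non-raw) string literal with the same pattern, which I have not confirmed beyond the linked line, and the header value is a made-up example. Since Python 3.6 an unrecognized escape such as \- or \. in a normal string literal triggers a DeprecationWarning, and since Python 3.12 a SyntaxWarning, so the exact warning category depends on the interpreter version.

```python
import re
import warnings

# Hypothetical header, used only for illustration.
fasta_header = "CO-CDPHE-123456789-0"

# Raw string: backslashes reach the regex engine untouched, no warning.
pattern = r'CO-CDPHE-([0-9a-zA-Z_\-\.]+)'
matches = re.findall(pattern, fasta_header)
print(matches)  # ['123456789-0']

# Assumed non-raw form of the same line, kept as source text so that
# this example itself compiles cleanly. Compiling it shows the warning.
non_raw_source = "pattern = 'CO-CDPHE-([0-9a-zA-Z_\\-\\.]+)'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(non_raw_source, "<example>", "exec")

# DeprecationWarning or SyntaxWarning, depending on the Python version.
for w in caught:
    print(w.category.__name__, w.message)
```

Both forms produce the same pattern text today, because Python keeps the backslash of an unrecognized escape, so the raw string should only silence the warning rather than change what is matched.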
Describe the bug
The findall regex either changed in newer versions of Python or this has always been incorrect:
https://github.com/CDPHE-bioinformatics/CDPHE-SARS-CoV-2/blob/f3b93dd3972b1378329810fa4a81d87a26afcdfa/scripts/concat_seq_metrics_and_lineage_results.py#L123
To Reproduce
See image in screenshots section but briefly:
fasta_header = `CDPHE-CO-123456789-0`.
Expected behavior
No errors to be issued by the Python interpreter.
Screenshots
Additional context
N/A