biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License

merge_read_counts() function should have a robust regex expression for matching sample_names to filenames. #222

Open RodolfoSalido opened 3 months ago

RodolfoSalido commented 3 months ago

https://github.com/biocore/metagenomics_pooling_notebook/blob/d753713f6520ca8a5688ab990e7868030796b81a/metapool/metapool.py#L1313

Per Charlie's observation:

It might be good to substitute r"^(.*)_S\d+_L\d\d\d_(R\d)_\d\d\d\.trimmed\.fastq\.gz$" or a similar expression for the one you're currently using. In the past, some fairly creative sample_names/sample_ids have caused expressions like the current one to match on the wrong part of the string and extract the wrong value. This version ensures that you're matching against the end of the filename, the part that we know has a fixed pattern.
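A minimal sketch of how such an anchored pattern behaves, for illustration only. It assumes the filenames follow the usual Illumina suffix layout (`_S<n>_L<lane>_R<read>_<chunk>`) with underscores between fields and literal dots before `trimmed`, `fastq`, and `gz`. The helper name `parse_trimmed_fastq` and the example filenames are hypothetical and are not taken from metapool.py.

```python
import re

# Assumed pattern: Illumina-style suffix anchored to the end of the filename.
# Underscores between fields and escaped dots are assumptions, not confirmed
# against the code in metapool.py.
FASTQ_RE = re.compile(r"^(.*)_S\d+_L\d\d\d_(R\d)_\d\d\d\.trimmed\.fastq\.gz$")


def parse_trimmed_fastq(filename):
    """Hypothetical helper: return (sample_name, read) or None if no match."""
    match = FASTQ_RE.match(filename)
    if match is None:
        return None
    # group(1) is everything before the fixed suffix; group(2) is R1 or R2.
    return match.group(1), match.group(2)


# An ordinary filename.
simple = parse_trimmed_fastq("sample1_S1_L001_R1_001.trimmed.fastq.gz")

# A "creative" sample name that itself contains suffix-like text. Because the
# pattern is anchored with $, only the trailing fixed fields are consumed.
tricky = parse_trimmed_fastq("my_S5_L002_sample_S12_L001_R2_001.trimmed.fastq.gz")

# A filename without the fixed suffix is rejected instead of mis-parsed.
rejected = parse_trimmed_fastq("sample1_R1.fastq.gz")
```

Because the suffix is tied to the end of the string, the greedy `(.*)` group can only give up characters until the fixed fields line up at the tail, so suffix-like text inside a sample name stays part of the captured name.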