biocore / mg-scripts

Knight Lab internal Metagenomic processing scripts for demultiplexing, QC and host removal
BSD 3-Clause "New" or "Revised" License

Fixes #106 #117

Closed charles-cowart closed 7 months ago

charles-cowart commented 8 months ago

Addresses https://github.com/biocore/mg-scripts/issues/106

coveralls commented 8 months ago

Pull Request Test Coverage Report for Build 7562474734


Totals Coverage Status
Change from base Build 7430826706: 0.04%
Covered Lines: 2141
Relevant Lines: 2455

charles-cowart commented 8 months ago

@wasade I'm hesitant to change how we determine the type and ID, since everyone seems to remember this method and it does appear durable. However, I can start pulling examples of XML files from across the runs to see whether we can pull it reliably from there, if you guys think it's worth it. We don't really have a tool to parse these, and that might be a good thing to have, yes?
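
For illustration, a rough sketch of what such a parsing helper might look like. It assumes a RunInfo.xml-style layout with a `Run` element that carries an `Id` attribute and has `Instrument` and `Flowcell` children. The sample document and the `read_run_info` name are made up for this example and would need checking against real files from the runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files may carry more elements or differ in layout.
SAMPLE_XML = """<?xml version="1.0"?>
<RunInfo>
  <Run Id="231201_A01535_0431_BHVKWCDSX7" Number="431">
    <Flowcell>HVKWCDSX7</Flowcell>
    <Instrument>A01535</Instrument>
  </Run>
</RunInfo>
"""


def read_run_info(xml_text):
    """Return the run ID, instrument and flowcell found in the XML text."""
    root = ET.fromstring(xml_text)
    run = root.find('Run')
    if run is None:
        raise ValueError('no <Run> element found')
    return {
        'run_id': run.get('Id'),                     # attribute on <Run>
        'instrument': run.findtext('Instrument'),    # None if the child is missing
        'flowcell': run.findtext('Flowcell'),        # None if the child is missing
    }


info = read_run_info(SAMPLE_XML)
```

Missing children come back as `None` rather than raising, so a caller could fall back to the directory-name method when a file is incomplete.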

wasade commented 8 months ago

Some googling suggests the instrument lookup may not be universally correct, and presumably Illumina or IGM has authoritative information. For regex, doesn't this work?

>>> import re
>>> matcher = re.compile(r'(\d{6,8})_([A-Z0-9]+)_(\d+)_([A-Z0-9]+)')
>>> matcher.search('231201_A01535_0431_BHVKWCDSX7').groups()
('231201', 'A01535', '0431', 'BHVKWCDSX7')
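
A possible way to wrap the same pattern, sketched with named groups so callers get labelled fields instead of a positional tuple. The group names, the `RunId` tuple and the `parse_run_dir` helper are illustrative labels only and are not taken from mg-scripts.

```python
import re
from typing import NamedTuple, Optional

# Same pattern as above, with each group given a (hypothetical) name.
RUN_DIR_PATTERN = re.compile(
    r'(?P<date>\d{6,8})_(?P<instrument>[A-Z0-9]+)'
    r'_(?P<run_number>\d+)_(?P<flowcell>[A-Z0-9]+)'
)


class RunId(NamedTuple):
    date: str
    instrument: str
    run_number: str
    flowcell: str


def parse_run_dir(name: str) -> Optional[RunId]:
    """Return the four fields of a run-directory name, or None if no match."""
    match = RUN_DIR_PATTERN.search(name)
    if match is None:
        return None
    return RunId(**match.groupdict())


run = parse_run_dir('231201_A01535_0431_BHVKWCDSX7')
```

Returning `None` on a failed match leaves it to the caller to decide whether an unrecognised directory name is an error.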
charles-cowart commented 7 months ago

Superseded by https://github.com/biocore/mg-scripts/pull/123