fedarko / strainFlye

Pipeline for analyzing (rare) mutations in metagenome-assembled genomes
BSD 3-Clause "New" or "Revised" License

Speed up filtering scripts (OSA, PM) if we detect "empty" sequences #11

Closed by fedarko 2 years ago

fedarko commented 2 years ago

That is, if no reads at all are aligned to a sequence, skip that sequence in the later stages of the filter.

I'm not sure how much time this would save, but there is a surprisingly large number of these contigs in the SheepGut dataset (16,659 out of 78,793 contigs, by my count, or roughly 21%), so even a small optimization might be worthwhile. I haven't measured how long calling `bf.fetch(seq)` on an "empty" sequence takes.
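
One possible way to detect these "empty" sequences up front is sketched below. It assumes that `bf` is an indexed `pysam.AlignmentFile`, and it relies on the per-contig counts stored in the BAM index rather than calling `bf.fetch(seq)` for every contig. The function names `find_empty_contigs` and `contigs_to_process` are hypothetical and are not part of the existing filtering scripts; this is only a rough sketch, not how the scripts currently work.

```python
def find_empty_contigs(bf):
    """Return the set of contig names that have zero mapped reads.

    Assumption: `bf` is an indexed pysam.AlignmentFile (or any object with
    a compatible get_index_statistics() method). Each returned entry has
    `contig` and `mapped` fields taken from the BAM index, so no alignment
    records need to be read.
    """
    return {stat.contig for stat in bf.get_index_statistics() if stat.mapped == 0}


def contigs_to_process(bf):
    """List the contigs worth filtering, in the order given by the header."""
    empty = find_empty_contigs(bf)
    return [seq for seq in bf.references if seq not in empty]


# Hypothetical usage (the file name is a placeholder):
#
#   import pysam
#   bf = pysam.AlignmentFile("alignment.bam", "rb")
#   for seq in contigs_to_process(bf):
#       for aln in bf.fetch(seq):
#           ...  # existing OSA / PM filtering logic
```

Whether this actually saves a noticeable amount of time would still need to be measured, since it depends on how cheap `bf.fetch(seq)` already is for contigs with no alignments.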