bsmith89 / StrainFacts

Factorize metagenotypes to infer strains and their abundances
MIT License

Prevalence filtering question #12

Closed by gavinmdouglas 11 months ago

gavinmdouglas commented 11 months ago

Hey Byron,

I hope all is well with you! Just a quick question -- in your filter_data.ipynb notebook you pre-processed the example data so that polymorphic sites found in < 5% of samples were excluded. Have you explored how robust StrainFacts is when this filtering step isn't performed? I'm actually interested in strains that might be sample-specific (and as long as they are at sufficient depth in that sample I'm not too worried about false positives).

Thanks!

Gavin
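
For readers unfamiliar with the step being asked about, here is a minimal sketch of a prevalence filter on toy data. It is not the code from filter_data.ipynb and does not use the StrainFacts API. The function names, the (ref, alt) count layout, and the definition of prevalence (the fraction of samples with at least one read of the site's minor allele) are all assumptions made for illustration.

```python
# Hypothetical sketch of a prevalence filter; not taken from StrainFacts.

def minor_allele_prevalence(site_counts):
    """Fraction of samples with at least one read of the site's minor allele.

    site_counts: list of (ref, alt) read counts, one tuple per sample.
    The minor allele is assumed to be the one with fewer reads overall.
    """
    total_ref = sum(ref for ref, _ in site_counts)
    total_alt = sum(alt for _, alt in site_counts)
    minor_index = 0 if total_ref < total_alt else 1
    n_with_minor = sum(1 for pair in site_counts if pair[minor_index] > 0)
    return n_with_minor / len(site_counts)


def filter_sites(counts_by_site, min_prevalence=0.05):
    """Keep only sites whose minor allele reaches min_prevalence.

    counts_by_site: dict mapping site ID -> per-sample (ref, alt) counts.
    """
    return {
        site: counts
        for site, counts in counts_by_site.items()
        if minor_allele_prevalence(counts) >= min_prevalence
    }


# Toy data: 40 samples, two sites.
toy = {
    # Minor allele seen in 10 of 40 samples (25%).
    "site_common": [(5, 5)] * 10 + [(10, 0)] * 30,
    # Minor allele seen in 1 of 40 samples (2.5%), i.e. sample-specific.
    "site_private": [(3, 7)] + [(10, 0)] * 39,
}

kept = filter_sites(toy)                           # 5% cut-off
unfiltered = filter_sites(toy, min_prevalence=0.0) # no cut-off
```

With the 5% cut-off only the common site survives, so the sample-specific variant never reaches the model. Setting the threshold to zero keeps both sites, which is the scenario the question is about.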

bsmith89 commented 11 months ago

Thanks for reaching out!

I have not explored that deeply, no, so I can't comment on robustness directly, but I do have some intuition and related thoughts worth mentioning.

I hope these thoughts are useful (and I hope StrainFacts is useful, too)!

gavinmdouglas commented 11 months ago

Thanks Byron, that's very helpful! I'm going to see how things change with and without prevalence cut-offs for the data I'm using... I'll keep your advice in mind.

Cheers,

Gavin