bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

What to expect for a sample that may have a mixture of two or more strains #281

Closed ilyavs closed 10 months ago

ilyavs commented 10 months ago

Hi, this is more of a general question. Suppose I have a FASTQ file from a contaminated sample, i.e. one that contains two or more strains of the same species. What could I expect from the assignment pipeline? Can it be used to detect the contamination? Thanks!

johnlees commented 10 months ago

We usually remove these samples. My expectation would be that it either joins the two clusters together or forms a totally new cluster, depending on the nature of the contamination and the specific model fit used. I would look at this with some data (real or simulated), and in particular look at the core/accessory distances estimated by sketchlib to see if they are obviously outlying in a detectable/repeatable way.
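A minimal sketch of the "look for outlying core/accessory distances" check suggested above might look like the following. It does not call PopPUNK or sketchlib: it assumes you have already extracted, for each query sample, a list of (core, accessory) distances to the reference samples. The function name `flag_outlying_samples`, the input layout, the robust z-score rule and its threshold are all illustrative assumptions, and the numbers in `example` are made up. Whether mixed samples really show up as outliers is the hypothesis to test with real or simulated data.

```python
from statistics import median


def flag_outlying_samples(distances, threshold=3.5):
    """Flag samples whose typical core or accessory distance is an outlier.

    distances: dict mapping sample name -> list of (core, accessory)
               distances to reference samples (hypothetical layout).
    threshold: cut-off on the robust z-score (assumed, not a PopPUNK default).
    """
    flagged = set()
    for axis in (0, 1):  # 0 = core distance, 1 = accessory distance
        # Summarise each sample by its median distance on this axis.
        per_sample = {
            name: median(pair[axis] for pair in pairs)
            for name, pairs in distances.items()
        }
        centre = median(per_sample.values())
        # Median absolute deviation: a spread estimate that the outliers
        # themselves do not inflate.
        mad = median(abs(value - centre) for value in per_sample.values())
        if mad == 0:
            continue  # no spread on this axis, nothing to compare against
        for name, value in per_sample.items():
            # 0.6745 rescales the MAD to be comparable to a standard deviation.
            if 0.6745 * abs(value - centre) / mad > threshold:
                flagged.add(name)
    return flagged


# Made-up distances: four ordinary samples and one simulated mixture.
example = {
    "s1": [(0.010, 0.09), (0.011, 0.10), (0.012, 0.11)],
    "s2": [(0.011, 0.10), (0.012, 0.11), (0.013, 0.12)],
    "s3": [(0.012, 0.11), (0.013, 0.12), (0.014, 0.13)],
    "s4": [(0.009, 0.12), (0.010, 0.13), (0.011, 0.14)],
    "mix": [(0.011, 0.45), (0.012, 0.48), (0.013, 0.50)],
}

print(flag_outlying_samples(example))
```

With this made-up input only `mix` is flagged, because its accessory distances sit far from the rest while its core distances look ordinary. Repeating the check across several simulated mixtures would show whether the signal is repeatable.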

ilyavs commented 10 months ago

Thank you