bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Can I apply PopPUNK on metagenome assembled genomes #116

Closed SilasK closed 3 years ago

SilasK commented 3 years ago

Hey, I tried to apply PopPUNK to MAGs. Since no one seems to have mentioned this before, I wanted to ask whether there is a problem with doing so. I know that sketching is regularly used with MAGs.

On my data, almost all comparisons give an accessory distance of ~0.8 at core distances <0.1.

johnlees commented 3 years ago

I've not really thought about this before. I don't know much about metagenomics so excuse any misunderstandings! Do the MAGs being used as input potentially consist of mixed strains from the same taxon? Are you comparing MAGs from the same taxon?

The sketching approach is equally applicable, as you say. The main thing here would be whether you expect rare k-mers (from errors or otherwise) that need to be filtered out. There is a rare k-mer filter which is on by default for read data, but with assembly input every k-mer is counted.
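
To illustrate the idea of a rare k-mer filter, here is a minimal Python sketch. It is not PopPUNK's or its sketching library's actual implementation; the function names and the `min_count` parameter are hypothetical, and for simplicity it ignores reverse complements (canonical k-mers). K-mers seen fewer than `min_count` times are dropped, and setting `min_count=1` keeps every k-mer, which mimics the behaviour described for assemblies.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every k-mer of length k across all input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def filter_rare_kmers(counts, min_count):
    """Keep only k-mers observed at least min_count times (hypothetical parameter)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}

# Toy "reads": the second read ends in a different base, giving one rare k-mer.
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
counts = count_kmers(reads, k=5)

kept_reads_mode = filter_rare_kmers(counts, min_count=2)     # rare k-mers removed
kept_assembly_mode = filter_rare_kmers(counts, min_count=1)  # everything counted
```

With MAGs supplied as assemblies there is no coverage information left to filter on, so any erroneous or contaminating k-mers in the contigs go straight into the sketch.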

It's harder for me to say with respect to the core/accessory distances. Do the MAGs you're comparing have a reasonable definition of core and accessory genome? If the MAGs you are comparing are distant, and have many SNPs in their core genome, you may need to decrease the minimum k-mer length to try to increase the resolution (though the lower limit here will depend on the length of the assemblies). Looking at --plot-fit can be useful in these cases to see whether the relationship is the linear one expected by the model.
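
The linear relationship mentioned above can be sketched as follows. This assumes the model from the PopPUNK paper, in which the probability that a k-mer of length k matches between two genomes is (1 - a)(1 - pi)^k, with a the accessory distance and pi the core distance, so that the log of the match probability is a straight line in k. The code is a plain least-squares illustration on synthetic data, not PopPUNK's own fitting routine, and the function name `fit_core_accessory` is hypothetical.

```python
import math

def fit_core_accessory(kmer_lengths, match_probs):
    """Fit log(p_match) = log(1 - a) + k * log(1 - pi) by ordinary least squares.

    Returns (core, accessory) = (pi, a). Illustrative only.
    """
    xs = list(kmer_lengths)
    ys = [math.log(p) for p in match_probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # estimates log(1 - pi)
    intercept = mean_y - slope * mean_x    # estimates log(1 - a)
    core = 1.0 - math.exp(slope)
    accessory = 1.0 - math.exp(intercept)
    return core, accessory

# Synthetic data generated from the model itself (values chosen for illustration).
true_core, true_accessory = 0.02, 0.3
ks = [13, 17, 21, 25, 29]
probs = [(1 - true_accessory) * (1 - true_core) ** k for k in ks]

core, accessory = fit_core_accessory(ks, probs)
```

Under this model the slope carries the core distance and the intercept carries the accessory distance. If the points do not fall on a line, for example because matches at short k are dominated by chance or because the sequences are too divergent for long k-mers to match at all, the fitted distances become unreliable, which is what inspecting the fit plot helps to detect.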