bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Fitted a DBSCAN model with 7 clusters, but the poppunk_visualise result has more than 7 clusters #220

Closed. shaodongyan closed this issue 1 year ago.

shaodongyan commented 1 year ago

Versions
PopPUNK 2.5.0

Command used and output returned

poppunk --fit-model dbscan --ref-db ourdatabase --threads 56 --min-cluster-prop 0.0005
poppunk_visualise --ref-db ourdatabase --output example_viz --microreact

Describe the bug

With "poppunk --fit-model dbscan --ref-db ourdatabase --threads 56 --min-cluster-prop 0.0005" we got a good result. [image: ourdatabase_dbscan]

But after running "poppunk_visualise --ref-db ourdatabase --output example_viz --microreact", the number of clusters is the same as the number of my isolates.

Thanks, I need your help. I want to know how we can divide our isolates into 7 clusters.

nickjcroucher commented 1 year ago

The "7" refers to the number of clusters identified in the distribution of pairwise distances, not the number of clusters in the population. Your dataset looks very diverse, which is why no isolates are clustered together. You might want to try lineage mode clustering instead (--fit-model lineage).
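To illustrate the distinction above, here is a minimal Python sketch (not PopPUNK's code) of how population clusters follow from the distance clusters: pairs whose distances fall in the within-strain cluster become edges of a network, and each connected component is one population cluster. The per-pair labels, the choice of label 0 as the within-strain cluster, and the helper names are all hypothetical.

```python
from itertools import combinations

# Hypothetical label for the distance cluster nearest the origin (within-strain)
WITHIN_STRAIN = 0


def find(parent, i):
    # Union-find lookup with path halving
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def count_population_clusters(n_isolates, pair_labels):
    """Count connected components of the network whose edges are the
    isolate pairs assigned to the within-strain distance cluster."""
    parent = list(range(n_isolates))
    for (i, j), label in pair_labels.items():
        if label == WITHIN_STRAIN:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj
    return len({find(parent, i) for i in range(n_isolates)})


pairs = list(combinations(range(5), 2))  # 10 pairs for 5 isolates

# Diverse dataset: every pair lands in one of the other distance clusters (1-6)
diverse = {p: 1 + k % 6 for k, p in enumerate(pairs)}
print(count_population_clusters(5, diverse))  # 5: each isolate on its own

# Isolates 0, 1 and 2 are close relatives, so two pairs are within-strain
related = dict(diverse)
related[(0, 1)] = WITHIN_STRAIN
related[(1, 2)] = WITHIN_STRAIN
print(count_population_clusters(5, related))  # 3: {0, 1, 2}, {3}, {4}
```

If no pair lands in the within-strain distance cluster, every isolate ends up as its own component, which is the behaviour reported in this issue, regardless of how many distance clusters the model found.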

johnlees commented 1 year ago

The DBSCAN model isn't very good: it has identified a tiny cluster (in purple) at the origin. You probably want to run network refinement on your model, as you want a boundary between the DBSCAN clusters. See https://poppunk.readthedocs.io/en/latest/model_fitting.html#refine
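As a rough illustration of what refinement optimises, the Python sketch below scans candidate cut-offs and keeps the one whose network scores highest on transitivity × (1 − density), the network score described in the PopPUNK documentation as far as I understand it. This is a simplification, not PopPUNK's implementation: it assumes a single scalar distance per pair and a hypothetical list of candidate thresholds, whereas the real refine step moves a boundary in the two-dimensional core/accessory distance space.

```python
from itertools import combinations


def transitivity(n, edges):
    """Global transitivity: 3 * triangles / connected triples."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    # Each node with degree d is the centre of d * (d - 1) / 2 connected triples
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return 3 * triangles / triples if triples else 0.0


def network_score(n, edges):
    """Transitivity * (1 - density): rewards tight, sparse clusters."""
    density = len(edges) / (n * (n - 1) / 2)
    return transitivity(n, edges) * (1 - density)


def refine_threshold(n, dists, candidates):
    """Scan candidate cut-offs on a single (hypothetical) scalar distance
    and return (threshold, score) for the best-scoring network."""
    best = None
    for t in candidates:
        edges = [pair for pair, d in dists.items() if d <= t]
        score = network_score(n, edges)
        if best is None or score > best[1]:
            best = (t, score)
    return best


# Toy data: two tight groups of three isolates, far apart from each other
groups = [{0, 1, 2}, {3, 4, 5}]
dists = {
    (i, j): 0.01 if any(i in g and j in g for g in groups) else 0.2
    for i, j in combinations(range(6), 2)
}
threshold, score = refine_threshold(6, dists, [0.005, 0.01, 0.1, 0.2])
print(threshold)  # 0.01
```

A cut-off that links nothing scores zero (no triangles), and one that links everything also scores zero (density 1), so the score favours a boundary that separates tight groups from each other.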

> I want to know how we can divide our isolates into 7 clusters

PopPUNK does not allow you to specify the number of clusters to divide the dataset into. You'd probably want to try hierarchical clustering or similar for that.
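If a fixed number of groups is really needed, one option along the lines suggested above is to cut a hierarchical clustering at k clusters. The sketch below is a hedged illustration, not part of PopPUNK: it performs single-linkage clustering by merging the closest pairs (Kruskal-style, with union-find) until k clusters remain. It assumes you already have pairwise distances as a dict keyed by isolate index pairs; exporting those from a PopPUNK database is not shown, and the function name is hypothetical. With SciPy installed, scipy.cluster.hierarchy.linkage followed by fcluster(..., criterion='maxclust') is the usual library route.

```python
from itertools import combinations


def single_linkage_k(n, dists, k):
    """Merge the closest pairs (single linkage) until k clusters remain.
    Returns a cluster label (0..k-1) for each isolate."""
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    clusters = n
    for (i, j), _ in sorted(dists.items(), key=lambda kv: kv[1]):
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1

    # Relabel cluster roots as 0, 1, ... in order of first appearance
    mapping = {}
    return [mapping.setdefault(find(i), len(mapping)) for i in range(n)]


# Toy example: four isolates placed on a line, distance = absolute difference
positions = [0.0, 1.0, 5.0, 6.0]
dists = {(i, j): abs(positions[i] - positions[j]) for i, j in combinations(range(4), 2)}
print(single_linkage_k(4, dists, 2))  # [0, 0, 1, 1]
```

Single linkage is only one choice; average or complete linkage would generally give different, often more compact, groups.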