AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

Issue on the clustering algorithm #14

Open dyxstat opened 1 year ago

dyxstat commented 1 year ago

Hi developers,

Thanks for developing this good and user-friendly software. I am curious about the clustering algorithm you used in vRhyme.

In the vRhyme paper you state: 'Weighted networks, representing unrefined bins, are created where each node is a scaffold and each edge is a weighted connection between paired scaffolds. Networks are refined using MiniBatchKMeans implemented in Scikit-Learn'. However, KMeans normally takes feature vectors as input, not a network. Did you use a technique, such as a kernel, to generalize the KMeans algorithm to this setting?

I would appreciate it if you could explain in more detail how the networks are refined using KMeans.

Thanks, Yancey

KrisKieft commented 1 year ago

In an update I moved from KMeans to label propagation (LP) for bin refinement. The downside of software publications is that they capture only a snapshot of a previous version. LP seemed to be more accurate and less dependent on estimated parameters. Let me know if you have questions about the update.
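
For anyone reading along, here is a minimal sketch of how label propagation can split a weighted scaffold network into groups without feature vectors or a preset number of clusters. It illustrates the general technique only and is not vRhyme's actual code. The function name `label_propagation`, the scaffold names, and the edge weights are all made up for the example, and the fixed node order is an assumption chosen to keep the result reproducible.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=100):
    """Weighted asynchronous label propagation on an undirected graph.

    edges: iterable of (node_a, node_b, weight) tuples (hypothetical format).
    Returns a dict mapping each node to a group label.
    """
    # Build a symmetric weighted adjacency map.
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w

    # Every node starts in its own group.
    labels = {node: node for node in adj}

    for _ in range(max_iter):
        changed = False
        # A fixed node order keeps this sketch deterministic; practical
        # implementations usually shuffle the order on every pass.
        for node in sorted(adj):
            # Sum the edge weights contributed by each neighbouring label.
            score = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = max(score.values())
            candidates = sorted(l for l, s in score.items() if s == best)
            # On a tie keep the current label, otherwise take the smallest.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return dict(labels)


# Toy network: two tightly linked triples of scaffolds joined by one weak edge.
edges = [
    ("s1", "s2", 0.9), ("s1", "s3", 0.8), ("s2", "s3", 0.9),
    ("s4", "s5", 0.9), ("s4", "s6", 0.8), ("s5", "s6", 0.9),
    ("s3", "s4", 0.1),  # weak link between the two groups
]
bins = label_propagation(edges)
```

On this toy input, `s1` to `s3` end up sharing one label and `s4` to `s6` another, because the weak 0.1 edge is outvoted by the stronger within-group weights. The only tunable here is the iteration cap, which fits the point about relying less on estimated parameters.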