Distances of reads - Githubissues

avierstr / amplicon_sorter

Sorts amplicons from Nanopore sequencing data based on similarity


Distances of reads #11

Open omarkr8 opened 1 year ago

omarkr8 commented 1 year ago

I was wondering whether the analysis uses a distance matrix for each cluster, or whether that is only done in the initial steps and every comparison after that is consensus-based.

I ask because I've been meaning to compare the consensus and the centroid of clusters: perhaps generate a distance matrix for the final groups and select a centroid as the representative sequence, i.e. the read with the lowest total distance to the other reads.

I would like to hear your thoughts on this.
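
For illustration, the centroid idea described above could be sketched roughly as follows. This is not code from amplicon_sorter: `edit_distance` and `centroid_read` are hypothetical helper names, plain Levenshtein distance is assumed as the metric, and the reads are made-up toy sequences. The pure-Python distance is slow for real Nanopore reads, so a dedicated alignment library would probably be needed in practice.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def centroid_read(reads: list[str]) -> tuple[int, list[int]]:
    """Return the index of the read with the lowest summed distance
    to all other reads, together with the summed distance of every read."""
    totals = [0] * len(reads)
    # Each pair is compared once; the distance counts for both reads.
    for i, j in combinations(range(len(reads)), 2):
        d = edit_distance(reads[i], reads[j])
        totals[i] += d
        totals[j] += d
    best = min(range(len(reads)), key=totals.__getitem__)
    return best, totals


# Toy cluster (assumed data): the first read differs from each other read
# by one base, while the other reads differ from each other by two bases.
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "ACGAACGT"]
best, totals = centroid_read(reads)
print(best, totals)
```

The read selected this way is strictly a medoid (an actual read from the cluster), which is what makes it comparable with a computed consensus.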

avierstr commented 1 year ago

It only makes a (partial) matrix in the initial step to create the clusters. After that, all the remaining reads are compared with the consensus of each cluster to find the consensus to which they are most similar.

Amplicon_sorter does not look for centroids in the clusters, but finding one should be easy if you compare all the reads in a cluster with its consensus. You will get similarities ranging from 85% to 100%.
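
A minimal sketch of that comparison, assuming the reads of one cluster and its consensus are already available as plain strings. The names `similarity_to_consensus` and `rank_reads` are hypothetical, and the standard-library `difflib` ratio stands in for whatever similarity measure amplicon_sorter uses internally, so the percentages will not match its output exactly.

```python
from difflib import SequenceMatcher


def similarity_to_consensus(read: str, consensus: str) -> float:
    """Percent similarity based on difflib's ratio (2 * matches / total length)."""
    # autojunk=False matters for DNA: with the default heuristic, sequences of
    # 200+ characters would have all four nucleotides treated as "popular" junk.
    matcher = SequenceMatcher(None, read, consensus, autojunk=False)
    return 100.0 * matcher.ratio()


def rank_reads(reads: list[str], consensus: str) -> list[tuple[float, int]]:
    """Return (similarity, read index) pairs, most similar read first."""
    scored = [(similarity_to_consensus(r, consensus), i)
              for i, r in enumerate(reads)]
    return sorted(scored, key=lambda t: (-t[0], t[1]))


# Toy data (assumed): one read identical to the consensus, two with mismatches.
consensus = "ACGTACGTAC"
reads = ["ACGAACGAAC", "ACGTACGTAC", "ACGTACGAAC"]
ranked = rank_reads(reads, consensus)
for score, idx in ranked:
    print(f"read {idx}: {score:.1f}% similar to consensus")
```

The read at the top of the ranking is the one closest to the consensus and could serve as a centroid-like representative of the cluster.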