cvigilv / ResidueFisher

Bioinformatics protocol that mines information at the sequence and structure level of a protein chain to detect possibly evolutionarily conserved residues.
MIT License

feat: change from MSA to DBSCAN for representative selection of Foldseek hits #16

Open cvigilv opened 1 year ago

cvigilv commented 1 year ago

As of v1.0, we extract representative structures from the neighbor-joining (NJ) tree constructed from the MSA. Limitations in this procedure result in a poor exploration of the structural comparisons.

One idea is to assign a class to each hit based on 3 parameters:

From these parameters we could define zones, following the notion of the "twilight zone", and extract clusters with DBSCAN, selecting representatives from each cluster.
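To make the idea concrete, here is a rough, dependency-free sketch of how clustering and representative selection could look. Everything in it is an assumption for illustration: the three per-hit values are unnamed placeholders (not the actual parameters), the hit names are made up, `eps` and `min_samples` are arbitrary, and picking the cluster medoid as representative is just one possible rule. In practice something like scikit-learn's `sklearn.cluster.DBSCAN` would probably replace the hand-written `dbscan` function.

```python
from math import dist

NOISE = -1  # label given to hits that belong to no cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: return one cluster label per point (-1 means noise)."""
    labels = [None] * len(points)
    # Precompute eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core: keep expanding
        cluster += 1
    return labels


def representatives(hits, labels):
    """Pick the medoid (smallest summed distance to cluster mates) per cluster."""
    clusters = {}
    for (name, vec), label in zip(hits, labels):
        if label != NOISE:
            clusters.setdefault(label, []).append((name, vec))
    return {
        label: min(
            members,
            key=lambda m: sum(dist(m[1], other[1]) for other in members),
        )[0]
        for label, members in clusters.items()
    }


# Hypothetical hits: (name, three placeholder parameters scaled to [0, 1]).
hits = [
    ("hit_a", (0.90, 0.95, 0.90)),
    ("hit_b", (0.92, 0.93, 0.88)),
    ("hit_c", (0.88, 0.97, 0.91)),
    ("hit_d", (0.30, 0.60, 0.50)),
    ("hit_e", (0.32, 0.58, 0.52)),
    ("hit_f", (0.28, 0.62, 0.49)),
    ("hit_g", (0.05, 0.10, 0.99)),  # isolated hit, expected to be noise
]

labels = dbscan([vec for _, vec in hits], eps=0.1, min_samples=2)
reps = representatives(hits, labels)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(reps)    # {0: 'hit_a', 1: 'hit_d'}
```

Noise hits are simply dropped here; whether they should instead be kept as their own representatives is an open design question for this proposal.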