Bioinformatics protocol that aims at mining information at the sequence and structure level of protein chain to detect possible evolutionary conserved residues.
MIT License
5
stars
1
forks
source link
feat: change from MSA to DBSCAN for representative selection of Foldseek hits #16
As of v1.0, we are extracting representative structures from the NJ tree constructed from the MSA, resulting in a poor exploration of comparison due to limits in this procedure.
One idea is to assign a class to each hit based in 3 parameters:
Alignment length
TM-score
Identity percentage
Here we could generate zones with the notions of "twilight zone" and extract clusters with DBSCAN from which representatives are selected.
As of v1.0, we are extracting representative structures from the NJ tree constructed from the MSA, resulting in a poor exploration of comparison due to limits in this procedure.
One idea is to assign a class to each hit based in 3 parameters:
Here we could generate zones with the notions of "twilight zone" and extract clusters with DBSCAN from which representatives are selected.