PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds into AlphaFold models. By adding molecular context to the protein structures, it makes the models easier to assess in terms of function and structural integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

[Suggestion] An easy implementation of structure similarity search using the DASH database #38

Closed: sadiogo closed this issue 11 months ago

sadiogo commented 11 months ago

DASH (Database of Aligned Structural Homologs) is a database of structural alignments for all known structurally homologous protein domains and chains in the PDB.

They provide a FASTA database of the protein domains in the PDB, in which every domain belongs to a structural cluster. The cluster is recorded in the FASTA header, so once BLAST returns a hit, all that remains is to parse that header. Every hit is thereby linked to a cluster, and all proteins in that cluster become candidates for extracting ligands.
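To make the idea concrete, here is a rough sketch of the lookup in Python. It is only an illustration under assumptions: the header layout (`>DOMAIN_ID cluster=CLUSTER_ID`) and the function names `index_clusters` and `ligand_donor_candidates` are hypothetical, and the regular expression would need to be adapted to whatever the DASH headers actually look like. The BLAST hit IDs are assumed to match the domain IDs in the FASTA headers.

```python
import re
from collections import defaultdict

# ASSUMPTION: headers look like ">DOMAIN_ID cluster=CLUSTER_ID". The real DASH
# header layout may differ, so adjust this pattern to match the actual file.
HEADER_RE = re.compile(r"^>(?P<domain>\S+).*?\bcluster=(?P<cluster>\S+)")


def index_clusters(fasta_lines):
    """Build domain -> cluster and cluster -> domains maps from FASTA headers."""
    domain_to_cluster = {}
    cluster_to_domains = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER_RE.match(line.strip())
        if match is None:
            continue  # header without a recognisable cluster field
        domain, cluster = match["domain"], match["cluster"]
        domain_to_cluster[domain] = cluster
        cluster_to_domains[cluster].append(domain)
    return domain_to_cluster, dict(cluster_to_domains)


def ligand_donor_candidates(hit_ids, domain_to_cluster, cluster_to_domains):
    """Expand BLAST hit IDs to every domain that shares a cluster with a hit."""
    candidates = []
    seen = set()
    for hit in hit_ids:
        cluster = domain_to_cluster.get(hit)
        if cluster is None:
            continue  # hit is not present in the indexed database
        for domain in cluster_to_domains[cluster]:
            if domain not in seen:
                seen.add(domain)
                candidates.append(domain)
    return candidates
```

The index only has to be built once per database release. After that, each BLAST hit costs one dictionary lookup plus a walk over the members of its cluster.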

The only caveat is that they haven't updated the database since 2020, but I'm sure they would be willing to collaborate with alphafill.

You can download the domain FASTA database here. There is also plenty of documentation on how to use their API.

drlemmus commented 11 months ago

There are more modern alternatives for this that we prioritise.

sadiogo commented 11 months ago

Totally understandable. I like Foldseek (which another user also mentioned). I just thought DASH would be an easy, fast alternative to implement while better ones are still in development.