PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

Use foldseek to find structures? #8

Closed hmms117 closed 1 year ago

hmms117 commented 2 years ago

First of all, thanks for releasing a great tool. This seems to be useful not only for AlphaFold structures or homology models, but also for transferring ligands across PDB structures, which is very neat.

I have a suggestion for an improvement. For identifying homologous structures, you use sequence-based BLAST. However, the Steinegger lab has released an excellent (and very fast) tool, Foldseek, that can do structural searches. It should both give more accurate results (safer transfers) and speed up the search for homologs. Moreover, it is about as simple to run as BLAST. For details, see https://github.com/steineggerlab/foldseek.

Again, thanks for making the tool generally available - and the results through a web site.

drlemmus commented 2 years ago

That sounds interesting. It would be really cool if Foldseek could search the PDB-REDO databank. Is there something we can do from our side to make that easier? A practical question that has to do with scaling: what is the typical running time for Foldseek over the PDB? For a tool like AlphaFill (or rather for the databank), speed matters, as we have to do things a million times.

hmms117 commented 2 years ago

If you have all PDB files with ligands in a directory, making a Foldseek database takes a single command. Alternatively, you can use their pre-built PDB database. The README on their GitHub page nicely explains the few commands you need to run.
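
To make that concrete, here is a rough sketch of how those commands could be assembled from Python. The `createdb`, `databases` and `easy-search` subcommands and the `-e` option are the ones described in the Foldseek README; the directory, database and file names (`pdb_with_ligands/`, `ligandDB`, `query.pdb`, `aln.m8`, `tmp`) and the helper function names are made up for illustration, and this is not how AlphaFill itself is wired up.

```python
import shlex
from typing import List, Optional


def build_createdb_cmd(structure_dir: str, db_name: str) -> List[str]:
    """Command that turns a directory of structure files into a Foldseek database."""
    return ["foldseek", "createdb", structure_dir, db_name]


def build_prebuilt_pdb_cmd(db_name: str, tmp_dir: str) -> List[str]:
    """Command that downloads Foldseek's pre-built PDB database instead."""
    return ["foldseek", "databases", "PDB", db_name, tmp_dir]


def build_search_cmd(query: str, db_name: str, out_file: str, tmp_dir: str,
                     max_evalue: Optional[float] = None) -> List[str]:
    """Command that searches one query structure against a database.

    The result file is tab-separated, similar to BLAST's m8 output.
    """
    cmd = ["foldseek", "easy-search", query, db_name, out_file, tmp_dir]
    if max_evalue is not None:
        # -e limits reported hits to those below the given E-value.
        cmd += ["-e", str(max_evalue)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; print the commands rather than running them,
    # so the sketch works without Foldseek installed.
    print(shlex.join(build_createdb_cmd("pdb_with_ligands/", "ligandDB")))
    print(shlex.join(build_search_cmd("query.pdb", "ligandDB", "aln.m8", "tmp")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Foldseek is on the `PATH`.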

For search, it is, surprisingly, even faster than BLAST, so replacing BLAST with Foldseek will speed up the workflow. The output looks like BLAST's m8/tab format, so the choice between BLAST and Foldseek could even be an option(?). The only challenge is finding a sensible cutoff. Foldseek is far more sensitive than BLAST, so you would need to set a very low E-value cutoff. We can ask the Foldseek developers for guidance.
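
As an illustration of how little parsing that would need, here is a minimal sketch that reads the tab-separated hits and applies an E-value cutoff. It assumes Foldseek's default column order as listed in its README (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). The `Hit` type, the function names and the cutoff of `1e-10` are placeholders, not recommended values.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    fident: float   # fraction of identical residues in the alignment
    evalue: float
    bits: float


def parse_m8(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from m8-style tab-separated lines (default column order assumed)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))


def filter_hits(hits: Iterable[Hit], max_evalue: float) -> List[Hit]:
    """Keep only hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h.evalue <= max_evalue]


if __name__ == "__main__":
    # Invented example rows, only to show the expected layout.
    sample = [
        "query\ttargetA\t0.95\t200\t10\t0\t1\t200\t1\t200\t1.5e-20\t350",
        "query\ttargetB\t0.20\t150\t120\t3\t5\t154\t2\t151\t0.5\t40",
    ]
    for hit in filter_hits(parse_m8(sample), max_evalue=1e-10):
        print(hit.target, hit.evalue)
```

Since BLAST's tabular output puts the E-value and bit score in the same columns, the same reader should work for either tool, which is what would make the choice a simple option.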

mhekkel commented 1 year ago

I might have a look at Foldseek if I can find the time.