PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

Questions about the "selection of chemical compounds" #44

Closed leaves520 closed 5 months ago

leaves520 commented 5 months ago

Hi, this work is meaningful, and I learned a lot from it. However, I got confused when reading this sentence from the paper: "All ligands covering about 95% of the cumulative occurrence of all ligands in the PDB were in the initial AlphaFill compound list". What does "95% of the cumulative occurrence" mean? I look forward to your reply. Thanks!

drlemmus commented 5 months ago

We made a list of all compounds in the PDB, sorted by how often they occur, and from the most common ones we took everything that seemed relevant (e.g. no glycerol). Let's say that something like ATP is so common that it represents 0.5% of all ligands you see in the PDB, while some compounds are found only once in the whole PDB. Then you can imagine that the top 1000 compounds already cover more than 50%. We just kept expanding the list until it reached 95% coverage of all the ligands in the PDB.

leaves520 commented 5 months ago

First of all, thank you for your timely and detailed reply. So all ligand occurrences in the PDB are counted and sorted, and then the top ligands are selected until they cover 95% of all ligand occurrences in the PDB? Is there a problem with my understanding of this?

drlemmus commented 5 months ago

No problem. This is exactly what we did.
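
For readers who want to see the selection confirmed above in concrete terms, here is a minimal Python sketch. It is not the AlphaFill code: the function name `select_by_cumulative_coverage`, the `exclude` parameter, and the toy counts are all made up for illustration, and the sketch assumes that an excluded compound (such as glycerol, PDB code GOL) still counts toward the cumulative occurrence even though it is left off the list. The thread does not say how exclusions were counted, so treat that as a guess.

```python
def select_by_cumulative_coverage(counts, target=0.95, exclude=frozenset()):
    """Pick the most common compounds until `target` of all occurrences is covered.

    counts  -- dict mapping compound ID to its number of occurrences
    target  -- fraction of total occurrences to cover (0.95 in the thread)
    exclude -- compounds judged irrelevant; assumption: they still count
               toward coverage but are not put on the list
    """
    total = sum(counts.values())
    if total == 0:
        return []

    selected = []
    covered = 0
    # Most common first; ties broken by name so the result is deterministic.
    for compound, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total >= target:
            break
        covered += n
        if compound not in exclude:
            selected.append(compound)
    return selected


# Toy counts, invented for illustration only (not real PDB statistics).
toy_counts = {
    "ATP": 400, "HEM": 250, "GOL": 200, "ZN": 110,
    "MG": 30, "RARE1": 6, "RARE2": 4,
}
print(select_by_cumulative_coverage(toy_counts, target=0.95, exclude={"GOL"}))
```

With these toy numbers the four most common compounds account for 960 of 1000 occurrences, so the walk stops there and the rare compounds never make the list; glycerol is passed over because it is in `exclude`.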