forlilab / Ringtail

Package for storage and analysis of virtual screenings run with AutoDock-GPU and AutoDock Vina
GNU Lesser General Public License v2.1

Cluster size not extracted from dlg #35

Closed alfredoq closed 6 months ago

alfredoq commented 6 months ago

Hello,

I am processing a .dlg file (attached) with ringtail v1.1.0 using the following Python code:

    from ringtail import RingtailCore

    opts = RingtailCore.get_defaults()
    opts['rman_opts']['values']['file_sources']['file_path']['path'] = [['./dlgs']]
    opts['storage_opts']['values']['db_file'] = 'database.db'

    with RingtailCore(opts_dict=opts) as rt_core:
        rt_core.add_results()

The database is correctly created, however the field 'cluster_size' in the Results table is empty. Do I need to explicitly perform a cluster analysis with some keyword? Should the cluster size be directly extracted from the clustering histogram present in the .dlg file?
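One way to confirm the symptom is to query the database directly. The sketch below is only an illustration: it assumes the table is named `Results` and the column `cluster_size`, as described above, and it runs against a small in-memory stand-in table (the `pose` column and its rows are made up) rather than a real Ringtail database.

```python
import sqlite3


def count_missing_cluster_sizes(conn):
    """Return (total_rows, rows_with_empty_cluster_size) for the Results table."""
    total = conn.execute("SELECT COUNT(*) FROM Results").fetchone()[0]
    missing = conn.execute(
        "SELECT COUNT(*) FROM Results "
        "WHERE cluster_size IS NULL OR cluster_size = ''"
    ).fetchone()[0]
    return total, missing


# Stand-in database: the real schema has more columns; only cluster_size
# is taken from the issue text, everything else here is hypothetical.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Results (pose INTEGER, cluster_size INTEGER)")
demo.executemany(
    "INSERT INTO Results (pose, cluster_size) VALUES (?, ?)",
    [(1, 5), (2, None)],
)

total_rows, missing_rows = count_missing_cluster_sizes(demo)
print(f"{missing_rows} of {total_rows} rows have no cluster_size")
```

To run it against the file created above, `sqlite3.connect("database.db")` would replace the in-memory connection.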

Thanks in advance for the support

example.dlg.gz

diogomart commented 6 months ago

Hello,

I think it should be included automatically; either @maylinnp or I will look into that.

Since you are a power user, you may want to try the api_dev branch. It is a refactor to improve scripting from Python. We are still testing it, but hope to release soon as v2.0.0. If you try it and have any feedback on that it will be greatly appreciated.

Thanks!

maylinnp commented 6 months ago

Hi @alfredoq, I just took a look at the issue you were having and I identified the problem area. I wrote your example file to a database using the new API, and I also ran a different example file I have. Observations: no cluster size data is added for your file, but it is added for my other example file. When I looked into the two dlgs, I noticed that your file lacks some interaction data at the start of each "Final docked state" (please note I am not very familiar with AutoDock, so I am only making simple observations here).

Alfredo's example dlg:

` FINAL DOCKED STATE:

Run: 19 / 20 Time taken for this run: 0.033s

DOCKED: MODEL 19
DOCKED: USER Run = 19
DOCKED: USER
DOCKED: USER Estimated Free Energy of Binding = -9.39 kcal/mol [=(1)+(2)+(3)-(4)]
DOCKED: USER
DOCKED: USER (1) Final Intermolecular Energy = -9.99 kcal/mol`

My example dlg:

` FINAL DOCKED STATE:

Run: 20 / 20 Time taken for this run: 0.013s

ANALYSIS: COUNT 48
ANALYSIS: TYPE { "H", "H", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V", "V"}
ANALYSIS: LIGID { 3 , 3 , 1 , 1 , 1 , 1 , 1 , 4 , 4 , 5 , 6 , 6 , 8 , 8 , 8 , 9 , 9 , 10 , 10 , 10 , 11 , 11 , 11 , 11 , 11 , 13 , 13 , 14 , 14 , 14 , 15 , 15 , 16 , 16 , 16 , 16 , 16 , 17 , 17 , 18 , 18 , 18 , 18 , 18 , 19 , 19 , 19 , 19 }
ANALYSIS: LIGNAME { "H", "H", "C", "C", "C", "C", "C", "N", "N", "C", "C", "C", "C", "C", "C", "N", "N", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"}
ANALYSIS: RECID { 1530 , 1531 , 215 , 231 , 1605 , 211 , 216 , 1529 , 1598 , 231 , 1606 , 1598 , 231 , 1606 , 1605 , 1598 , 1611 , 231 , 1606 , 1605 , 163 , 1605 , 1606 , 162 , 1602 , 1611 , 1598 , 1605 , 211 , 1606 , 408 , 1606 , 1606 , 180 , 210 , 1605 , 211 , 408 , 1579 , 199 , 1606 , 201 , 1605 , 198 , 201 , 1606 , 408 , 1579 }
ANALYSIS: RECNAME { "OD1", "OD2", "CA", "CG2", "CG", "C", "C", "CG", "CB", "CG2", "CD", "CB", "CG2", "CD", "CG", "CB", "CB", "CG2", "CD", "CG", "C", "CG", "CD", "CA", "C", "CB", "CB", "CG", "C", "CD", "CD2", "CD", "CD", "CA", "CA", "CG", "C", "CD2", "CG1", "C", "CD", "CB", "CG", "CA", "CB", "CD", "CD2", "CG1"}
ANALYSIS: RESIDUE { "ASP", "ASP", "ASN", "VAL", "PRO", "GLY", "ASN", "ASP", "ALA", "VAL", "PRO", "ALA", "VAL", "PRO", "PRO", "ALA", "SER", "VAL", "PRO", "PRO", "GLY", "PRO", "PRO", "GLY", "PRO", "SER", "ALA", "PRO", "GLY", "PRO", "LEU", "PRO", "PRO", "GLY", "GLY", "PRO", "GLY", "LEU", "VAL", "PHE", "PRO", "PHE", "PRO", "PHE", "PHE", "PRO", "LEU", "VAL"}
ANALYSIS: RESID { 274 , 274 , 146 , 147 , 282 , 145 , 146 , 274 , 281 , 147 , 282 , 281 , 147 , 282 , 282 , 281 , 283 , 147 , 282 , 282 , 140 , 282 , 282 , 140 , 282 , 283 , 281 , 282 , 145 , 282 , 164 , 282 , 282 , 142 , 145 , 282 , 145 , 164 , 279 , 144 , 282 , 144 , 282 , 144 , 144 , 282 , 164 , 279 }
ANALYSIS: CHAIN { "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"}

DOCKED: MODEL 20
DOCKED: USER Run = 20
DOCKED: USER
DOCKED: USER Estimated Free Energy of Binding = -6.29 kcal/mol [=(1)+(2)+(3)-(4)]
DOCKED: USER
DOCKED: USER (1) Final Intermolecular Energy = -6.89 kcal/mol`

In the Ringtail code, presence of interaction data is needed to write cluster size to the database. It looks like maybe Alfredo has docked without including some interactions, so his dlg lacks that information. Therefore, cluster information is not written to db.
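The difference between the two files can be checked mechanically. The following is a minimal sketch, not Ringtail's parser: it only looks for lines starting with `ANALYSIS:`, the prefix visible in the second excerpt above. The helper names `has_interaction_analysis` and `read_dlg_lines` are hypothetical.

```python
import gzip


def has_interaction_analysis(dlg_lines):
    """Return True if any line carries the 'ANALYSIS:' prefix."""
    return any(line.lstrip().startswith("ANALYSIS:") for line in dlg_lines)


def read_dlg_lines(path):
    """Read a .dlg or gzip-compressed .dlg.gz file into a list of lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


# Shortened stand-ins for the two excerpts quoted above.
without_analysis = ["Run: 19 / 20", "DOCKED: MODEL 19", "DOCKED: USER Run = 19"]
with_analysis = ["Run: 20 / 20", "ANALYSIS: COUNT 48", "DOCKED: MODEL 20"]

print(has_interaction_analysis(without_analysis))
print(has_interaction_analysis(with_analysis))
```

For a file on disk, `has_interaction_analysis(read_dlg_lines("example.dlg.gz"))` would report whether it contains any interaction lines.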

@diogomart I do believe this is the issue Alfredo is experiencing. What I don't know is if this is expected behavior, or a bug?

alfredoq commented 6 months ago

Thank you @diogomart for the prompt reply. I will take a look at the dev branch in order to test v2.0.0. I am indeed trying to use ringtail from my own Python workflow, so the improved features you mention are greatly appreciated.

I will provide the corresponding feedback in this respect,

kind regards

alfredoq commented 6 months ago

Thank you @maylinnp for analyzing this issue.

As you mention, I ran AutoDock without requesting an interaction contact analysis to obtain the .dlg file I provided, which is why the computed interactions are not included in the file. I configured it this way because I perform a full set of intermolecular interaction analyses a few steps after the docking stage. However, I am still interested in storing the cluster size information to perform population analysis.

So, it appears to me that the parsing of the .dlg file strongly relies on the presence of the interaction analysis section. Maybe the storing of cluster sizes can be implemented for more general scenarios?

Thank you again!

diogomart commented 6 months ago

@maylinnp yes it's a bug, the cluster info should always be written from DLG whether or not interactions are calculated.
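Reading cluster sizes from the clustering histogram alone, independent of any `ANALYSIS:` lines, could look roughly like the sketch below. This is not the fix that was pushed to Ringtail. It assumes a pipe-delimited histogram whose rows read `rank | lowest energy | run | mean energy | number in cluster | bars` and that the table ends at an `RMSD TABLE` heading. The thread does not confirm that layout, and the sample rows are made up for illustration.

```python
import re

# Assumed row layout: rank | lowest energy | run | mean energy | num in cluster |
HISTOGRAM_ROW = re.compile(
    r"^\s*(\d+)\s*\|\s*([+-]?\d+\.\d+)\s*\|\s*(\d+)\s*\|"
    r"\s*([+-]?\d+\.\d+)\s*\|\s*(\d+)\s*\|"
)


def parse_cluster_sizes(dlg_lines):
    """Map the run number of each cluster's best pose to the cluster size."""
    sizes = {}
    in_histogram = False
    for line in dlg_lines:
        if "CLUSTERING HISTOGRAM" in line:
            in_histogram = True
            continue
        if not in_histogram:
            continue
        if "RMSD TABLE" in line:  # assumed end of the histogram section
            break
        match = HISTOGRAM_ROW.match(line)
        if match:
            sizes[int(match.group(3))] = int(match.group(5))
    return sizes


# Made-up rows in the assumed layout; not taken from the attached file.
sample = [
    "    CLUSTERING HISTOGRAM",
    "Rank | Energy    |     | Energy    | Clus|    5    10   15",
    "_____|___________|_____|___________|_____|____:____|____:",
    "   1 |     -9.39 |  19 |     -9.10 |  12 |############",
    "   2 |     -8.75 |   4 |     -8.40 |   8 |########",
    "    RMSD TABLE",
]

print(parse_cluster_sizes(sample))
```

Keying the result by run number keeps the sketch independent of whether interaction data exists for that run.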

@alfredoq thank you for giving the future v2.0.0 a try!

maylinnp commented 6 months ago

Hi @alfredoq , I pushed the bug fix both to main and to api_dev. If you want to try out the new API on api_dev, feel free to reach out to me at any point with questions or issues. We have documentation up on readthedocs that is still in preparation, but the API documentation is mostly up to date: https://ringtail.readthedocs.io/en/latest/api.html

alfredoq commented 6 months ago

Thank you very much @maylinnp for fixing this bug and suggesting the api_dev branch. I will try it and report back if I find any issues with it.

I am closing this issue since it is solved.

kind regards