forlilab / Ringtail

Package for storage and analysis of virtual screenings run with AutoDock-GPU and AutoDock Vina
GNU Lesser General Public License v2.1
40 stars 7 forks source link

Problem with a union of interaction combinations (`max_miss`>0): bookmark_name for `write_molecule_sdfs`, ligand counting, and energies in result output #41

Closed rwxayheee closed 3 months ago

rwxayheee commented 5 months ago

First of all thanks everyone for the nice work. I had some personal difficulties trying to export filtered molecules that came from a union of interaction filters (max_miss>0) :"( Here's a sample script I run Ringtail with:

from ringtail import RingtailCore
import sys, random, glob

# lazy overwrite protect in testing
i = random.getrandbits(16)
# Instantiating Ringtail object
rtc = RingtailCore("r{}-output.db".format(i))

# specify a portion of dlg
flist = list(glob.glob("/Users/amyhe/Desktop/2_Ringtail/Rigid-ADG/rec_i1__*.dlg"))
# write dlg paths to file_list.txt
with open(f"r{i}-file_list.txt", "w") as f:
    for fname in flist:
        f.write(fname+"\n")

# add results from file_list
rtc.add_results_from_files(file_list=f"r{i}-file_list.txt", store_all_poses=True)#,\
                           #receptor_file="/Users/amyhe/Desktop/2_Ringtail/Ext_Rec/Rigid/h01_r0/NC2_CDP/receptor.pdbqt")

# direct summary to summary_only.txt
orig_stdout= sys.stdout
fo= open(f"r{i}-summary_only.txt", "w")
sys.stdout = fo
rtc.produce_summary()
sys.stdout = orig_stdout
fo.close()

# filter and make a bookmark
# direct filter output to filter_output.txt
rtc.set_output_options(f"r{i}-filter_output.txt")
rtc.filter(hb_interactions=[("A:PHE:350:", True), ("A:GLU:568:", True)],
           vdw_interactions=[("A:PHE:350:", True), ("A:GLU:568:", True)],
           # should include all poses
           max_miss=4,
           bookmark_name="F350_E568")

There are several things I'm a bit unsure about:

(1) Bookmark Name At this point,

rtc.get_bookmark_names()

produces

['F350_E568_0',
 'F350_E568_1',
 'F350_E568_2',
...
 'F350_E568_14',
 'F350_E568_15',
 'F350_E568_union']

This is understandable behavior, although it's not the original bookmark name. But after that when I attempted towrite_molecule_sdfs, it seemed like the bookmarks couldn't be accessed (?).

rtc.write_molecule_sdfs(bookmark_name='F350_E568_0', sdf_path='union')

Gave

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], [line 1](vscode-notebook-cell:?execution_count=3&line=1)
----> [1](vscode-notebook-cell:?execution_count=3&line=1) rtc.write_molecule_sdfs(bookmark_name='F350_E568_0', sdf_path='union')

File ~/Downloads/Ringtail/ringtail/ringtailcore.py:1436, in RingtailCore.write_molecule_sdfs(self, sdf_path, bookmark_name, write_nonpassing)
   [1430](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1430)     self.set_output_options(export_sdf_path=sdf_path)
   [1432](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1432) all_mols = self.ligands_rdkit_mol(
   [1433](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1433)     bookmark_name=bookmark_name, write_nonpassing=write_nonpassing
   [1434](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1434) )
-> [1436](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1436) for ligname, info in all_mols.items():
   [1437](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1437)     self.logger.info("Writing " + ligname + ".sdf")
   [1438](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1438)     self.outputman.write_out_mol(
   [1439](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1439)         ligname, info["ligand"], info["flex_residues"], info["properties"]
   [1440](https://file+.vscode-resource.vscode-cdn.net/Users/amyhe/Desktop/2_Ringtail/~/Downloads/Ringtail/ringtail/ringtailcore.py:1440)     )

AttributeError: 'NoneType' object has no attribute 'items'

(2) Ligand Counting

With max_miss>0, ligands may be doubly counted after the union is made. Number passing ligands in max_miss union may be greater than the number of input files. Not sure if they should be merged (by best energies?) or kept as unique poses...

(3) Energies in Result Output

Filters:
***************

---------------
Max Miss Union:

Result bookmark name: F350_E568_union
***************
'Val_PDBQT__CB87950354_1_s1', -7.19
'Val_PDBQT__CB87950354_1_s2', -6.6
'Val_PDBQT__CB9021524_s1', -8.09
'Val_PDBQT__ZINC00028011_s1', -5.21
'Val_PDBQT__ZINC00028011_s3', -6.23
'Val_PDBQT__ZINC00107384_s1', -7.27
'Val_PDBQT__ZINC00107384_s1', -7.07
'Val_PDBQT__ZINC00121280_1_s1', -8.11
'Val_PDBQT__ZINC00121280_1_s1', -8.02
'Val_PDBQT__ZINC00121280_1_s1', -7.62
'Val_PDBQT__ZINC00121280_1_s2', -8.0
'Val_PDBQT__ZINC00174720_s1', -6.16
'Val_PDBQT__ZINC00174720_s2', -6.01
'Val_PDBQT__ZINC00174720_s3', -6.21
'Val_PDBQT__ZINC00174720_s3', -5.94
'Val_PDBQT__ZINC00246690_s1', -6.39
'Val_PDBQT__ZINC00262670_s1', -6.86
...

Although max_miss=4 should include all poses provided, the energy ligand in the results may not be the best Free Energy of Binding observed. For example for 'Val_PDBQT__ZINC00174720_s3', the best in DLG is -6.34. Is this expected? Or should we interpret these filtered results in a different way (they are no longer under a unique ligand name, but belong to a cluster that's defined by a certain interaction pattern)?

maylinnp commented 4 months ago

Issue 1) is fixed in this commit : 06cfd17

maylinnp commented 4 months ago

Comment about issue 1) When filtering with the option max_miss a new bookmark, bookmarkname+_n are made, and one bookmarkname+_union, so the name of the bookmarks are expected. I did fix a bug in how bookmark names are handled while writing SDFs, and I also wrote better error messages.

Issue 2) I do believe the number to compare to is number of poses/results, and not ligands/input files. There might be some scientific discussion there we should take as a group, regarding ligand vs pose. Or am I misunderstanding your comment?

Issue 3) looking into it/might need to be part of scientific discussion mentioned above

rwxayheee commented 4 months ago

Hi @maylinnp Thanks for all the work done. I really appreciate that. Yes I don't think (2) or (3) is a bug. It's more like a changed behavior because of union of interaction combinations. When ligands are selected by discrete types of interactions, only the best pose and at most one entry per ligand will be returned. When the selection is a union, the returned score may not be the best (3), and multiple entries per ligand can be returned (2). It might have something to do with how the union is computed. But if it's by design, then there's no issue with that