drewnutt / open_combind

Open-source docking pipeline leveraging pairwise statistics

How to featurize docking poses from already docked result files to use Open-ComBind? #38

Open Sowmya-R-Krishnan opened 3 months ago

Sowmya-R-Krishnan commented 3 months ago

Dear team,

Thank you very much for providing Open-ComBind as a command-line tool for docking pose selection. I have results from a previous docking job run with GNINA using CNN scoring: 10 proteins (PDB files already prepared) and 10 docking poses for each ligand. I would like to use Open-ComBind to select the final docking pose for further analysis. From the help options of each module, I gathered that the tool follows a standard file path nomenclature such as structures/proteins, structures/ligands, etc. I tried running the featurize module on my docking result (an SDF file) and it confirmed my fears: I could not figure out how to adapt my path names to the nomenclature Open-ComBind expects. Given that I have the following data in hand, could you kindly help me with the path and filename settings required to run featurization and pose selection?

  1. PDB files of 10 proteins (already prepared for docking with GNINA).
  2. PDB files of crystal ligands separated from the co-crystal structures for grid box setting.
  3. Multi-SDF files for several ligands with 10 poses per file.

Also, while trying to fix the error from the featurization step, I saw that one of the source files (features/ifp.py) builds the protein filename as shown below:

prot_bname = input_file.split('-to-')[-1]
prot_fname = re.sub('-docked.*\.sdf(\.gz)?','_prot.pdb',prot_bname)
prot_file = f"structures/proteins/{prot_fname}"

Here, the input filename is expected to contain a -to- phrase, the protein file is always looked up under the hard-coded path structures/proteins/ (so the docking result file is not expected to carry any preceding file path), and the docking output file itself should end with the suffix -docked.sdf or -docked.sdf.gz. Would it be possible to provide a detailed README or usage manual that documents these requirements, so that users can prepare their files beforehand and use Open-ComBind effectively? Re-running all of my docking jobs through this pipeline will not be possible for me, so it would be great if there were a way to use the existing results directly. Thank you for taking the time to read this. I hope to hear from the team soon.
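To make the naming requirement concrete, here is a minimal sketch that wraps the three quoted lines in a standalone function and runs them on a few filenames. The function name `expected_protein_path` and the example names (`LIG1`, `1ABC`, `results/gnina/...`) are hypothetical and chosen only for illustration. The regex is written as a raw string here, which does not change what it matches.

```python
import re


def expected_protein_path(input_file: str) -> str:
    """Reproduce the protein-path logic quoted from features/ifp.py."""
    # Keep only what follows the last '-to-' (the whole string if it is absent).
    prot_bname = input_file.split('-to-')[-1]
    # Swap the '-docked...sdf[.gz]' suffix for '_prot.pdb'; no match leaves the name as-is.
    prot_fname = re.sub(r'-docked.*\.sdf(\.gz)?', '_prot.pdb', prot_bname)
    # The protein directory is hard-coded and relative to the working directory.
    return f"structures/proteins/{prot_fname}"


# Hypothetical filenames: two that follow the convention, one that does not.
examples = [
    "LIG1-to-1ABC-docked.sdf",
    "LIG1-to-1ABC-docked.sdf.gz",
    "results/gnina/LIG1_poses.sdf",
]
for name in examples:
    print(f"{name} -> {expected_protein_path(name)}")
```

The first two names resolve to structures/proteins/1ABC_prot.pdb. The third has neither -to- nor a -docked suffix, so the whole input path is appended unchanged to structures/proteins/ and points at a file that is unlikely to exist.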

[Screenshot: error from the featurizer when a path was prefixed to the docking output filename]

With regards, Sowmya

drewnutt commented 2 months ago

I have added support for additional keyword arguments during featurization (commit 9493a730e3f39ccde6d977772751368d8ebbd040). These let you predefine the docking protein and the protein file directory.

When the docking protein is defined, the featurizer no longer assumes anything about how your docked files are named.

This can all be specified either on the CLI or through the Python API.

This basic implementation is limited to one docking protein per featurization run.

Let me know if this does not satisfy your constraints.