BenevolentAI / DeeplyTough

DeeplyTough: Learning Structural Comparison of Protein Binding Sites

Assertion Error : No htmd could be found #5

Closed gayatripanda5 closed 4 years ago

gayatripanda5 commented 4 years ago

I was using DeeplyTough for a user-defined dataset. I followed the steps mentioned in the "Custom Dataset" section of your article:

  1. Added the path for the STRUCTURE_DATA_DIR environment variable in my bashrc file. For testing purposes, I took one pair of PDB structures, their pockets in .pdb format, and a CSV file with their pairing, and kept all of this in the datasets/custom directory.
  2. Executed "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"

I am getting the following warning and error:

2020-08-11 11:42:54,118 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset.

Can you suggest where I am going wrong and what I can do to rectify this error?

mys007 commented 4 years ago

Thanks for your interest in DeeplyTough. Have you checked out https://github.com/BenevolentAI/DeeplyTough/issues/1 ?

gayatripanda5 commented 4 years ago

Thanks for your reply. I checked #1. It is similar to my case: no .npz files were found in this directory.

I executed the command python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

It gave these warnings:

11it [00:00, 1106.86it/s]
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed
2020-08-11 13:57:56,381 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/ind_pdbs/3GC9_clean.npz,corresponding pdb likely could not be parsed
...

along with this error:

AssertionError: No HTMD could be found but 11PDB files were given, please call preprocess_once() on the dataset

No .npz files were formed. [screenshot attached]
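To double-check, I counted the .npz files under the processed directory with a quick ad-hoc snippet (not part of DeeplyTough; the path is taken from the warning messages above):

    # Ad-hoc check: count the HTMD featurization files that preprocessing should
    # have produced. The directory is assumed from the warning messages above.
    import glob
    import os

    processed_dir = os.path.join('datasets', 'processed', 'htmd', 'custom')
    npz_files = glob.glob(os.path.join(processed_dir, '**', '*.npz'), recursive=True)
    print('{} .npz files found under {}'.format(len(npz_files), processed_dir))

It prints 0 here, which matches the missing-file warnings.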

mys007 commented 4 years ago

Thanks for the details. However, in your screenshot I see no files or directories, i.e. no .pdb files in the ind_pdbs directory? Could you perhaps verify that the toy custom dataset distributed in this repository (https://github.com/BenevolentAI/DeeplyTough/tree/master/datasets/custom) works on your machine? And then, could you follow the structure of this toy dataset for your own dataset?

gayatripanda5 commented 4 years ago

Thanks for your reply. After running this command on your toy dataset:

python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar

I am getting the same warning and error:

8it [00:00, 608.95it/s]
2020-08-12 17:31:43,328 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a05B/1a05B.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset.

Now, coming to my dataset: I kept all my files (.pdb) in this directory (/dataset/custom/ind_pdbs). All _out files were created by your script, so it clearly processed these files, but then gave this error at the end. [screenshot attached]

mys007 commented 4 years ago

Thanks. Maybe I'm misinterpreting your screenshots, but it seems to me that incorporating your dataset into the datasets directory has somewhat corrupted it. Could you perhaps:

  1. Remove the datasets directory completely and revert it to the state in this GitHub repository.
  2. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should really succeed while printing out a lot of outputs, including messages like "Pre-processing xxxx with HTMD...".

If it's OK, continue:

  1. Create a new directory datasets/your_dataset and put your pdbs as well as the modified .csv file there.
  2. Run python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'your_dataset' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar. This should print out a lot of outputs, including messages like "Pre-processing xxxx with HTMD..." (if it doesn't, the sketch after this list may help narrow down which structures were not featurized).
  3. If it doesn't work and you have the permission to do so, could you perhaps upload here the content of 'your_dataset'?
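
For the check mentioned in step 2, something like this hypothetical helper (not part of DeeplyTough) could show which structures did not get featurized; it assumes STRUCTURE_DATA_DIR points at the datasets directory and that the processed .npz files mirror the dataset layout, as in your warning messages:

    # Hypothetical helper (not part of DeeplyTough): report which .pdb files in a
    # dataset subdirectory have no matching HTMD .npz featurization file under
    # datasets/processed/htmd/<subdir>.
    import glob
    import os

    root = os.environ['STRUCTURE_DATA_DIR']   # assumed to point at the datasets directory
    subdir = 'your_dataset'

    for pdb in sorted(glob.glob(os.path.join(root, subdir, '**', '*.pdb'), recursive=True)):
        name = os.path.splitext(os.path.basename(pdb))[0]
        pattern = os.path.join(root, 'processed', 'htmd', subdir, '**', name + '*.npz')
        status = 'ok' if glob.glob(pattern, recursive=True) else 'MISSING'
        print('{:8s} {}'.format(status, pdb))
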
gayatripanda5 commented 4 years ago

Thanks a lot. I apologize for bugging you again. I followed what you said for your toy set; it gave a few warnings:

*** Open Babel Warning in parseAtomRecord
WARNING: Problems reading a PDB file
Problems reading a HETATM or ATOM record.
According to the PDB specification,
columns 77-78 should contain the element symbol of an atom.
but OpenBabel found ' ' (atom 2692)
1 molecule converted

Traceback (most recent call last):
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py", line 10, in <module>
    import MolKit.molecule
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/MolKit/molecule.py", line 23, in <module>
    from mglutil.util import misc
  File "/home/iiitd/miniconda3/envs/deeplytough_mgltools/MGLToolsPckgs/mglutil/util/misc.py", line 17, in <module>
    import numpy
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import core
  File "/home/iiitd/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 71, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the multiarray numpy extension module failed. Most likely you are trying to import a failed build of numpy. Here is how to proceed:

Original error was: /home/iiitd/.local/lib/python2.7/site-packages/numpy/core/_multiarray_umath.so: undefined symbol: PyUnicodeUCS4_FromObject

Then it ended with the same error:

2020-08-13 22:03:57,567 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/datasets/processed/htmd/custom/1a9t/1a9t_clean.npz,corresponding pdb likely could not be parsed

Traceback (most recent call last):
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 69, in <module>
    main()
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 41, in main
    entries = matcher.precompute_descriptors(entries)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/matchers/deeply_tough.py", line 46, in precompute_descriptors
    feats = load_and_precompute_point_feats(self.model, self.args, pdb_list, point_list, self.device, self.nworkers, self.batch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/predictor.py", line 37, in load_and_precompute_point_feats
    dataset = PointOfInterestVoxelizedDataset(pdb_list, point_list, box_size=args.patch_size)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 220, in __init__
    super().__init__(pdb_list, box_size=box_size, augm_rot=False, augm_mirror_prob=0)
  File "/home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough-master/deeplytough/engine/datasets.py", line 46, in __init__
    assert len(self.pdb_list) > 0, f'No HTMD could be found but {len(pdb_list)}' \
AssertionError: No HTMD could be found but 8PDB files were given, please call preprocess_once() on the dataset

mys007 commented 4 years ago

Thanks, that's very helpful; the problem apparently is that MGLTools crashes. I would suggest two next steps: 1) Could you post here your $PYTHONPATH and $PATH, please? I'm a bit suspicious about the path in the stack trace containing a non-conda directory, "/home/iiitd/.local/lib/python2.7/site-packages/numpy/__init__.py" (a small check is sketched after the conda commands below).

2) The problem might be due to a new version of mgltools, which we haven't fixed. Could you perhaps run the following conda commands, then again delete datasets/processed directory and run the custom_evaluation.py command?

conda remove --name deeplytough_mgltools --all
conda create -y -n deeplytough_mgltools python=2.7
conda install -y -n deeplytough_mgltools -c bioconda mgltools=1.5.6
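
Regarding point 1, a quick way to see which numpy the MGLTools environment actually picks up (just a hypothetical check, assuming numpy is importable there at all) would be:

    # Hypothetical check, to be run with the python of the deeplytough_mgltools
    # environment: a path under ~/.local/lib/python2.7/site-packages would mean a
    # stale user-site numpy is shadowing the conda-installed one.
    import numpy
    print(numpy.__file__)
    print(numpy.version.version)
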
gayatripanda5 commented 4 years ago

Please accept my apologies for this delayed response.

  1. The paths are:
     export PYTHONPATH=$DEEPLYTOUGH/deeplytough:$PYTHONPATH
     export PATH=$DEEPLYTOUGH/fpocket2/bin:$PATH

  2. I followed the steps you suggested. Now everything seems fine. I ran this command: "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar" for your toy dataset and for my dataset too. It ran successfully. Big thanks to you.

[screenshot attached]

mys007 commented 4 years ago

That's terrific, I'm glad it works now, thanks for reporting the issue! I will fix the version of mgltools in the repository.

gayatripanda5 commented 4 years ago

Thanks a lot for your help.


gayatripanda5 commented 4 years ago

Dear Sir, I am grateful for your help in this matter. I need your help in understanding what a threshold value for the pocket-similarity score could be, above which we could say that two pockets are more similar than others. For my set of inputs, I got this result: [image: image.png]

Can you help me understand these results? I apologize for bothering you so many times; I just wanted your help in analyzing them. Thanks in advance.

Regards Gayatri Panda


mys007 commented 4 years ago

Hi, unfortunately the image has not been inserted well, can you try to edit your comment? In general, the similarity score allows comparing whether two pockets are more similar than other pairs (the score is simply higher). Choosing a particular threshold is not well defined; for a larger dataset you would perhaps just plot a ROC curve and decide on an operating point as a balance between true and false positive rates.
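
For illustration only, picking the operating point could look roughly like this (a sketch, assuming you have arrays of scores and binary similar/dissimilar labels; scikit-learn is not a DeeplyTough dependency):

    # Sketch only: choosing an operating point from a ROC curve.
    # `scores` are DeeplyTough similarity scores (higher = more similar) and
    # `labels` mark pairs known to be similar (1) or dissimilar (0); both arrays
    # here are made-up placeholders.
    import numpy as np
    from sklearn.metrics import roc_curve

    scores = np.array([-0.4, -1.8, -0.9, -2.5])
    labels = np.array([1, 0, 1, 0])

    fpr, tpr, thresholds = roc_curve(labels, scores)
    best = np.argmax(tpr - fpr)   # e.g. maximize Youden's J = TPR - FPR
    print('threshold:', thresholds[best], 'TPR:', tpr[best], 'FPR:', fpr[best])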

gayatripanda5 commented 4 years ago

Thanks a lot for your reply. The image focusing only on the scores is attached below. So, can we say for now that a more negative pocket-similarity score means more similar?


mys007 commented 4 years ago

I'm still unable to see the image. But scores are defined as negative distances, so the more negative, the less similar.
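
To make that concrete, the idea is roughly the following (a sketch only, with made-up descriptor values; the real descriptors are produced by the network):

    # Sketch of the idea only: the score behaves like a negative distance between
    # pocket descriptors, so values closer to zero indicate more similar pockets.
    import numpy as np

    desc_a = np.random.randn(128)   # hypothetical descriptor of pocket A
    desc_b = np.random.randn(128)   # hypothetical descriptor of pocket B

    score = -np.linalg.norm(desc_a - desc_b)
    print(score)   # e.g. -0.5 means a more similar pair than one scoring -3.0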

gayatripanda5 commented 4 years ago

Sorry to be troublesome. I am grateful for your quick and elaborate responses. I couldn't figure out the issue with the image; anyway, I am attaching it below. However, I now have some idea of how to analyze the scores. Thanks a ton.
