Open imerelli opened 11 months ago
Hello and thanks for showing interest in Deeprank-Mut. It's true that the tests should be run from the root and not from the test directory. Thanks for pointing this out. I'll see if I can change this in the documentation. If the tests do not show any output, that means they ran fine. I see you got some warnings because our code was written against an older version of numpy. I'll see if I can make those warnings go away.
Thank you for your help. If I have understood the workflow of how to use your tool correctly, I need three scripts: 1) one for creating the database with multiple variants that I already know to be pathogenic or benign, 2) one to train the network, and 3) one to make predictions with a novel, unknown variant. Is this correct? If yes, could you please provide a template for these three scripts? There is something about the first two (but with only one variant), but nothing for step 3. Sorry, I'm not a Python programmer and I need help with this.
To design a script, let me start by asking how your data is organized. Deeprank-mut requires:
amino acid to replace the residue by
Do you perhaps have a sample of your data table? That could help us design a preprocessing script.
I have all the requested information. Attached is a zip archive with the PDB and PSSM files, as well as a tentative script for the first two steps (it contains the information about the known mutations, but I don't know how to proceed with more than one mutation). I have no idea how to write the third script.
That's an odd looking PSSM. How does one read this? The PSSM format that deeprank works with looks like this:
dbresi pdbresn seqresi seqresn A R N D C Q E G H I L K M F P S T W Y V IC
0 M 1 M -2 -2 -3 -4 -2 -1 -3 -4 -3 0 1 -2 9 -1 -4 -2 -2 -2 -2 0 1.05
1 V 2 V 1 -2 -1 -2 -2 -1 -1 5 -2 -3 -3 -2 -2 -3 -2 1 1 -3 -3 -2 0.64
2 L 3 L -3 -3 -5 -5 -2 -3 -4 -5 -4 1 6 -4 1 0 -4 -4 -2 -3 -2 0 0.94
3 S 4 S 0 -2 2 -1 -2 -1 -1 -1 -2 -3 -3 -1 0 -3 -2 5 2 -4 -3 -2 0.69
4 E 5 E -2 -2 0 6 -4 0 4 -1 -2 -4 -5 -1 -4 -5 -2 0 -2 -5 -4 -4 0.98
5 G 6 G 4 -1 -1 -1 -2 1 0 3 -2 -3 -3 0 -2 -4 -2 1 -1 -4 -3 -1 0.41
6 E 7 E -3 -2 1 5 -5 2 5 -3 -2 -5 -5 -1 -4 -5 -3 -2 -2 -5 -4 -4 1.05
7 W 8 W -4 1 -4 -5 -4 -1 -4 -4 1 -3 -2 -2 -3 4 -5 -4 -4 10 3 -2 1.59
8 Q 9 Q -2 0 1 4 -4 5 2 -3 -1 -3 -4 1 -3 -4 -3 -1 -2 -4 -3 -1 0.67
9 L 10 L 0 0 -1 -3 -2 0 -1 -3 1 0 2 1 3 -1 -2 -1 1 -3 -2 0 0.15
It's in JSON format instead of a matrix. We will convert it if necessary. But it is not mandatory, right?
I think you need to convert it. Sorry!
I've looked through your generate.py script. You can use it for preprocessing known variant data (1) just as well as for preprocessing unknown variant data (3), with only slight modifications for (3):
A) Leave out the variant_class argument when instantiating a PdbVariantSelection object. It's optional.
B) Set compute_targets=[] in the DataGenerator object. This makes sure that the preprocessing won't look for a class value to store.
1) Concerning the PSSM matrix, I obtained it using PSI-BLAST at NCBI. How do you usually compute this matrix? I can write a script for the conversion, but it will be quite tricky.
2) Before getting to part (3), our problem is how to put more than one variant into the database in step (1). I don't understand the Python syntax I should use.
We used PSIBLAST too to compute this matrix, but the PSSM data needs to be mapped to the PDB file. For that, we used a tool called PSSMgen: https://github.com/DeepRank/pssmgen
You need a CSV table file containing your variant data: PDB ID, residue number, amino acid. The script can then use the pandas library to extract the data from that CSV file and put it into PdbVariantSelection objects.
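A minimal sketch of that table-reading step, using Python's standard csv module (pandas.read_csv would do the same job). The column names pdb_id, chain, residue_number, wildtype, variant and class are assumptions made for illustration, not names that DeepRank-Mut prescribes, and VariantRow is a hypothetical helper. Each resulting row would then be turned into a PdbVariantSelection, whose exact arguments should be taken from the README.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VariantRow:
    """One line of the variant table (hypothetical column layout)."""
    pdb_id: str
    chain: str
    residue_number: int
    wildtype: str
    variant: str
    variant_class: str  # empty string for unknown variants


def read_variant_table(handle) -> List[VariantRow]:
    """Parse a CSV table with a header row into VariantRow records."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(VariantRow(
            pdb_id=record["pdb_id"].strip(),
            chain=record["chain"].strip(),
            residue_number=int(record["residue_number"]),
            wildtype=record["wildtype"].strip(),
            variant=record["variant"].strip(),
            variant_class=(record.get("class") or "").strip(),
        ))
    return rows


# Example table: the substitutions come from this thread, the class
# labels are placeholders and not real annotations.
EXAMPLE_CSV = """pdb_id,chain,residue_number,wildtype,variant,class
1CR4,A,200,GLN,ALA,label_a
1CR4,A,171,ASP,ASN,label_b
"""

variant_rows = read_variant_table(io.StringIO(EXAMPLE_CSV))
for row in variant_rows:
    print(row)
```

For a real table, open the file with open("table.csv", newline="") and pass the handle to read_variant_table instead of the in-memory example.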
Hi, thank you for providing the 3 scripts. I created the table.csv and the generate.py. Here is the error I'm getting now:
python generate.py
Traceback (most recent call last):
File "generate.py", line 5, in <module>
from deeprank.generate import *
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/__init__.py", line 1, in <module>
from .DataGenerator import DataGenerator
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 17, in <module>
from deeprank.generate import GridTools as gt
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/GridTools.py", line 13, in <module>
from deeprank.operate import hdf5data
File "/opt/tools/deg/DeepRank-Mut/deeprank/operate/hdf5data.py", line 7, in <module>
from deeprank.domain.amino_acid import amino_acids
File "/opt/tools/deg/DeepRank-Mut/deeprank/domain/amino_acid.py", line 52
amino_acids_by_code {
^
SyntaxError: invalid syntax
Also, two other minor points:
$ git clone --branch development https://github.com/DeepRank/DeepRank-mut.git
Cloning into 'DeepRank-mut'...
fatal: Remote branch development not found in upstream origin
It would be very useful to have a working example of these scripts, if possible...
Sorry about the syntax error. I fixed it. I wouldn't recommend the developmental branch.
I'm trying to fix the pytest script, but it will take some time. Sorry, currently things are quite busy on our side.
Sorry, there are still some errors.
python generate.py
Traceback (most recent call last):
File "generate.py", line 22, in
I don't see how to solve it.
There is also an indentation error here:
File "generate.py", line 53
grid_info = {
^
IndentationError: unexpected indent
But that was easy to solve.
Indeed. I pushed the fix. Sorry!
Ok, now it's running. But I got this error.
803, 'O', 'O', 'A', 352, 'SER', '', '', 1.0], [-32.719, 25.157, 60.035, 2804, 'OG', 'O', 'A', 352, 'SER', '', '', 1.0], [-29.576, 22.331, 60.691, 2805, 'OXT', 'O', 'A', 352, 'SER', '', '', 1.0]] for x,y,z,rowID,name,element,chainID,resSeq,resName,iCode,altLoc,occ
Creating database : 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:23<00:00, 1.93s/it, variant=1CR4.pdb]
Molecules with errored features are removed:
['1CR4:A:200:Glutamine->Alanine-m7857361863297430751', '1CR4:A:281:Histidine->Alanine-m3856591489522861677', '1CR4:A:284:Isoleucine->Alanine-m6427764906900908035', '1CR4:A:259:Isoleucine->Alanine-m6594343748205688219', '1CR4:A:113:Histidine->Alanine-m8237672030338117968', '1CR4:A:203:Histidine->Alanine-7177783304471683210', '1CR4:A:171:Aspartate->Asparagine-614671715635415912', '1CR4:A:175:Alanine->Phenylalanine-7912783878010190788', '1CR4:A:187:Aspartate->Alanine-3861484985167401279', '1CR4:A:255:Tyrosine->Alanine-9130804594370861896', '1CR4:A:262:Aspartate->Asparagine-m4545163254089794910', '1CR4:A:288:Glutamate->Alanine-m5086511475306243927']
# Successfully created database: train_data.hdf5
Traceback (most recent call last):
File "generate.py", line 59, in <module>
database.map_features(grid_info,try_sparse=True, time=False, prog_bar=True)
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1015, in map_features
raise ValueError(f'No variants found in {self.hdf5}.')
ValueError: No variants found in train_data.hdf5.
I don't understand why my molecules are removed. The substitutions that I have indicated in the table.csv are valid.
Were any logs created?
If not, then add this to the beginning of your script:
import logging
logging.basicConfig(filename="deeprank-mut.log", level=logging.DEBUG)
This should output the errors, that cause your variants to be skipped.
DEBUG:h5py._conv:Creating converter from 5 to 3
INFO:deeprank:
# Start creating HDF5 database: train_data.hdf5
INFO:deeprank:
Processing variant: 1CR4:A:200:Glutamine->Alanine-7574597180198206144
ERROR:deeprank:Error while computing deeprank.features.atomic_contacts for 1CR4:A:200:Glutamine->Alanine: Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/operate/pdb.py", line 64, in get_atoms
x, y, z, atom_number, atom_name, element, chain_id, residue_number, residue_name, insertion_code, altloc, occ = row
ValueError: too many values to unpack (expected 12)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1457, in _compute_features
feat_module.__compute_feature__(environment, distance_cutoff, featgrp, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/atomic_contacts.py", line 96, in __compute_feature__
atoms = _get_atoms_around_variant(environment, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/atomic_contacts.py", line 60, in _get_atoms_around_variant
for atom1, atom2 in get_residue_contact_atom_pairs(pdb,
File "/opt/tools/deg/DeepRank-Mut/deeprank/operate/pdb.py", line 115, in get_residue_contact_atom_pairs
atoms = get_atoms(pdb2sql)
File "/opt/tools/deg/DeepRank-Mut/deeprank/operate/pdb.py", line 66, in get_atoms
raise ValueError("Got unexpected row {} for {}".format(row, request_s))
ValueError: Got unexpected row [[27.976, 1.967, -10.746, 0, 'N', 'N', 'A', 1, 'MET', '', '', 1.0], [28.961, 0.908, -11.035, 1, 'CA', 'C', 'A', 1, 'MET', '', '', 1.0],...
This suggests that something might have changed in the output of pdb2sql. But I don't see this happening at my end. Could you run:
pytest test/operate/test_pdb.py
I'm not able to use pytest. I'm attaching the PDB file, the table file and the generate.py script (Archive.zip); maybe you can test them.
(deeprank) [imerelli@slurmlogin DeepRank-Mut]$ pytest test/operate/test_pdb.py
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov --cov-report --cov-report term --cov-report html test/operate/test_pdb.py
inifile: /opt/tools/deg/DeepRank-Mut/setup.cfg
rootdir: /opt/tools/deg/DeepRank-Mut
(deeprank) [imerelli@slurmlogin DeepRank-Mut]$ pytest
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov --cov-report --cov-report term --cov-report html
inifile: /opt/tools/deg/DeepRank-Mut/setup.cfg
rootdir: /opt/tools/deg/DeepRank-Mut
So apparently pdb2sql behaves differently on some PDB files. I made a unit test and a fix for it. I hope this helps you.
Ok, the loading of the PDB file seems fixed. Now I have problems with the PSSM matrix. I was able to produce the matrix in the format required by your software using PSI-BLAST and a Python conversion script (I wasn't able to use https://github.com/DeepRank/pssmgen), but now I have this problem:
Creating database : 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 11/12 [00:14<00:00, 1.73it/s, variant=1CR4.pdb]
Processing variant: 1CR4:A:288:Glutamate->Alanine-m8091716489555719240
ERROR: Error while computing deeprank.features.neighbour_profile for 1CR4:A:288:Glutamate->Alanine: Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1457, in _compute_features
feat_module.__compute_feature__(environment, distance_cutoff, featgrp, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 100, in __compute_feature__
pssm = _get_pssm(chain_ids, variant, environment)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 83, in _get_pssm
raise FileNotFoundError("No PSSM for {} chain {} in {}".format(variant.pdb_ac, chain_id, environment.pssm_root))
FileNotFoundError: No PSSM for 1CR4 chain A in /opt/tools/deg/DeepRank-Mut
Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1457, in _compute_features
feat_module.__compute_feature__(environment, distance_cutoff, featgrp, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 100, in __compute_feature__
pssm = _get_pssm(chain_ids, variant, environment)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 83, in _get_pssm
raise FileNotFoundError("No PSSM for {} chain {} in {}".format(variant.pdb_ac, chain_id, environment.pssm_root))
FileNotFoundError: No PSSM for 1CR4 chain A in /opt/tools/deg/DeepRank-Mut
The PDB and PSSM files are attached (Archive.zip).
It expects the chain id in the pssm filename. Like:
/opt/tools/deg/DeepRank-Mut/1cr4.A.pdb.pssm
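A small sketch of that naming rule, useful for checking where a PSSM file should sit before running the preprocessing. The helper name expected_pssm_path is hypothetical, and lower-casing the PDB ID is an assumption read off the single example path above. This does not reproduce DeepRank-Mut's own lookup code.

```python
from pathlib import PurePosixPath


def expected_pssm_path(pssm_root: str, pdb_id: str, chain_id: str) -> PurePosixPath:
    """Build '<root>/<pdbid>.<chain>.pdb.pssm', mirroring the example path.

    Lower-casing the PDB ID is an assumption based on that one example.
    """
    return PurePosixPath(pssm_root) / f"{pdb_id.lower()}.{chain_id}.pdb.pssm"


pssm_path = expected_pssm_path("/opt/tools/deg/DeepRank-Mut", "1CR4", "A")
print(pssm_path)
```

Passing the printed path to os.path.exists is then a quick way to confirm that the file is really in place.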
Ok, now the PSSM file is read, but there are still errors when parsing it:
Creating database : 92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 11/12 [00:13<00:00, 1.75it/s, variant=1CR4.pdb]
Processing variant: 1CR4:A:288:Glutamate->Alanine-2980880347696195539
ERROR: Error while computing deeprank.features.neighbour_profile for 1CR4:A:288:Glutamate->Alanine: Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1457, in _compute_features
feat_module.__compute_feature__(environment, distance_cutoff, featgrp, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 100, in __compute_feature__
pssm = _get_pssm(chain_ids, variant, environment)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 86, in _get_pssm
pssm.merge_with(parse_pssm(f, chain_id))
File "/opt/tools/deg/DeepRank-Mut/deeprank/parse/pssm.py", line 47, in parse_pssm
return parse_old_pssm(file_)
File "/opt/tools/deg/DeepRank-Mut/deeprank/parse/pssm.py", line 14, in parse_old_pssm
pssm.set_amino_acid_value(residue, code, float(value))
ValueError: could not convert string to float: 'seqresn'
Traceback (most recent call last):
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1457, in _compute_features
feat_module.__compute_feature__(environment, distance_cutoff, featgrp, variant)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 100, in __compute_feature__
pssm = _get_pssm(chain_ids, variant, environment)
File "/opt/tools/deg/DeepRank-Mut/deeprank/features/neighbour_profile.py", line 86, in _get_pssm
pssm.merge_with(parse_pssm(f, chain_id))
File "/opt/tools/deg/DeepRank-Mut/deeprank/parse/pssm.py", line 47, in parse_pssm
return parse_old_pssm(file_)
File "/opt/tools/deg/DeepRank-Mut/deeprank/parse/pssm.py", line 14, in parse_old_pssm
pssm.set_amino_acid_value(residue, code, float(value))
ValueError: could not convert string to float: 'seqresn'
The pssm file is the one attached before, it has the following structure, which seems identical to the one you suggested:
dbresi pdbresn seqresi seqresn A R N D C Q E G H I L K M F P S T W Y V IC
1 D 1 D -3 -2 2 7 -4 -1 1 -2 -2 -4 -5 -2 -4 -5 -2 -1 -2 -5 -4 -4 1.00
2 Y 2 Y -2 -2 0 -3 -3 -2 -2 -1 0 -2 -2 -2 -2 3 -3 0 0 1 7 -2 1.00
3 H 3 H -1 0 -1 1 -3 3 1 -3 4 -2 -2 2 0 -1 -2 0 0 -2 3 -1 1.00
4 E 4 E -2 -2 1 3 -4 0 5 -3 -2 -3 -4 -1 -3 -4 4 -1 -2 -4 -1 -1 1.00
5 D 5 D -2 -2 -1 6 -4 -1 2 -3 -2 -4 -2 -2 -3 -3 1 1 0 -4 0 -3 1.00
6 Y 6 Y -2 -2 1 1 -3 -2 -2 -3 2 -2 -1 -2 0 1 -1 1 0 -1 6 -2 1.00
7 G 7 G 0 -1 2 1 -3 -1 0 4 -2 -2 -3 -1 -3 -3 1 0 -1 -3 0 -1 1.00
8 F 8 F 1 -1 0 -2 0 -1 -1 -1 1 0 0 -1 1 2 -2 0 1 -2 1 -1 1.00
9 S 9 S 1 -2 2 -2 -2 -1 -2 1 -1 -1 -1 -2 1 2 -1 1 1 -2 2 0 1.00
10 S 10 S 1 -2 2 2 -2 -1 0 1 -2 -2 -1 -1 -2 -2 -2 2 1 -3 -1 0 1.00
11 F 11 F 0 -1 0 -1 -2 -2 -2 -1 -1 0 0 -2 1 3 -1 1 1 2 3 0 1.00
12 N 12 N -2 -2 5 2 -1 0 0 1 0 -4 -4 0 -3 -3 0 1 -1 -3 1 -4 1.00
13 D 13 D -1 -2 3 4 -4 -1 2 -1 1 -3 -2 0 -3 -3 -2 0 0 -4 0 -3 1.00
14 S 14 S 2 -2 0 0 2 0 -1 0 -2 -3 -3 -2 -2 -1 0 3 2 -3 1 -2 1.00
15 S 15 S 0 -1 0 0 2 -1 0 0 1 -2 -2 -1 -1 -3 1 3 2 1 0 -1 1.00
Your first line says dbresi, but it should be pdbresi.
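A header mismatch like this is easy to miss by eye, so here is a minimal sketch of a check that can be run on a converted PSSM file before preprocessing. The expected column list is taken from the format described in this thread, with pdbresi as the first column. The function name header_problems is hypothetical, and this is not DeepRank-Mut's own parser.

```python
EXPECTED_HEADER = (
    "pdbresi pdbresn seqresi seqresn "
    "A R N D C Q E G H I L K M F P S T W Y V IC"
).split()


def header_problems(header_line: str) -> list:
    """Return human-readable mismatches; an empty list means the header is fine."""
    found = header_line.split()
    problems = []
    if len(found) != len(EXPECTED_HEADER):
        problems.append(
            f"expected {len(EXPECTED_HEADER)} columns, found {len(found)}"
        )
    # Compare column names position by position.
    for position, (want, got) in enumerate(zip(EXPECTED_HEADER, found), start=1):
        if want != got:
            problems.append(f"column {position}: expected '{want}', found '{got}'")
    return problems


bad_header = "dbresi pdbresn seqresi seqresn A R N D C Q E G H I L K M F P S T W Y V IC"
print(header_problems(bad_header))
```

To check a file, read its first line and pass it to header_problems; an empty list means the column names match.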
Ok, thank you. Unfortunately, I have another error:
Creating database : 0%| | 0/12 [00:10<?, ?it/s, variant=1CR4.pdb]
Traceback (most recent call last):
File "generate.py", line 51, in <module>
database.create_database(prog_bar=True)
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 390, in create_database
rotation_center = self._add_aug_pdb(variant_group, variant,
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/DataGenerator.py", line 1624, in _add_aug_pdb
pdb2sql.transform.rot_axis(sqldb, axis, angle)
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/pdb2sql/transform.py", line 43, in rot_axis
xyz = rot_xyz_around_axis(xyz, axis, angle)
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/pdb2sql/transform.py", line 106, in rot_xyz_around_axis
return rotate(xyz, rot_mat, center)
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/pdb2sql/transform.py", line 198, in rotate
return np.dot(rot_mat, (xyz - center).T).T + center
File "<__array_function__ internals>", line 180, in dot
ValueError: shapes (3,3) and (3,2806,1) not aligned: 3 (dim 1) != 2806 (dim 1)
OK, pdb2sql seems to have a problem with your PDB file.
I've done some tests.
Removing the ENDMDL line from the PDB file fixes the problem.
Where did you get this PDB file? The PDB's original 1CR4 entry looks different.
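The ENDMDL workaround can be sketched as a small filter over the lines of the PDB file. The function name strip_endmdl is hypothetical, and the three example lines are only illustrative (the coordinates are those of the first atom shown earlier in this thread; the remaining fields are filler).

```python
def strip_endmdl(pdb_lines):
    """Return the PDB lines without any ENDMDL records."""
    return [line for line in pdb_lines if not line.startswith("ENDMDL")]


example_lines = [
    "ATOM      1  N   MET A   1      27.976   1.967 -10.746  1.00  0.00           N\n",
    "ENDMDL\n",
    "END\n",
]

cleaned_lines = strip_endmdl(example_lines)
print("".join(cleaned_lines), end="")
```

For a real file, read it with readlines(), pass the list through strip_endmdl, and write the result to a new file so the original stays untouched.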
Thank you. It works. Now I have moved on to the second script:
$ python test_learn.py
========================================
= DeepRank Data Set
=
= Training data
= -> train_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547 from train_data.hdf5
Data Set Info:
Augmentation : 10 rotations
Training set : 132 conformations
Validation set : 0 conformations
Test set : 0 conformations
Number of channels : 31
Grid Size : 30, 30, 30
Traceback (most recent call last):
File "test_learn.py", line 25, in <module>
model = NeuralNet(data_set,cnn_reg,model_type='3d',task='reg',
TypeError: __init__() got an unexpected keyword argument 'plot'
Dear Ivan Merelli,
We highly appreciate your continued interest in getting DeepRank-Mut to work for your data. Thank you for using our package. We must add that this software is better suited to those who have slightly advanced knowledge of Python. The tool requires several things to be prepared before it runs smoothly. This is not to discourage you from raising issues on GitHub; we are happy to address them. It is just a gentle nudge that you are encouraged to seek help for (maybe) critical issues.
For instance, the learning task at hand is classification instead of regression. Hence, the correct line of commands would be
neural_net = NeuralNet(dataset, cnn_class, model_type='3d',task='class',cuda=False, metrics_exporters=[OutputExporter(run_directory), TensorboardBinaryClassificationExporter(run_directory)])
The error is thrown because the classification task uses TensorBoard for plots. The argument 'plot' comes from the master version of DeepRank for protein complexes. To view plots after training, you would need to install tensorboard.
Also, I notice you do not have validation or test datasets; you can divide your training data if you wish by specifying the following in the learn.py script:
divide_trainset= [0.8, 0.2]
You also have the option of feeding your own validation and test datasets.
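To make the effect of an 0.8/0.2 division concrete, here is a minimal, library-independent sketch of splitting a set of conformations into training and validation indices. The function name split_indices, the fixed seed and the rounding rule are assumptions made for illustration; DeepRank-Mut's own splitting code is not shown in this thread and may differ. With the 132 conformations reported in this thread, the sketch yields 106 training and 26 validation items.

```python
import random


def split_indices(n_items, fractions=(0.8, 0.2), seed=0):
    """Shuffle range(n_items) and cut it into train/validation index lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_train = round(fractions[0] * n_items)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(132)
print(len(train_idx), len(valid_idx))
```

Note that augmented rotations of one variant can end up on both sides of a random split like this one, which is worth keeping in mind when reading validation scores.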
I would recommend going through the code in DataSet.py, NeuralNet.py and model3d.py to see which options would work best for your learning task.
Thanks again for your interest.
Dear Gayatri Ramakrishnan,
Thank you for your help. Your tool is very interesting, but objectively difficult to use. Essentially, there still isn't a working example in the repository. The problem is also that there is not much documentation and there are some inaccuracies in the explanation, such as that pytest is not usable and that the PSSM matrix is actually mandatory, while it is listed as optional.
I can't go through all your code to understand how it works; it was already very complicated to create the PSSM matrix. I just want to be able to model mutations in my protein, and we are trying to do this, so I thank you. I think that setting up a working example could also be useful for you. Once this working example is completed, you can certainly use it for documentation, so this work is important for everyone.
That said, I did not understand your suggestion. Perhaps neural_net should be model? Should dataset be data_set? Do I need to import cnn_class? And yet it still tells me
NameError: name 'OutputExporter' is not defined.
Also, I wouldn't know where to insert divide_trainset= [0.8, 0.2], since that variable does not exist. Please, once the database is created with the first script, could you please provide me with a script to perform the learning of the network in the simplest way possible?
Actually, PSSM is optional. But if you omit it, then you must remove 'deeprank.features.neighbour_profile' from the feature list. This feature is computed from PSSM.
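As a small sketch of that adjustment, the snippet below drops the PSSM-based module from a feature list. Only the two module names that appear in the logs of this thread are used, so a real feature list may well contain more entries; the helper name without_pssm_feature is hypothetical. The resulting list is what would be handed to the DataGenerator as its feature list, under whatever argument name generate.py already uses.

```python
PSSM_FEATURE = "deeprank.features.neighbour_profile"

# Only the module names seen in this thread; a real list may be longer.
feature_modules = [
    "deeprank.features.atomic_contacts",
    "deeprank.features.neighbour_profile",
]


def without_pssm_feature(modules):
    """Return a copy of the feature list with the PSSM-based module removed."""
    return [name for name in modules if name != PSSM_FEATURE]


features_without_pssm = without_pssm_feature(feature_modules)
print(features_without_pssm)
```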
The OutputExporter problem can be solved by importing it, as shown in the readme.
Hi, thank you for your help. I certainly made some progress. I solved some issues according to your suggestions, but now there is something beyond my ability. I'm pasting here the code (maybe you can paste it in your readme) and the result of running it:
$ cat learn.py
from deeprank.learn import *
from deeprank.learn.model3d import cnn_reg
from deeprank.models.metrics import OutputExporter, TensorboardBinaryClassificationExporter
import torch.optim as optim
import numpy as np
# preprocessed input data
hdf5_path = 'train_data.hdf5'
# output directory
out = './my_deeplearning_train/'
# declare the dataset instance
data_set = DataSet(
    hdf5_path,
    grid_info={
        'number_of_points': (10, 10, 10),
        'resolution': (10, 10, 10)
    },
    select_feature='all',
    select_target='class',
)
# create the network
model = NeuralNet(data_set, cnn_reg, model_type='3d', task='reg',
                  cuda=False,
                  metrics_exporters=[OutputExporter(out),
                                     TensorboardBinaryClassificationExporter(out)])
# change the optimizer (optional)
model.optimizer = optim.SGD(model.net.parameters(),
                            lr=0.001,
                            momentum=0.9,
                            weight_decay=0.005)
# start the training, this will generate a model file named `best_valid_model.pth.tar`.
model.train(nepoch=50, divide_trainset=0.8, train_batch_size=5, num_workers=1)
$ python learn.py
========================================
= DeepRank Data Set
=
= Training data
= -> train_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547 from train_data.hdf5
Data Set Info:
Augmentation : 10 rotations
Training set : 132 conformations
Validation set : 0 conformations
Test set : 0 conformations
Number of channels : 31
Grid Size : 30, 30, 30
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv3d-1 [-1, 4, 29, 29, 29] 996
MaxPool3d-2 [-1, 4, 14, 14, 14] 0
Conv3d-3 [-1, 5, 13, 13, 13] 165
MaxPool3d-4 [-1, 5, 6, 6, 6] 0
Linear-5 [-1, 84] 90,804
Linear-6 [-1, 1] 85
================================================================
Total params: 92,050
Trainable params: 92,050
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 3.19
Forward/backward pass size (MB): 0.92
Params size (MB): 0.35
Estimated Total Size (MB): 4.46
----------------------------------------------------------------
========================================
= Convolution Neural Network
= model : 3d
= CNN : cnn_reg
= features : AtomicDensities_ind
= C
= N
= O
= S
= features : Feature_ind
= accessibility
= charge
= coulomb
= pssm_ALA
= pssm_ARG
= pssm_ASN
= pssm_ASP
= pssm_CYS
= pssm_GLN
= pssm_GLU
= pssm_GLY
= pssm_HIS
= pssm_ILE
= pssm_LEU
= pssm_LYS
= pssm_MET
= pssm_PHE
= pssm_PRO
= pssm_SER
= pssm_THR
= pssm_TRP
= pssm_TYR
= pssm_VAL
= residue_information_content
= variant_probability
= vdwaals
= wild_type_probability
= targets : class
= CUDA : False
========================================
: Batch Size: 5
: 106 confs. for training
: 26 confs. for validation
: 0 confs. for testing
running epoch 0 on 21 batches
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r002 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r007 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r003 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r005 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r002 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r004 from train_data.hdf5
-> mini-batch: 1
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r005 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r005 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r009 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r005 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r006 from train_data.hdf5
-> mini-batch: 2
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r001 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r007 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r008 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991 from train_data.hdf5
-> mini-batch: 3
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r006 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r006 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r010 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r005 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r001 from train_data.hdf5
-> mini-batch: 4
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r006 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r004 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r010 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r007 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r007 from train_data.hdf5
-> mini-batch: 5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r010 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r010 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r010 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r002 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r001 from train_data.hdf5
-> mini-batch: 6
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r004 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r010 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r002 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r008 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659 from train_data.hdf5
-> mini-batch: 7
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r004 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r008 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r008 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r009 from train_data.hdf5
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r005 from train_data.hdf5
-> mini-batch: 8
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r009 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r009 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r004 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r002 from train_data.hdf5
-> mini-batch: 9
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r004 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r002 from train_data.hdf5
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r004 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r001 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162 from train_data.hdf5
-> mini-batch: 10
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r001 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r003 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r007 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r007 from train_data.hdf5
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r001 from train_data.hdf5
-> mini-batch: 11
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r004 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r008 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r009 from train_data.hdf5
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r006 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r005 from train_data.hdf5
-> mini-batch: 12
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r008 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r003 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r008 from train_data.hdf5
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r001 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r006 from train_data.hdf5
-> mini-batch: 13
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r008 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268 from train_data.hdf5
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r006 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r005 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r006 from train_data.hdf5
-> mini-batch: 14
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r009 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r001 from train_data.hdf5
loading variant 1CR4:A:113:Histidine->Alanine-7893153230422533547_r008 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r003 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r008 from train_data.hdf5
-> mini-batch: 15
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r003 from train_data.hdf5
loading variant 1CR4:A:288:Glutamate->Alanine-m9090782543475449484_r009 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r009 from train_data.hdf5
loading variant 1CR4:A:200:Glutamine->Alanine-m3000895191136890148_r003 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r001 from train_data.hdf5
-> mini-batch: 16
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r003 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r006 from train_data.hdf5
loading variant 1CR4:A:284:Isoleucine->Alanine-3112033218337307991_r007 from train_data.hdf5
loading variant 1CR4:A:203:Histidine->Alanine-m5294706904669780162_r010 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r002 from train_data.hdf5
-> mini-batch: 17
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r005 from train_data.hdf5
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623 from train_data.hdf5
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r009 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r006 from train_data.hdf5
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564_r009 from train_data.hdf5
-> mini-batch: 18
loading variant 1CR4:A:255:Tyrosine->Alanine-4015625884953984564 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r005 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r007 from train_data.hdf5
loading variant 1CR4:A:262:Aspartate->Asparagine-8570449461295847659_r003 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r006 from train_data.hdf5
-> mini-batch: 19
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r008 from train_data.hdf5
loading variant 1CR4:A:171:Aspartate->Asparagine-7915049557026475700_r002 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r003 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487_r003 from train_data.hdf5
-> mini-batch: 20
loading variant 1CR4:A:175:Alanine->Phenylalanine-3840484606403438268_r007 from train_data.hdf5
loading variant 1CR4:A:281:Histidine->Alanine-4426686988381237153_r007 from train_data.hdf5
loading variant 1CR4:A:259:Isoleucine->Alanine-m5958799002590729623_r002 from train_data.hdf5
loading variant 1CR4:A:187:Aspartate->Alanine-m6358192257465413487 from train_data.hdf5
-> mini-batch: 21
Traceback (most recent call last):
File "learn.py", line 36, in <module>
model.train(nepoch=50, divide_trainset=0.8, train_batch_size=5, num_workers=1)
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 321, in train
self._train(index_train, index_valid, index_test,
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 575, in _train
self._epoch(0, "training", train_loader, False)
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 722, in _epoch
self._metrics_output.process(pass_name, epoch_number, entry_names, output_values, target_values)
File "/opt/tools/deg/DeepRank-Mut/deeprank/models/metrics.py", line 49, in process
metrics_exporter.process(pass_name, epoch_number, entry_names, output_values, target_values)
File "/opt/tools/deg/DeepRank-Mut/deeprank/models/metrics.py", line 76, in process
probability = output_values[entry_index][1]
IndexError: list index out of range
It seems like you are combining regression (task='reg') with TensorboardBinaryClassificationExporter. This is not compatible.
Either make it task='class', or remove that exporter from the list.
Like I mentioned earlier, please do not use regression for classification tasks:
from deeprank.learn.model3d import cnn_class
Unfortunately, this package isn't built for plug-and-play scenarios. We do not intend to make it that way; instead, we have made it modular. The current repo will soon be archived, once DeepRank2 is finalized and released.
I do agree that an example workflow would be a good addition. Thanks for the input.
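The IndexError in the traceback above follows from the shape of the network output. The failing line, `probability = output_values[entry_index][1]`, reads the second value of each entry. A classification network produces two values per entry, one per class, whereas a regression network is assumed here to produce only one. A minimal sketch in plain Python, with made-up numbers and a hypothetical helper name (this is not DeepRank-Mut code):

```python
def positive_class_score(output_values, entry_index):
    """Return the second output of one entry, as the failing line does."""
    return output_values[entry_index][1]


# Made-up network outputs for two entries.
class_outputs = [[0.2, 0.8], [0.9, 0.1]]  # task='class': two values per entry
reg_outputs = [[0.7], [0.3]]              # task='reg' (assumed): one value per entry

# Works: every entry has an element at index 1.
score = positive_class_score(class_outputs, 0)

# Fails: a one-element list has no index 1.
try:
    positive_class_score(reg_outputs, 0)
    reg_error = None
except IndexError as error:
    reg_error = str(error)
```

This is why the binary classification exporter can only be paired with task='class'.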
Thank you. It worked. Now I will create a new database with the unseen variants to make predictions using the last script. Then I will upload everything here in case you need it. Meanwhile, a very naive question: is it possible to obtain (maybe download from the database?) the PDB structures of the mutants as well, to perform further analysis?
PDB structures are downloadable from the wwpdb. Instructions are here: http://www.wwpdb.org/ftp/pdb-ftp-sites
Not sure if that's what you mean.
Hi, I was able to create the database unseen_data.hdf5 with the variants to model. Now I am running the prediction step, but I get the error below. Can you help me?
$python learn2.py
========================================
= DeepRank Data Set
=
= Training data
= -> unseen_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:171:Phenylalanine->Alanine-5230317638147088145 from unseen_data.hdf5
Traceback (most recent call last):
File "learn2.py", line 14, in <module>
data_set = DataSet(
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 178, in __init__
self.process_dataset()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 252, in process_dataset
self.get_input_shape()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 660, in get_input_shape
feature, _ = self.load_one_variant(fname)
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 931, in load_one_variant
target = variant_data.get('targets/' + self.select_target)[()]
TypeError: 'NoneType' object is not subscriptable
(deeprank) [imerelli@slurmlogin DeepRank-Mut]$ cat learn2.py
from deeprank.learn import *
from deeprank.learn.model3d import cnn_class
from deeprank.models.metrics import OutputExporter, TensorboardBinaryClassificationExporter
import torch.optim as optim
import numpy as np
# preprocessed input data
hdf5_path = 'unseen_data.hdf5'
# output directory
out = './my_deeplearning_train/'
# declare the dataset instance
data_set = DataSet(
hdf5_path,
grid_info={
'number_of_points': (10, 10, 10),
'resolution': (10, 10, 10)
},
select_feature='all',
select_target='class',
)
# create the network
model = NeuralNet(data_set,cnn_class,model_type='3d',task='class',pretrained_model="best_valid_model.pth.tar",
cuda=False,metrics_exporters=[OutputExporter(out), TensorboardBinaryClassificationExporter(out)])
#model = NeuralNet(data_set,cnn_reg,model_type='3d',task='reg',
# cuda=False,metrics_exporters=[OutputExporter(out)])
# change the optimizer (optional)
model.optimizer = optim.SGD(model.net.parameters(),
lr=0.001,
momentum=0.9,
weight_decay=0.005)
# start the training, this will generate a model file named `best_valid_model.pth.tar`.
#model.train(nepoch=50, divide_trainset=0.8, train_batch_size=5, num_workers=1)
model.test()
Concerning the PDB structures, I was wondering if I can download from DeepRank the pdb coordinates of the modelled proteins with the variants, for example to perform docking experiments after the prediction of their CLASS (BENIGN or PATHOGENIC).
I made a recent push to allow the model to run on unlabeled data. Unfortunately, deeprank does not generate structures. But the pymol mutagenesis feature might help you generate a structure of a variant.
Ok, thank you. I pulled the latest version from GitHub, but I still get errors. If useful, I can send you the files I'm using (generate2.py, learn2.py, table2.py).
$ git pull
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 28 (delta 16), reused 24 (delta 14), pack-reused 0
Unpacking objects: 100% (28/28), done.
From https://github.com/DeepRank/DeepRank-Mut
6c6b072..edfed88 main -> origin/main
Updating 6c6b072..edfed88
Fast-forward
README.md | 4 ++--
deeprank/learn/DataSet.py | 42 ++++++++++++++++++++++++++++--------------
deeprank/learn/NeuralNet.py | 28 +++++++++++++++++++---------
test/data/pdb/1CR4/1CR4.pdb | 1 -
test/generate/test_datagenerator.py | 23 +++++++++++++++++++++++
test/operate/test_pdb.py | 3 +++
test/test_learn.py | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 144 insertions(+), 26 deletions(-)
$ python learn2.py
========================================
= DeepRank Data Set
=
= Training data
= -> unseen_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:171:Phenylalanine->Alanine-5230317638147088145 from unseen_data.hdf5
Traceback (most recent call last):
File "learn2.py", line 14, in <module>
data_set = DataSet(
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 178, in __init__
self.process_dataset()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 252, in process_dataset
self.get_input_shape()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 662, in get_input_shape
feature, _ = self.load_one_variant(fname)
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 939, in load_one_variant
target_group = variant_data['targets']
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/h5py/_hl/group.py", line 357, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 189, in h5py.h5o.open
KeyError: "Unable to synchronously open object (object 'targets' doesn't exist)"
Whoops! Looks like your output is slightly different from mine. Doesn't matter! I've pushed another patch that allows deeprank to handle even datasets without object 'targets'.
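Both tracebacks above come from reading a target that an unlabeled entry does not have: `.get()` on a missing path returns None, and indexing a missing group raises KeyError. A defensive lookup can be sketched as follows. Nested dicts stand in for the HDF5 groups here, and `read_target` is a hypothetical helper for illustration, not the actual DeepRank-Mut patch:

```python
def read_target(variant_data, target_name):
    """Return the target value of one variant, or None when it is unlabeled.

    variant_data is a nested dict standing in for an HDF5 group.
    """
    # Unlabeled entries may lack the whole 'targets' group.
    if "targets" not in variant_data:
        return None
    target_group = variant_data["targets"]
    # The group may exist without the requested target.
    if target_name not in target_group:
        return None
    return target_group[target_name]


# Made-up entries: one labeled, one without any 'targets' group.
labeled = {"targets": {"class": 1}, "features": {}}
unlabeled = {"features": {}}

labeled_value = read_target(labeled, "class")
unlabeled_value = read_target(unlabeled, "class")
```

Callers then have to treat None as "no label" instead of assuming that a value is always present.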
Sorry, still not working. I'm attaching the files to reproduce the analysis. Files with the suffix 2 belong to the inference part, the others to the learning part. archivio.tar.gz
$ python learn2.py
========================================
= DeepRank Data Set
=
= Training data
= -> unseen_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:171:Phenylalanine->Alanine-5230317638147088145 from unseen_data.hdf5
Computing norm for unseen_data.hdf5
Traceback (most recent call last):
File "learn2.py", line 14, in <module>
data_set = DataSet(
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 178, in __init__
self.process_dataset()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 256, in process_dataset
self.get_norm()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 720, in get_norm
self._read_norm()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/DataSet.py", line 747, in _read_norm
norm.get()
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/NormalizeData.py", line 43, in get
self._extract_data()
File "/opt/tools/deg/DeepRank-Mut/deeprank/generate/NormalizeData.py", line 134, in _extract_data
for tname, tval in target_group.items():
AttributeError: 'NoneType' object has no attribute 'items'
$ cat learn2.py
from deeprank.learn import *
from deeprank.learn.model3d import cnn_class
from deeprank.models.metrics import OutputExporter, TensorboardBinaryClassificationExporter
import torch.optim as optim
import numpy as np
# preprocessed input data
hdf5_path = 'unseen_data.hdf5'
# output directory
out = './my_deeplearning_train/'
# declare the dataset instance
data_set = DataSet(
hdf5_path,
grid_info={
'number_of_points': (10, 10, 10),
'resolution': (10, 10, 10)
},
select_feature='all',
#select_target='class',
)
# create the network
model = NeuralNet(data_set,cnn_class,model_type='3d',task='class',pretrained_model="best_valid_model.pth.tar",
cuda=False,metrics_exporters=[OutputExporter(out), TensorboardBinaryClassificationExporter(out)])
#model = NeuralNet(data_set,cnn_reg,model_type='3d',task='reg',
# cuda=False,metrics_exporters=[OutputExporter(out)])
# change the optimizer (optional)
model.optimizer = optim.SGD(model.net.parameters(),
lr=0.001,
momentum=0.9,
weight_decay=0.005)
# start the training, this will generate a model file named `best_valid_model.pth.tar`.
#model.train(nepoch=50, divide_trainset=0.8, train_batch_size=5, num_workers=1)
model.test()
So this error happened during normalization. Sorry for not testing this. It should be fixed now.
I see the computation gets a little further, but now I have this error:
$ python learn2.py
========================================
= DeepRank Data Set
=
= Training data
= -> unseen_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:171:Phenylalanine->Alanine-5230317638147088145 from unseen_data.hdf5
Computing norm for unseen_data.hdf5
Data Set Info:
Augmentation : 10 rotations
Training set : 418 conformations
Validation set : 0 conformations
Test set : 0 conformations
Number of channels : 31
Grid Size : 30, 30, 30
========================================
= DeepRank Data Set
=
= Training data
= -> unseen_data.hdf5
=
=
=
========================================
Checking dataset Integrity
Processing data set:
Train dataset
loading variant 1CR4:A:171:Phenylalanine->Alanine-5230317638147088145 from unseen_data.hdf5
Data Set Info:
Augmentation : 10 rotations
Training set : 418 conformations
Validation set : 0 conformations
Test set : 0 conformations
Number of channels : 31
Grid Size : 30, 30, 30
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
BatchNorm3d-1 [-1, 31, 30, 30, 30] 62
Conv3d-2 [-1, 31, 29, 29, 29] 7,719
BatchNorm3d-3 [-1, 31, 29, 29, 29] 62
ReLU-4 [-1, 31, 29, 29, 29] 0
Conv3d-5 [-1, 64, 28, 28, 28] 15,936
BatchNorm3d-6 [-1, 64, 28, 28, 28] 128
MaxPool3d-7 [-1, 64, 14, 14, 14] 0
ReLU-8 [-1, 64, 14, 14, 14] 0
Conv3d-9 [-1, 64, 12, 12, 12] 110,656
BatchNorm3d-10 [-1, 64, 12, 12, 12] 128
ReLU-11 [-1, 64, 12, 12, 12] 0
Flatten-12 [-1, 110592] 0
BatchNorm1d-13 [-1, 110592] 221,184
Linear-14 [-1, 100] 11,059,300
ReLU-15 [-1, 100] 0
Dropout-16 [-1, 100] 0
Linear-17 [-1, 100] 10,100
ReLU-18 [-1, 100] 0
Dropout-19 [-1, 100] 0
Linear-20 [-1, 2] 202
================================================================
Total params: 11,425,477
Trainable params: 11,425,477
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 3.19
Forward/backward pass size (MB): 52.03
Params size (MB): 43.58
Estimated Total Size (MB): 98.81
----------------------------------------------------------------
Traceback (most recent call last):
File "learn2.py", line 26, in <module>
model = NeuralNet(data_set,cnn_class,model_type='3d',task='class',pretrained_model="best_valid_model.pth.tar",
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 235, in __init__
self.load_optimizer_params()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 410, in load_optimizer_params
self.optimizer.load_state_dict(self.state['optimizer'])
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/torch/optim/optimizer.py", line 770, in load_state_dict
self.__setstate__({'state': state, 'param_groups': param_groups})
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/torch/optim/adamw.py", line 84, in __setstate__
state_values[0]["step"]
KeyError: 'step'
Looks like torch has trouble interpreting your pretrained model. Maybe it works to retrain it. Otherwise, could you upload it here?
I retrained the model, but I get the same error. Here is a link to the pretrained model, as it's too big for GitHub: https://www.dropbox.com/scl/fi/9t7soyhercnd565tcxa02/best_valid_model.pth.tar?rlkey=6grljyls1b7lv60xeq9kilg1q&dl=0
It turned out that there was a bug in loading the optimizer settings from the pretrained model. But you don't even need an optimizer in step 3, so I made it optional in my last push.
Okay, the computation of learn2.py ran smoothly. But now, where can I find the results? I mean, where are the predictions about whether my variants are benign or pathogenic?
Sorry, I forgot about that.
I pushed a fix. You'll need to pull, use the OutputExporter in your learn2.py script and run it again.
Output will go to a file in the output directory you set for it.
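The network summary earlier in the thread ends in a layer with two outputs per entry, one per class. If the exported values are raw scores rather than probabilities (an assumption; check the exported file), a softmax turns them into probabilities. Which index corresponds to which class depends on how the labels were encoded in the training data. The sketch below assumes that index 1 is the positive class, because the binary classification exporter reads `output_values[entry_index][1]` as the probability. The names `softmax` and `predict_class` are illustrative only:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    # Subtracting the maximum avoids overflow in exp().
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores, threshold=0.5):
    """Return (label, probability of class 1) for one entry."""
    probability = softmax(scores)[1]
    label = 1 if probability >= threshold else 0
    return label, probability


# Made-up scores for one entry: equal scores give a probability of 0.5.
label, probability = predict_class([0.0, 0.0])
```

Since the data set holds several rotated copies of each variant (the `_r001` style suffixes in the log), one variant can yield several predictions.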
Sorry, but after the last pull I have this error:
loading variant 1CR4:A:173:Phenylalanine->Isoleucine-m5953980920123066196_r001 from unseen_data.hdf5
-> mini-batch: 418
Traceback (most recent call last):
File "learn2.py", line 40, in <module>
model.test()
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 373, in test
self._epoch(0, "test", loader, False)
File "/opt/tools/deg/DeepRank-Mut/deeprank/learn/NeuralNet.py", line 742, in _epoch
self._metrics_output.process(pass_name, epoch_number, entry_names, output_values, target_values)
File "/opt/tools/deg/DeepRank-Mut/deeprank/models/metrics.py", line 49, in process
metrics_exporter.process(pass_name, epoch_number, entry_names, output_values, target_values)
File "/opt/tools/deg/DeepRank-Mut/deeprank/models/metrics.py", line 70, in process
loss = cross_entropy(tensor(output_values), tensor(target_values))
File "/opt/tools/deg/miniforge3/envs/deeprank/lib/python3.8/site-packages/torch/nn/functional.py", line 3053, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target -1 is out of bounds.
Here is my script:
$ cat learn2.py
from deeprank.learn import *
from deeprank.learn.model3d import cnn_class
from deeprank.models.metrics import OutputExporter, TensorboardBinaryClassificationExporter
import torch.optim as optim
import numpy as np
# preprocessed input data
hdf5_path = 'unseen_data.hdf5'
# output directory
out = './my_deeplearning_train/'
# declare the dataset instance
data_set = DataSet(
hdf5_path,
grid_info={
'number_of_points': (10, 10, 10),
'resolution': (10, 10, 10)
},
select_feature='all',
#select_target='class',
)
# create the network
model = NeuralNet(data_set,cnn_class,model_type='3d',task='class',pretrained_model="best_valid_model.pth.tar",
cuda=False,metrics_exporters=[OutputExporter(out), TensorboardBinaryClassificationExporter(out)])
#model = NeuralNet(data_set,cnn_reg,model_type='3d',task='reg',
# cuda=False,metrics_exporters=[OutputExporter(out)])
# change the optimizer (optional)
#model.optimizer = optim.SGD(model.net.parameters(),
# lr=0.001,
# momentum=0.9,
# weight_decay=0.005)
# start the training, this will generate a model file named `best_valid_model.pth.tar`.
#model.train(nepoch=50, divide_trainset=0.8, train_batch_size=5, num_workers=1)
model.test()
Remove the TensorboardBinaryClassificationExporter from your script. It only works if you have a binary target value.
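The "Target -1 is out of bounds" error arises because the cross-entropy loss receives -1 as a class index, which the traceback suggests is the placeholder for entries without a label. Any metric that needs labels has to skip such entries. A minimal sketch in plain Python, assuming that -1 marks unlabeled entries and that the output values are raw scores; `mean_cross_entropy` is a hypothetical helper, not part of DeepRank-Mut:

```python
import math

UNLABELED = -1  # assumed placeholder, taken from the error message


def mean_cross_entropy(output_values, target_values):
    """Mean cross-entropy over labeled entries; None if nothing is labeled."""
    losses = []
    for scores, target in zip(output_values, target_values):
        if target == UNLABELED:
            continue  # no label, so no loss can be computed
        # log-sum-exp, with the maximum subtracted for numerical stability
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_norm - scores[target])
    if not losses:
        return None
    return sum(losses) / len(losses)


# Made-up scores for two entries.
outputs = [[0.0, 0.0], [1.0, 3.0]]
no_labels = mean_cross_entropy(outputs, [-1, -1])  # fully unlabeled data
one_label = mean_cross_entropy(outputs, [0, -1])   # only the first entry counts
```

For a prediction run on unseen variants every target is unlabeled, so there is nothing for a loss-based exporter to report.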
Hi, I'm trying to use DeepRank-Mut. The first problem is that I don't understand how to run the tests: the documentation says to enter the test directory and run pytest, but this command is not valid.
However, from the root directory I can run the test scripts that are in the test directory. Here is the output. While test/test_tools.py and test/test_atomic_features.py produce output, test/test_generate.py and test/test_learn.py do not produce any. Is this expected? Is there something that I should do differently?