Closed kad-ecoli closed 5 years ago
Yes the variable names changed between TF 1.1 and 1.4, and so you will need a more recent TF to get it to work. Also note that the checkpointed models use cuDNN kernels.
There is some misunderstanding. I used TensorFlow 1.11 (>=1.4), not TensorFlow 1.1. TensorFlow 1.11 is one of the versions this repository is supposed to work with, as stated in the readme.
TensorFlow 1.11 is already the most recent TensorFlow I can get on anaconda as of this post.
My apologies--I misread your initial post as referring to TF 1.1. It's not a version compatibility issue then.
Judging by the error message, I'm guessing it's not finding the cuDNN kernels. Are you using an Nvidia GPU? The pre-trained models must be run on one, because training was done with the cuDNN LSTM kernels. TF does now support conversion between the cuDNN LSTMs and the vanilla TF ones, but I haven't implemented the functionality yet.
So the lack of CUDA is the main reason. I guess I need to try to convert the model to make it work on a CPU.
Yes the cuDNN LSTM units are not currently being constructed in a way that makes them convertible between the CPU and GPU versions, but I know the latest TF supports conversion between the two. I will leave this issue open and see if I can get around to it myself, but if you make progress let me know as well!
I had the same issue and solved it by explicitly specifying the -g argument as 0. However, after the code runs to completion, where are the prediction output files written?
That's unlikely to have worked. What do the logs say? Output should be in base/runs/runName/datasetName/...
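To see what a run actually wrote under that path, a small helper along these lines may be useful. This is only a sketch: the function name `list_run_outputs` is hypothetical and not part of the repository, and it assumes the base/runs/runName/datasetName layout described above.

```python
import os


def list_run_outputs(base_dir, run_name, dataset_name):
    """List every file under base_dir/runs/run_name/dataset_name.

    Paths are returned relative to that folder, sorted alphabetically.
    A missing folder simply yields an empty list.
    """
    run_dir = os.path.join(base_dir, "runs", run_name, dataset_name)
    found = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, run_dir))
    return sorted(found)
```

An empty result usually means the base directory passed in does not match the one the run used.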
Hi @alquraishi, I had the same error as @amanchandra333 using TensorFlow 1.10 and was able to resolve it by setting the graphics card to zero and the GPU fraction to 0.8. The code ran to completion when used on the CASP12 data set with the CASP12 configuration file. My output of nvidia-smi was:
Fri Nov 16 18:02:19 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I ran protling as follows:
python rgn/model/protling.py rgn/configurations/CASP12.config -d CASP12/ -g 0 -f 0.9
The messages at the end of the run in RGN12/log/CASP12.log, after all of the model configuration data, were:
2018-11-15 21:10:02.399224: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-15 21:10:02.499942: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-15 21:10:02.500352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-11-15 21:10:02.500377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-15 21:10:02.811868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-15 21:10:02.811933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-15 21:10:02.811943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-15 21:10:02.812226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9152 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
WARNING:tensorflow:From /home/rlc343/rgn/model/model.py:452: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
Unfortunately, when I look in the folder RGN12/runs/CASP12/ProteinNet12Thinning90/1 there is only an error.log file, which contains error and loss results, but no .tertiary files.
Any help would be appreciated! Thanks so much.
Hi @rowancallahan, are you trying to train a new model or just make predictions? If the latter then you need the -p option, and possibly -e depending on what set you want to make predictions for, e.g. -e weighted_testing if you want predictions for the test set.
My apologies! I have been running with the -p parameter for predictions. My command should have read:
python rgn/model/protling.py rgn/configurations/CASP12.config -d CASP12/ -g 0 -f 0.9 -p
I am trying to run for predictions.
Hmmm. Would you mind summarizing your directory structure? Is the data directory inside of RGN12? If so then RGN12 is your base directory and not CASP12. I.e. if this is what you have:
RGN12/runs/CASP12/ProteinNet12Thinning90/...
RGN12/data/ProteinNet12Thinning90/...
then I would pass RGN12 to -d and not CASP12. Also, try using the configuration that's in RGN12/runs/CASP12/ProteinNet12Thinning90/ just in case, although that shouldn't really matter.
Are you able to train new models from scratch and only prediction is not working? Or are you unsure? Also can you include the output from error.log?
Thanks!
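A quick way to check whether a candidate base directory has the layout described above is sketched below. The helper name `missing_subdirs` is hypothetical and not part of the repository; it only assumes the `runs/<runName>/<datasetName>` and `data/<datasetName>` folders shown in the example listing.

```python
import os


def missing_subdirs(base_dir, run_name, dataset_name):
    """Return the expected folders that do not exist under base_dir.

    An empty list means base_dir looks like a usable value for -d.
    """
    expected = [
        os.path.join(base_dir, "runs", run_name, dataset_name),
        os.path.join(base_dir, "data", dataset_name),
    ]
    return [path for path in expected if not os.path.isdir(path)]
```

For the layout above, `missing_subdirs("RGN12", "CASP12", "ProteinNet12Thinning90")` should return an empty list when run from the folder that contains RGN12.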
Hi @alquraishi, I had renamed RGN12 to CASP12, so my directory structure was
CASP12/runs/CASP12/ProteinNet12Thinning90/...
CASP12/data/ProteinNet12Thinning90/...
I tried renaming my folders back to RGN12 and rerunning with my directory structure as
RGN12/runs/CASP12/ProteinNet12Thinning90/...
RGN12/data/ProteinNet12Thinning90/...
I also tried using the configuration file that was listed in RGN12/runs/CASP12ProteinNet12Thinning90/configuration
Here are the sanity checks that I have performed so far.
After redownloading the RGN12 data and looking through the downloaded dataset, it seems that some predictions are already made in the RGN12/runs/CASP12/ProteinNet12Thinning90/... folder. I checked which files were already created and which folders were being predicted. Before running any predictions, all folders except the folder named "1" contain an error.log file and an OutputsValidation subfolder which contains a list of .tertiary files.
However, after changing the SampleValidationGlob and trying to run prediction for a different protein, the .tertiary files are not updated or changed. It seems like the model runs fine, and it appears to train. However, my current end goal is to take a large batch of PSSMs and MSAs and predict protein structures for visualization in PyMol.
Finally, I tried deleting all of the numbered folders in RGN12/runs/CASP12/ProteinNet12Thinning90/... After doing this and rerunning, I find that only folder 11 is recreated and that folder 11 now contains .tertiary and .recurrent_states files with no error.log file. Are these the novel predictions? Is it possible to construct a 3D structure for visualization using these files?
Yes, you will generally see new predictions saved in the highest-numbered folder, because that's where the checkpoint is (i.e. the training iteration of the model that is loaded when you try to make predictions). And yes, the .tertiary files contain the backbone coordinates of the newly predicted proteins. The triplets are x,y,z coordinates, and they alternate between the three backbone atoms (C_alpha, N, and C').
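As a rough illustration of that alternation, the sketch below groups an already-parsed list of (x, y, z) triplets into residues. The function name `group_backbone` is hypothetical, the default atom order is simply the one quoted above and should be checked against your own files, and the on-disk layout of a .tertiary file is not assumed here: reading the file into triplets is left to the caller.

```python
def group_backbone(triplets, atom_order=("C_alpha", "N", "C'")):
    """Group consecutive (x, y, z) triplets into one dict per residue.

    atom_order is an assumption taken from the comment above; pass a
    different tuple if your files use another order.
    """
    atoms_per_residue = len(atom_order)
    if len(triplets) % atoms_per_residue != 0:
        raise ValueError("number of triplets is not a multiple of %d"
                         % atoms_per_residue)
    residues = []
    for start in range(0, len(triplets), atoms_per_residue):
        chunk = triplets[start:start + atoms_per_residue]
        residues.append(dict(zip(atom_order, chunk)))
    return residues
```

Each returned dict maps an atom label to its coordinate triplet, which is a convenient starting point for writing ATOM records for a viewer such as PyMol.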
Hi~ I have a few questions
1) The Usage says "This predicts the structures of the dataset specified in the configuration file. By default only the validation set is predicted, but this can be changed using the -e option.", so if I want to predict the test set, is it OK to write "python protling.py [configurationFilePath] -d [baseDirectory] -e TESTING_MODEL"?
2) If I want to count the standard dRMSD you use for reporting accuracies in the
THANKS!
Hi @FACEkimi, and sorry for the delay.
-e weighted_testing and not -e TESTING_MODEL.
Thanks for your reply~ But I have another question. I tried to use the output predictions and the PDB files to compute dRMSD, but I found that for almost all proteins the number of output values is not the same as the number in the PDB files. For example, for 1AEP, the backbone (CA+C'+N) in the PDB has 459*3 values (the 3 is because of x,y,z), but the tertiary has 483*3; for 1DZL, the PDB has 1365*3 and the tertiary has 1515*3; and for 1HI9, the PDB has 4110*3 but the tertiary has 822*3. I don't know why?
Hi @FACEkimi, the PDB files may contain multiple domains, and in some instances may have missing residues that are predicted by the model, which would result in a lack of a one-to-one correspondence. My suggestion would be to use the structures in the ProteinNet data set, as they are already formatted to be matched to the predicted ones.
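Once the two coordinate sets do correspond atom for atom, a dRMSD can be computed along the following lines. This is a minimal sketch using a common definition (root mean square of the differences between corresponding pairwise distances); the function name `drmsd` is hypothetical, and whether this matches the exact normalization used for the reported accuracies is not confirmed in this thread.

```python
import math


def _distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def drmsd(coords_a, coords_b):
    """Distance-based RMSD between two equally long coordinate lists.

    Raises ValueError when the lists differ in length, which is the
    mismatch discussed above and must be resolved first.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets differ in length: %d vs %d"
                         % (len(coords_a), len(coords_b)))
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two atoms")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = (_distance(coords_a[i], coords_a[j])
                    - _distance(coords_b[i], coords_b[j]))
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)
```

Because only internal distances are compared, the two structures do not need to be superimposed first.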
Hello, I am Rashid, and I am doing a master's thesis on protein sequence-to-structure prediction. @alquraishi, I followed the GitHub instructions and also read the previous problems reported here.
I am also trying to predict sequences in ProteinNet TFRecords format using a trained model. I ran the script as: python Machine_Learning/rgn-master/model/protling.py Machine_Learning/rgn-master/configurations/CASP7.config -d Machine_Learning/rgn-master/RGN7 -p -e weighted_testing. The script ran well and I did not get any errors, but I do not understand where my output 3D structure is or how I can visualize it in Chimera. Would you please help me continue this work properly? Thanks
I run TensorFlow 1.11 on 64-bit CentOS Linux 6.10. I downloaded the pre-trained model RGN7.tar.gz, untarred it to RGN7/, and ran protling.py as
python2.7 ../rgn/model/protling.py ../rgn/configurations/CASP7.config -d RGN7 -p
The prediction apparently failed with the following complaint. Is this caused by a mismatched TensorFlow version?