I investigated a little bit and found that the problem probably arises from dataProcessing.py. The function drawMolFromSmiles does not work properly. It generates .svg files sized 250x250, and when the .svg files are converted to .png, the size becomes 266x266 even though IMG_SIZE is strictly set to 200. More serious problems appear later: the command
img_arr = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
gives a 266x266 array for each image, and the elements of the arrays are all 255; the images are obviously not being processed properly. Attached are two examples. Any idea?
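For reference, the symptom can be reproduced with a quick check like the following (the path below is a placeholder for one of the generated .png files):
import cv2

# placeholder path; point it at any of the generated compound images
path = "CHEMBL288346.png"
img_arr = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
print(img_arr.shape)                 # comes back as (266, 266) instead of the expected (200, 200)
print(img_arr.min(), img_arr.max())  # both 255, i.e. the image is entirely white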
Thank you for raising the issue and for the explanation. We are investigating it and will reply again as soon as we have addressed it.
I have run the script without getting any errors and unfortunately could not reproduce the problem you are seeing. I checked the image sizes and they are all 200x200, and the shape of img_arr is (200, 200).
From the error it seems that cairosvg or Draw.MolToFile creates images of a different size even though we specify the image size as 200x200. The only thing that comes to mind is that we are using different versions of the libraries and the newer functions somehow add outer frames or something similar to the images. The exact versions we use are listed in the readme. We will check this issue further and let you know if we can reproduce the error and find a solution.
My tools include (a fresh Anaconda env; the packages were installed with conda install):
python 3.5.6
tensorflow 1.10.0 gpu_py35hd9c640d_0
tensorflow-base 1.10.0 gpu_py35had579c0_0
tensorflow-gpu 1.10.0 hf154084_0
tflearn 0.3.2 py35h05ed11d_0 contango
scikit-learn 0.19.2 py35h4989274_0
numpy 1.14.5 py35h1b885b7_4
numpy-base 1.14.5 py35hdbf6ddf_4
cairosvg 2.4.2 py_0 conda-forge
rdkit 2018.03.4 py35ha4bbe77_0 conda-forge
opencv3 3.1.0 py35_0 menpo
Unfortunately, we could not replicate the error no matter what we tried. Could you please try with the tool/library versions given in our repository:
Python 3.5.2 Tensorflow 1.12.0 Tflearn 0.3.2 Sklearn 0.19.2 Numpy 1.14.5 CairoSVG 2.1.2 RDkit 2016.09.4 OpenCV 3.3.0
We hope that we can have a better idea about the issue if you could try this. Thank you.
Sure. I'll try this to see what is happening. Thank you very much.
I'm sorry, but after deploying these packages I still got error messages; the shape of the images remains 266x266, so it breaks. Could this be related to my OS? I am on a CentOS 7 GPU platform.
By the way, I got some warnings:
(deepscreen) [ai_robot@gpu bin]$ python trainDEEPScreenDUDE.py ImageNetInceptionV2 hdac8 adam 0.0001 5 0 0 1 1 1
WARNING:tensorflow:From /data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.5/site-packages/tflearn/initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From /data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.5/site-packages/tflearn/objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2020-03-05 08:38:29.950471: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Number of active compounds : 78
Number of inactive compounds : 117
Number of active test compounds : 20
Number of inactive test compounds : 30
(266, 266)
(266, 266)
(266, 266)
(266, 266)
...
OK, after some struggling, I set scale=200/266 in svg2png in dataProcessing.py, and the shape of the .png files becomes 200x200; the training is now in progress.
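For reference, the change is roughly the following in dataProcessing.py (output_path and id are the variable names used in drawMolFromSmiles; this is a sketch of my edit, not the exact original line):
import cairosvg

# output_path and id come from the surrounding drawMolFromSmiles function;
# shown here with placeholder values so the call is self-contained
output_path, id = "output", "CHEMBL288346"
cairosvg.svg2png(url="{}/{}.svg".format(output_path, id),
                 write_to="{}/{}.png".format(output_path, id),
                 scale=200 / 266)  # shrink the oversized 266x266 raster back down to 200x200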
While I can train models now, I also want to predict the activity of some SMILES. I ran the command
python loadDEEPScreenModel.py CHEMBL286 CNNModel_CHEMBL286_adam_0.0005_15_256_0.6_True-525 sample_test_compound_file.txt
but got:
Traceback (most recent call last):
File "loadDEEPScreenModel.py", line 88, in <module>
loadModel(chembl_target, model_fl)
File "loadDEEPScreenModel.py", line 47, in loadModel
chembl_target_threshold_dict = getModelThresholds("deepscreen_models_hyperparameters_performance_results.tsv")
File "/home/ai_robot/data/DEEPScreen/bin/dataProcessing.py", line 1172, in getModelThresholds
log_fl, modelname, target, optimizer, learning_rate, epoch, hidden1, hidden2, dropout, rotate, save_model, test_f1score, test_mcc, test_accuracy, test_precision, test_recall, test_tp, test_fp, test_tn, test_fn, test_threshold, val_auc, val_auprc, test_auc, test_auprc = line.split("\t")
ValueError: not enough values to unpack (expected 25, got 20)
I know this is due to the data structure: the getModelThresholds function reads and extracts information from resultFiles/deepscreen_models_hyperparameters_performance_results.tsv; however, getModelThresholds expects 25 columns while that file only contains 20 columns. I looked at the file and found that the following 5 columns are missing from the .tsv file:
test_tp
test_fp
test_tn
test_fn
test_threshold
I cannot simply remove these 5 columns from getModelThresholds, because the function returns a value that is exactly test_threshold, and I can imagine that if these column names are removed from dataProcessing.py there will be numerous other errors. Could you please provide the file with the 25 columns, or let me know how to generate it after a training run? Thank you very much.
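For reference, a minimal check like the following (assuming it is run from the repository root) confirms the mismatch:
# count the tab-separated fields in the first line of the results file
with open("resultFiles/deepscreen_models_hyperparameters_performance_results.tsv") as f:
    first_line = f.readline().rstrip("\n")
print(len(first_line.split("\t")))  # prints 20 here, while getModelThresholds tries to unpack 25 values per line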
Sorry for the late reply. First of all, considering the image size issue:
We still could not figure out what is causing this problem. The systems on which we have tested DEEPScreen are macOS (10.12 or newer) and Ubuntu Linux (14.04). If you have a chance to try on one of these operating systems, we may get a better idea of the cause. Sorry that we could not solve it on our end.
Considering rescaling: since the compound images will differ from natively generated 200x200 images, there may be some performance differences. We covered similar issues, via a few tests, in our manuscript (in the supplementary material).
Second, about the problem with the columns in resultFiles/deepscreen_models_hyperparameters_performance_results.tsv: thank you for letting us know about this; it is due to an update in our repository. We are working on constructing the file with the correct number of columns (including the necessary information in the table), and we will upload the corrected file once the process is finished.
Thank you so much. I'll try to test this on another OS. Looking forward to seeing your updated files.
I have the same issue reported here, on Ubuntu 18.04. You can see that with the original code the images that get generated have the following specification:
CHEMBL288346.svg SVG 250x250 250x250+0+0 16-bit sRGB 21.4KB 0.000u 0:00.009
Thank you for your interest. We could not reproduce this error no matter what we tried. Would it be possible for you to try it with the given tool/library versions in our repository:
Python 3.5.2 Tensorflow 1.12.0 Tflearn 0.3.2 Sklearn 0.19.2 Numpy 1.14.5 CairoSVG 2.1.2 RDkit 2016.09.4 OpenCV 3.3.0
Anaconda finds those combinations incompatible with each other, so that's a no-go. I changed lines 219 to 221 in bin/dataProcessing.py to the code below and it works:
# draw the molecule onto a smaller (160x160) SVG canvas
Draw.MolToFile(mol, "{}/{}.svg".format(output_path, id), size=(160, 160))
# force the rasterised output to exactly 200x200, regardless of the SVG canvas size
cairosvg.svg2png(url='{}/{}.svg'.format(output_path, id), write_to="{}/{}.png".format(output_path, id), output_width=200, output_height=200)
I fully realise this doesn't make sense (the 160 bit), but it works.
That sounds like a suitable quick-fix.
Also, we are planning to fix all these errors. We hope the fix will be ready to be deployed soon.
Glad to know that you are currently working on these issues, and looking forward to seeing the release.
Respected Sir, I am attempting to reproduce the results in your paper but get the error shown below, even though I am using the same Python and library versions that you described.
import cairocffi as cairo
File "C:\Users\Haseeb Younas\AppData\Local\Programs\Python\Python35\lib\site-packages\cairocffi\__init__.py", line 50, in
('libcairo.so', 'libcairo.2.dylib', 'libcairo-2.dll'))
File "C:\Users\Haseeb Younas\AppData\Local\Programs\Python\Python35\lib\site-packages\cairocffi\__init__.py", line 45, in dlopen
raise OSError(error_message)  # pragma: no cover
OSError: no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so': error 0x7e
cannot load library 'libcairo.2.dylib': error 0x7e
cannot load library 'libcairo-2.dll': error 0x7e
Would you please help me solve this problem?
It seems that you are running this on a Windows computer. I don't think these libraries are compatible with Windows. If you can try them on a Linux or macOS device, they should work.
Sir, thanks for the quick reply. Please let me check on Linux, then I'll let you know.
@bellstwohearted Thanks for the support; it worked on Linux, but I am getting the same error that others are facing in this thread. @tuncadogan Sir, could you please tell me how long it will take to solve these errors?
We expect to finish in about two to two and a half weeks. But if you are in a hurry, you can follow the steps in our readme to train a DEEPScreen classifier for your target protein of interest.
@bellstwohearted thank you for your help to @HaseebYounis2
I have also gone through the training process and it generates the same error that other people have reported. So, for now, I am moving on to the data preprocessing and curation step of the paper. I'll wait for the updated data from your side. Thanks a lot.
Hi, we are sorry for the delay. We are trying our best to update the system; we had to make some major changes to create a new version. We are planning to publish the new implementation by this Friday. I will give an update once we finish the initial development and release the code. Best
Hi,
We are sorry for the late response again. These are quite busy and hectic times for us, and we had to make some major changes in the implementation of DEEPScreen, as I mentioned before. The main change is that we decided not to proceed with tflearn, as the version we had used became too old (it has been almost 4 years since we started this project) and we encountered other problems and incompatibilities among the newer versions of the libraries whenever we wanted to make changes. Some other users also reported installation problems.
For these reasons, DEEPScreen has been re-implemented using PyTorch. We created all the training/validation/test images for all targets in order to avoid the image size, quality and library issues, so you can use the readily available images to train models for the targets. The new version has been tested on macOS and Linux. Unfortunately, we have not yet been able to work on the CNN architectures in detail and create models for each target, as that requires performing a hyper-parameter search for every target separately. But we are planning to work on it next.
Here is a summary of the new changes:
The implementation was done using the latest versions of all libraries (PyTorch, RDKit, etc.).
The filtered and preprocessed dataset was updated using ChEMBL version 27.
The number of targets increased from 704 to 812 with the updated training datasets.
Training, validation and test images were created for each target.
Here are the things we are planning to do next:
Adding other CNN architectures, such as InceptionV3, for training.
Performing a hyperparameter search and generating target-specific models.
Developing scripts for easy testing using the generated models.
I am closing this issue now. Please let us know about any problems that you encounter.
Best
@ahmetrifaioglu Thank you for the updates. When I try to run the example you gave, I get an error saying:
Namespace(bs=64, dropout=0.25, en='my_chembl286_training', epoch=100, fc1=256, fc2=128, lr=0.01, model='CNNModel1', targetid='CHEMBL286')
Arguments: CHEMBL286-CNNModel1-256-128-0.01-64-0.25-100-my_chembl286_training
GPU is available on this device!
Epoch :0
Training mode: True
Epoch 0 training loss: 23.89242261648178
Traceback (most recent call last):
File "main_training.py", line 67, in <module>
args.dropout, args.epoch, args.en)
File "/data/ai_robot/DEEPScreen/bin/train_deepscreen.py", line 143, in train_validation_test_training
training_perf_dict = prec_rec_f1_acc_mcc(all_training_labels, np.array(all_training_preds))
File "/data/ai_robot/DEEPScreen/bin/evaluation_metrics.py", line 10, in prec_rec_f1_acc_mcc
precision = metrics.precision_score(y_true, y_pred)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1623, in precision_score
zero_division=zero_division)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1434, in precision_recall_fscore_support
pos_label)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 1250, in _check_set_wise_labels
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/data/ai_robot/Anaconda3/envs/deepscreen/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 98, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: unknown is not supported
It seems that some data type errors occur at
training_perf_dict = prec_rec_f1_acc_mcc(all_training_labels, np.array(all_training_preds))
Any suggestions?
@bellstwohearted We tried to reproduce the error on four different machines running Linux and macOS, with multiple trials, but we could not reproduce it. I also searched for the error and could not find a clear answer. Are you using the same versions of the libraries? If so, the only thing that comes to my mind is that there were no true predictions at epoch 1. I have now added exception handling for the performance calculation; this should resolve the error if it occurred because there were no true predictions in the first epochs.
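A minimal sketch of the kind of guard I mean (illustrative only, not the exact code now in the repository):
import numpy as np
from sklearn import metrics

def guarded_precision(y_true, y_pred):
    # cast to plain integer arrays so sklearn does not see an "unknown" label type,
    # and fall back to 0.0 if the score cannot be computed (e.g. no true predictions at epoch 1)
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    try:
        return metrics.precision_score(y_true, y_pred)
    except ValueError:
        return 0.0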
I got the same error:
X_data = X_data.reshape(203, 10, 200, 200, 3)
ValueError: cannot reshape array of size 398400000 into shape (203,10,200,200,3)
Please help!
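For what it's worth, the numbers in the error already show the mismatch (assuming each image is meant to be 200x200 with 3 channels, as the reshape target implies):
total_elements = 398400000
per_image = 200 * 200 * 3           # one 200x200 RGB image
print(total_elements // per_image)  # 3320 -> the array actually holds 3320 such images
print(203 * 10)                     # 2030 -> the reshape assumes only 2030 images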
We have a new version coded in PyTorch; in this version we discarded the whole image generation module because of these errors, and we directly feed the system with pre-generated images. I believe you are using the old version of our tool. Please switch to the pytorch branch and follow the instructions for training/testing a model. Let me know if you have further questions.
I am attempting to reproduce the results in your paper and then train models on my own dataset, but several models fail to train with "ValueError: cannot reshape array".
Any idea how to fix this?