DeepPINCS on CPU and GPU fails

nturaga commented 2 years ago

Hi Dongmin,

I’m part of the Bioconductor Core team, and we’ve been taking a look at your package.

Your package has an issue: the vignette is not fully run.

In your vignette, the DeepPINCS::fit_cpi() function fails with the error shown in this gist: https://gist.github.com/nturaga/311c05d989208e4a363d661c9c18e29c

This happens on both CPU and GPU. The only reason your package passes on the Bioconductor build system is that the ‘if’ statement at the beginning of the model fit evaluates to FALSE, so the model-fitting code is skipped:

> if (keras::is_keras_available() & reticulate::py_available()) {

This guard shouldn’t be used if you actually want the vignette to run. Please fix the error shown in the gist.

Best,

Nitesh

dongminjung commented 2 years ago

Hi Nitesh.

The keras package is useful for deep learning in R. However, as you know, it requires a working Python installation, because TensorFlow and Keras are written in Python and accessed from R via the reticulate package. That is why we use keras::is_keras_available() and reticulate::py_available() to check that they are available in the current system environment. If they are not available from R, please check your system environment. Thanks.

Dongmin

nturaga commented 2 years ago

They are available on my machine. The fit_cpi function fails because of an internal issue within the function.

Run within the R console:

Note: the if condition passes, and the failure occurs at the "fitting model..." step.

> if (keras::is_keras_available() & reticulate::py_available()) {
    compound_max_atoms <- 50
    protein_embedding_dim <- 16
    protein_length_seq <- 100
    gcn_cnn_cpi <- fit_cpi(
        smiles = example_cpi[train_idx, 1],
        AAseq = example_cpi[train_idx, 2], 
        outcome = example_cpi[train_idx, 3],
        compound_type = "graph",
        compound_max_atoms = compound_max_atoms,
        protein_length_seq = protein_length_seq,
        protein_embedding_dim = protein_embedding_dim,
        protein_ngram_max = 2,
        protein_ngram_min = 1,
        smiles_val = example_cpi[!train_idx, 1],
        AAseq_val = example_cpi[!train_idx, 2],
        outcome_val = example_cpi[!train_idx, 3],
        net_args = net_args,
        epochs = 20,
        batch_size = 64,
        callbacks = keras::callback_early_stopping(
            monitor = "val_accuracy",
            patience = 10,
            restore_best_weights = TRUE))
    ttgsea::plot_model(gcn_cnn_cpi$model)
}

checking sequences...

preprocessing for compounds...

preprocessing for proteins...

fitting model...

Error in py_call_impl(callable, dots$args, dots$keywords): RuntimeError: Evaluation error: AttributeError: __module__

Detailed traceback:
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 530, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py", line 315, in __init__
    self._instrument_layer_creation()
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py", line 300, in _instrument_layer_creation
    keras_layers_gauge.get_cell(self._get_cell_name()).set(True)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py", line 287, in _get_cell_name
    return self.__class__.__module__ + '.' + self.__class__.__name__
.

Detailed traceback:
  File "/home/jupyter/packages/reticulate/python/rpytools/call.py", line 21, in python_function
    raise RuntimeError(res[kErrorKey])

Traceback:

1. fit_cpi(smiles = example_cpi[train_idx, 1], AAseq = example_cpi[train_idx, 
 .     2], outcome = example_cpi[train_idx, 3], compound_type = "graph", 
 .     compound_max_atoms = compound_max_atoms, protein_length_seq = protein_length_seq, 
 .     protein_embedding_dim = protein_embedding_dim, protein_ngram_max = 2, 
 .     protein_ngram_min = 1, smiles_val = example_cpi[!train_idx, 
 .         1], AAseq_val = example_cpi[!train_idx, 2], outcome_val = example_cpi[!train_idx, 
 .         3], net_args = net_args, epochs = 20, batch_size = 64, 
 .     callbacks = keras::callback_early_stopping(monitor = "val_accuracy", 
 .         patience = 10, restore_best_weights = TRUE))   # at line 5-24 of file <text>
2. do.call(net_args$compound, net_args$compound_args)
3. gcn_in_out(gcn_units = c(128, 64), gcn_activation = c("relu", 
 . "relu"), fc_units = 10, fc_activation = "relu", max_atoms = 50, 
 .     feature_dim = 24L)
4. x %>% layer_multi_linear(units = temp_units) %>% keras::layer_activation(activation = gcn_activation[i])
5. keras::layer_activation(., activation = gcn_activation[i])
6. create_layer(keras$layers$Activation, object, list(activation = activation, 
 .     input_shape = normalize_shape(input_shape), batch_input_shape = normalize_shape(batch_input_shape), 
 .     batch_size = as_nullable_integer(batch_size), dtype = dtype, 
 .     name = name, trainable = trainable, weights = weights))
7. layer_multi_linear(., units = temp_units)
8. create_layer(layer, object, .args)
9. do.call(layer_class, args)
10. (structure(function (...) 
  . {
  .     dots <- py_resolve_dots(list(...))
  .     result <- py_call_impl(callable, dots$args, dots$keywords)
  .     if (convert) 
  .         result <- py_to_r(result)
  .     if (is.null(result)) 
  .         invisible(result)
  .     else result
  . }, class = c("python.builtin.type", "python.builtin.object"), py_object = <environment>))(units = temp_units)
11. py_call_impl(callable, dots$args, dots$keywords)

dongminjung commented 2 years ago

It works on my Windows machine. The versions of Python and the Python packages are as follows.

keras 2.4.3
tensorflow 2.4.0
python 3.8.7

I haven't run fit_cpi with the latest versions of those Python packages, but I recommend Python 3.8 or higher.

nturaga commented 2 years ago

These are the versions I'm using.

keras==2.8.0
tensorflow==2.8.0
Python 3.9.9

But, where do you specify the versions of these packages in your DESCRIPTION file or your R package? Also, how is a user supposed to know which versions to use specifically?

I'm inexperienced in this domain, and I've not used reticulate much. Python packages generally come with a requirements.txt file. Maybe it's worth adding a requirements.txt to the inst/extdata directory.

Also, can you confirm that this package is not compatible with the versions I mentioned, and check whether it's an issue within Python? I can do the opposite and see if your fit_cpi function works with the package versions you mentioned.
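Version lists like the ones exchanged above can be collected with a short script rather than by hand. The following is only a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the function name environment_report is hypothetical and not part of DeepPINCS, keras, or reticulate.

```python
import platform
from importlib import metadata


def environment_report(packages=("keras", "tensorflow")):
    """Return lines describing the running Python and the given packages.

    `environment_report` is a hypothetical helper for illustration only.
    """
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            # Version of the installed distribution, as pip would report it.
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

Run with the same interpreter that reticulate is configured to use; otherwise the report may describe a different environment from the one R actually loads.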

nturaga commented 2 years ago

Hi @dongminjung,

After testing your package a little more, it seems the requirements for running it are very strict: it works only with Python 3.8 and the versions of Keras and TensorFlow you mentioned.

I've created a test docker image based on bioconductor/bioconductor_docker:devel to make sure it's all reproducible.

The only issue I've had is that I'm unable to run it on Python 3.9 with the latest versions of Keras and TensorFlow. Please state the specific versions in your package README. Do you have any plans to make your package work with the latest versions of Python, TensorFlow, and Keras?

The docker image is nitesh1989/bioconductor_docker:deeppincs_RELEASE_3_15.

$ python3 --version
Python 3.8.10

$ cat requirements.txt
absl-py==0.15.0
astunparse==1.6.3
cachetools==5.0.0
certifi==2021.10.8
charset-normalizer==2.0.12
flatbuffers==1.12
gast==0.3.3
google-auth==2.6.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.11.1
joblib==1.1.0
Keras==2.4.3
Keras-Preprocessing==1.1.2
Markdown==3.3.6
numpy==1.19.5
oauthlib==3.2.0
opt-einsum==3.3.0
pandas==1.4.0
protobuf==3.19.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.2
pytz==2021.3
PyYAML==6.0
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
scikit-learn==1.0.2
scipy==1.8.0
six==1.15.0
sklearn==0.0
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
threadpoolctl==3.1.0
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.3
wrapt==1.12.1
zipp==3.7.0
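A pinned list like the one above can also be checked against an existing environment before running the vignette. This is a rough sketch under stated assumptions: Python 3.8 or later, pins written strictly as name==version, and hypothetical helper names (parse_pins, installed_version, find_mismatches) that are not part of any package mentioned in this thread.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:  # ignore lines that are not exact pins
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs to (pinned, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Two pins taken from the list above; a real check would read the whole file.
    pins = parse_pins("Keras==2.4.3\ntensorflow==2.4.0\n")
    for name, (wanted, found) in find_mismatches(pins).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

The lookup parameter exists only so the comparison can be exercised without installing anything; by default it queries the environment of the interpreter running the script.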

dongminjung commented 2 years ago

Hi Nitesh.

OK. I'll add the versions of the packages I used to the README file. To keep up with changes, I'll update DeepPINCS to work with the latest versions of TensorFlow, Keras, and Python. I'll let you know when it is done. Thank you for your help.

dongminjung commented 2 years ago

Hi @nturaga

I updated the development version of DeepPINCS (1.3.8) to work with the latest versions of TensorFlow, Keras, and Python. The versions of these packages are also listed in the README file. Please take a look. Thanks.
