ATOMScience-org / AMPL

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License

Errors encountered with predict_bsep_inhibition.py example workflow #34

Status: Closed (GodloveD closed this issue 3 years ago)

GodloveD commented 3 years ago

commit 1eb5c65f0c6

I'm trying to work through the predict_bsep_inhibition.py example, but I'm running into errors.

$ ./predict_bsep_inhibition.py -i data/small_test_data.csv -o small_test_output.csv --id_col compound_id --smiles_col base_rdkit_smiles --activity_col active
/opt/conda/envs/atomsci/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Standardizing SMILES strings for 19 compounds.
Traceback (most recent call last):
  File "./predict_bsep_inhibition.py", line 147, in <module>
    main()
  File "./predict_bsep_inhibition.py", line 142, in main
    predict_activity(args)
  File "./predict_bsep_inhibition.py", line 62, in predict_activity
    pipe = mp.create_prediction_pipeline_from_file(pred_params, reload_dir=None, model_path=model_tarfile)
  File "/opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/pipeline/model_pipeline.py", line 1115, in create_prediction_pipeline_from_file
    model_fp = tarfile.open(model_path, mode='r:gz')
  File "/opt/conda/envs/atomsci/lib/python3.6/tarfile.py", line 1587, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/opt/conda/envs/atomsci/lib/python3.6/tarfile.py", line 1634, in gzopen
    fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/opt/conda/envs/atomsci/lib/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/examples/BSEP/models/bsep_classif_scaffold_split.tar.gz'

The examples directory does not exist within the conda environment; the installation procedure outlined in the main README.md does not put it there. If I manually copy the examples directory to the indicated subdirectory within the conda environment, I get past that error, but then I encounter a new one.

$ ./predict_bsep_inhibition.py -i data/small_test_data.csv -o small_test_output.csv --id_col compound_id --smiles_col base_rdkit_smiles --activity_col active
/opt/conda/envs/atomsci/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Standardizing SMILES strings for 19 compounds.
2021-01-21 16:56:10,394 ['time_built', 'dataset_metadata', 'training_metrics', 'time_generated', 'best_epoch'] are not part of the accepted list of parameters and will be ignored
Reading descriptor spec table from /opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/data/descriptor_sets_sources_by_descr_type.csv
Featurization = DescriptorFeaturization with mordred_filtered descriptors
Traceback (most recent call last):
  File "./predict_bsep_inhibition.py", line 147, in <module>
    main()
  File "./predict_bsep_inhibition.py", line 142, in main
    predict_activity(args)
  File "./predict_bsep_inhibition.py", line 62, in predict_activity
    pipe = mp.create_prediction_pipeline_from_file(pred_params, reload_dir=None, model_path=model_tarfile)
  File "/opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/pipeline/model_pipeline.py", line 1165, in create_prediction_pipeline_from_file
    pipeline = ModelPipeline(model_params)
  File "/opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/pipeline/model_pipeline.py", line 155, in __init__
    '%s_model_%s.tar.gz' % (self.params.dataset_name, self.params.model_uuid))
  File "/opt/conda/envs/atomsci/lib/python3.6/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

It looks like a None value is being passed to os.path.join where it expects a path. Am I missing an input, or is something else wrong? Thanks!
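For readers following along, the two failure modes in the tracebacks above can be reproduced in isolation with only the standard library. This is a minimal sketch, not AMPL code: `check_model_inputs`, the `missing` path, and the `dataset_model_uuid.tar.gz` filename are hypothetical names chosen for illustration. It shows that `tarfile.open` raises `FileNotFoundError` for a missing tarball, and that `os.path.join` raises `TypeError` when its first argument is `None`, which is what the frame `a = os.fspath(a)` in the second traceback indicates.

```python
import os
import tarfile
import tempfile


def check_model_inputs(model_path, output_dir):
    """Hypothetical pre-flight check (not part of AMPL).

    Returns a list of human-readable problems instead of raising,
    so both issues can be reported at once.
    """
    problems = []
    if not os.path.isfile(model_path):
        problems.append("model tarball not found: %s" % model_path)
    if output_dir is None:
        problems.append("output directory is None; os.path.join would raise TypeError")
    return problems


# Failure mode 1: the tarball path does not exist.
# The path below is an illustrative placeholder, not a real AMPL location.
missing = os.path.join(tempfile.gettempdir(), "no_such_dir", "model.tar.gz")
try:
    tarfile.open(missing, mode="r:gz")
    first_error = None
except FileNotFoundError as err:
    first_error = type(err).__name__

# Failure mode 2: the first path component passed to os.path.join is None.
try:
    os.path.join(None, "dataset_model_uuid.tar.gz")
    second_error = None
except TypeError as err:
    second_error = type(err).__name__

print(first_error, second_error)
print(check_model_inputs(missing, None))
```

A check of this kind run before building the pipeline would surface both problems with a clearer message than the raw tracebacks; which AMPL parameter was actually `None` is not stated in the thread.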

das046 commented 3 years ago

Hi @GodloveD ,

Sorry for the late response. We have fixed both problems. Can you try it again?

Thank you.

GodloveD commented 3 years ago

Thanks @das046! I think this is performing as expected now.

$ ./predict_bsep_inhibition.py -i data/small_test_data.csv -o small_test_output.csv --id_col compound_id --smiles_col base_rdkit_smiles --activity_col active
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
/opt/conda/envs/atomsci/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/opt/conda/envs/atomsci/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Standardizing SMILES strings for 19 compounds.
2021-02-10 13:00:08,938 ['time_built', 'dataset_metadata', 'training_metrics', 'time_generated', 'best_epoch'] are not part of the accepted list of parameters and will be ignored
Reading descriptor spec table from /opt/conda/envs/atomsci/lib/python3.6/site-packages/atomsci/ddm/data/descriptor_sets_sources_by_descr_type.csv
Featurization = DescriptorFeaturization with mordred_filtered descriptors
/opt/conda/envs/atomsci/lib/python3.6/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
number of features: 1555
/opt/conda/envs/atomsci/lib/python3.6/site-packages/deepchem/trans/transformers.py:148: RuntimeWarning: invalid value encountered in true_divide
  X = np.nan_to_num((X - self.X_means) / self.X_stds)
TIMING: dataset construction took 0.016 s
Loading dataset from disk.
Wrote predictions to file small_test_output.csv
Performance metrics:

13 out of 19 predictions correct.
Accuracy: 0.684
Precision: 0.400
Recall: 0.400
NPV: 0.786
ROC AUC: 0.743
PRC AUC: 0.549
Matthews correlation coefficient: 0.186
Confusion matrix:
                predicted activity
actual
activity        0       1

   0            11      3
   1            3       2
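As a sanity check, the reported metrics can be recomputed from the confusion matrix above using the standard definitions. This is a sketch under one assumption: that label 1 (active) is treated as the positive class. The variable names are illustrative and not taken from the script. ROC AUC and PRC AUC are omitted because they depend on the predicted probabilities, which are not shown in the thread.

```python
import math

# Counts read from the confusion matrix (rows: actual, columns: predicted).
# Assumption: label 1 is the positive class.
tn, fp = 11, 3   # actual 0: predicted 0, predicted 1
fn, tp = 3, 2    # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall = tp / (tp + fn)      # fraction of actual positives that are found
npv = tn / (tn + fn)         # fraction of predicted negatives that are correct
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print("%d out of %d predictions correct." % (tp + tn, total))
print("Accuracy: %.3f" % accuracy)
print("Precision: %.3f" % precision)
print("Recall: %.3f" % recall)
print("NPV: %.3f" % npv)
print("Matthews correlation coefficient: %.3f" % mcc)
```

The values agree with the output above: 13 of 19 correct, accuracy 0.684, precision and recall 0.400, NPV 0.786, and MCC 13/70, which rounds to 0.186.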