aspuru-guzik-group / chemical_vae

Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow
Apache License 2.0

train_vae not picking up GPU? #3

Closed spadavec closed 6 years ago

spadavec commented 6 years ago

While running the train_vae script, apparently my GPU isn't being used (the CPU usage is 300%+, but the GPU seems to be unused). My keras.json file specifies that the backend is tensorflow, and the KERAS_BACKEND env variable is also set to tensorflow. Is there something else I can do to use my GPU for training?
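To double-check which backend those two settings select, here is a minimal sketch. It assumes the Keras 2.0.x behaviour in which the backend is read from keras.json and a KERAS_BACKEND environment variable, when set, takes precedence. The function name `resolve_backend` is hypothetical and is not part of Keras.

```python
import json
import os


def resolve_backend(config_path, environ=os.environ):
    """Return the backend name these settings would select (hypothetical helper).

    Assumption: keras.json supplies the backend, and KERAS_BACKEND overrides it.
    """
    backend = "tensorflow"  # assumed default when nothing is configured
    if os.path.exists(config_path):
        with open(config_path) as fh:
            backend = json.load(fh).get("backend", backend)
    # The environment variable wins over the file when it is present.
    return environ.get("KERAS_BACKEND", backend)


if __name__ == "__main__":
    path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    print(resolve_backend(path))
```

Note that this only reports which backend is selected. It says nothing about whether that backend can see the GPU.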

beangoben commented 6 years ago

Hi Spadavec, our implementation is based on Keras + Tensorflow. If your GPU is not being used, it is probably because one of these two is not recognizing it.

Have you verified that tensorflow/keras is using your GPU in other settings? (https://www.tensorflow.org/programmers_guide/using_gpu)
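One quick way to run that check from inside the same environment is to ask TensorFlow which devices it can see. This is a minimal sketch, not part of chemical_vae; the helper name `visible_gpus` is hypothetical, and it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`.

```python
def visible_gpus():
    """Return the names of GPU devices TensorFlow reports (hypothetical helper).

    Returns an empty list when TensorFlow is not installed or sees no GPU.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow is not importable in this environment
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


print(visible_gpus() or "No GPU visible to TensorFlow")
```

If this prints no GPU inside the chemvae environment but does list one elsewhere, the problem is likely the environment's TensorFlow build rather than the training script.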

spadavec commented 6 years ago

@beangoben yes, Keras + TF work on my GPU for other codebases. TF seems to suggest that it can see my GPU as well:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

2018-02-14 16:48:53.514406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0
2018-02-14 16:48:53.515305: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0

Keras is also set up for TF:

 cat $HOME/.keras/keras.json
{
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "image_data_format": "channels_last", 
    "backend": "tensorflow",
    "device": "gpu0" 
}

Although it seems that there is a potential version mismatch:

python -c 'import keras; print(keras.__version__)'
/home/spadavec/miniconda2/envs/chemvae/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
/home/spadavec/miniconda2/envs/chemvae/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
2.0.7

I installed everything via Anaconda; is it possible the environment.yml file isn't up-to-date?

beangoben commented 6 years ago

The environment.yml specifies keras=2.0.6, while I believe you are using 2.0.7. I don't know if that is affecting anything. Could you try importing tensorflow and keras (check on the GPU), then chem_vae, and see if that works?
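To compare what is installed against the pinned versions, something like the following sketch could help. Only the keras=2.0.6 pin comes from this thread; the helper names `installed_version` and `check_pins` are hypothetical, and any other pins would have to be copied from environment.yml by hand.

```python
import importlib

# Pin quoted in this thread; add further entries from environment.yml as needed.
PINS = {"keras": "2.0.6"}


def installed_version(name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins, get_version=installed_version):
    """Map each package to (pinned, installed, matches)."""
    report = {}
    for name, wanted in pins.items():
        found = get_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINS).items():
        print(name, "pinned:", wanted, "installed:", found, "OK" if ok else "MISMATCH")
```

A mismatch here does not prove that the version difference causes the GPU problem; it only shows where the environment has drifted from the file.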

spadavec commented 6 years ago

@beangoben sorry for the confusion, but do you want me to do the following (this is from the zinc directory in the repo)?

(chemvae) spadavec@turing:~/chemical_vae/models/zinc$ python
Python 3.6.4 | packaged by conda-forge | (default, Dec 23 2017, 16:31:06) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
/home/spadavec/miniconda2/envs/chemvae/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
>>> import keras
/home/spadavec/miniconda2/envs/chemvae/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
>>> import chemvae
>>> from chemvae import train_vae
>>> 

My understanding is that this should launch the train_vae code, but it doesn't.
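For context on why nothing happens: in Python, importing a module only runs its top-level statements. Code behind an `if __name__ == "__main__":` guard runs only when the file is executed as a script. Whether train_vae is laid out this way is an assumption here. The toy module below (`toy_train`, hypothetical) is just a small sketch of the difference.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a training script with a __main__ guard.
SCRIPT = '''
started = False

def main():
    global started
    started = True

if __name__ == "__main__":
    main()
'''


def demo():
    """Return (ran_on_import, ran_as_script) for the toy module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "toy_train.py")
        path.write_text(SCRIPT)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("toy_train")
            ran_on_import = module.started  # guard is skipped on import
            # Executing the file with __name__ set to "__main__" triggers main().
            script_globals = runpy.run_path(str(path), run_name="__main__")
            ran_as_script = script_globals["started"]
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("toy_train", None)
    return ran_on_import, ran_as_script


print(demo())
```

Under that assumption, the import lines above would load the module without starting training, and the script would need to be run directly instead.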

spadavec commented 6 years ago

I just did a 'clean' install on a new machine without using conda install, and it seems to work now. I'll leave this issue open for now, and once I figure out what the conflict is, I'll post here and close out. If you want to close this out now, I understand!

beangoben commented 6 years ago

Great! Will close if there are no additional related questions.