deepgram / kur

Descriptive Deep Learning
Apache License 2.0
814 stars 107 forks

keras_backend.py disables GPU for theano, while tensorflow backend selected #65

Closed: stovenator closed this issue 7 years ago

stovenator commented 7 years ago

When calling Kur with the TensorFlow backend, keras_backend.py overrides the CUDA_VISIBLE_DEVICES environment variable if Theano cannot use the GPU, even though TensorFlow is correctly configured to use the GPU.

    if not self.devices:
        replace_theano_flag('device', 'cpu')
        env['CUDA_VISIBLE_DEVICES'] = '100'
        logger.info('Requesting CPU')

CUDA_VISIBLE_DEVICES shouldn't be overwritten in this fashion. There should be a check for which backend is used beforehand and only override if the selected backend can't access the GPU.

ajsyp commented 7 years ago

Yep, this is true. But does this break anything for you? After all, modifying the environment within Kur won't affect any parent processes.
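
The point about parent processes can be checked with a minimal sketch like the one below. The helper name `child_sets_cuda_visible_devices` is made up for this illustration. It starts a child Python process that overwrites CUDA_VISIBLE_DEVICES, then confirms the parent's value is unchanged.

```python
import os
import subprocess
import sys


def child_sets_cuda_visible_devices(value='100'):
    """Overwrite CUDA_VISIBLE_DEVICES in a child process.

    Returns the value the child saw after its own assignment.
    """
    code = (
        "import os; "
        f"os.environ['CUDA_VISIBLE_DEVICES'] = {value!r}; "
        "print(os.environ['CUDA_VISIBLE_DEVICES'])"
    )
    result = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


before = os.environ.get('CUDA_VISIBLE_DEVICES')
seen_by_child = child_sets_cuda_visible_devices('100')
after = os.environ.get('CUDA_VISIBLE_DEVICES')

# The child sees its own override; the parent's environment is untouched.
print(seen_by_child, before == after)
```

Note that this isolation only runs upward. Anything loaded later inside the same process, or in its children, still sees the overridden value.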

The reason I originally did it this way is that I don't want to duplicate Keras' backend-selection code (which looks in ~/.keras), but I also don't know of a stable Keras API that lets me query which backend it would select if I ran import keras. Unfortunately, once import keras is executed, the Keras backend is already selected and cannot be modified. So if I want to 1.) use only the Keras public API, and 2.) make sure that Keras uses the correct GPUs no matter which backend it selects when import keras is executed, this seemed like the most natural way. A little overkill, sure, but pretty reliable.

I'm open to other techniques, if you can think of any, or if you can point out what breaks with the current implementation.

stovenator commented 7 years ago

I have TensorFlow configured to use the GPU, but Theano is not.

When this code runs, it checks for Theano GPU capability. When it doesn't find it, CUDA_VISIBLE_DEVICES is set to 100. Then Keras starts up TensorFlow, and TensorFlow is unable to find my GPU.

So, yes, it breaks TensorFlow's ability to use the GPU with Kur.

ajsyp commented 7 years ago

This shouldn't cause any problems or break GPU with TensorFlow or Theano; in fact, I regularly use GPU with both TensorFlow and Theano.

That particular snippet of code does not check for Theano GPU capability. self.devices is just a list of the GPU devices that Kur detected, and it is an empty list if CPU is requested (or if no GPUs are found). I am simply pre-emptively setting some environment variables that might impact Theano if Theano gets loaded as the Keras backend.
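
As a rough illustration of that behaviour, the mapping from a device list to CUDA_VISIBLE_DEVICES might look like the sketch below. `cuda_visible_devices_for` is a hypothetical helper, not Kur's code. The empty-list case mirrors the snippet quoted in this issue. The non-empty case is an assumption, since the thread does not show what Kur does there, and simply uses the standard comma-separated index format.

```python
def cuda_visible_devices_for(devices):
    """Build a CUDA_VISIBLE_DEVICES value from a list of GPU indices.

    Hypothetical helper for illustration only.
    """
    if not devices:
        # Mirrors the quoted snippet: an index that should not exist on
        # the machine, so CUDA sees no usable device.
        return '100'
    # Assumption: expose exactly the requested devices, in order.
    return ','.join(str(index) for index in devices)


print(cuda_visible_devices_for([]))
print(cuda_visible_devices_for([0, 2]))
```

If the device list comes back empty on a machine that does have a GPU, the value computed here would hide that GPU from every backend, which is why the trace-level output is worth checking.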

My guess is that you are running into a problem somewhere in your GPU setup. What GPU(s) are you using? What does nvidia-smi report? And if you run Kur in trace-level verbosity (-vvv), what output does it produce?

ajsyp commented 7 years ago

If there is still something actionable here, let me know (and post the output of the commands above). Otherwise, I'm closing this for now.