AIWintermuteAI / aXeleRate

Keras-based framework for AI on the Edge
MIT License
176 stars 71 forks

Training is not using GPU #16

Closed: pritamghanghas closed this issue 4 years ago

pritamghanghas commented 4 years ago

It seems the CUDA_VISIBLE_DEVICES environment variable has no effect on training speed.
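For context, CUDA_VISIBLE_DEVICES is read when the CUDA runtime is initialized, so it generally has to be set before TensorFlow is imported. If changing it makes no difference to speed, one possible explanation is that TensorFlow is not using the GPU at all. Below is a minimal sketch of setting the variable from Python. The helper name `set_visible_gpus` is hypothetical and not part of aXeleRate.

```python
import os


def set_visible_gpus(gpu_ids):
    """Restrict CUDA to the given GPU indices (hypothetical helper).

    Call this before TensorFlow (or any other CUDA library) is imported
    in the process; changing the variable afterwards usually has no effect.
    An empty list hides all GPUs, which forces CPU execution.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    set_visible_gpus([0])  # expose only the first GPU
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

Comparing training time with `set_visible_gpus([0])` and `set_visible_gpus([])` is a quick way to tell whether the GPU is being used.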

AIWintermuteAI commented 4 years ago

Can you provide more details? Are you running aXeleRate in Colab or on a local computer? If on a local computer: native installation, virtual env, or conda?

pritamghanghas commented 4 years ago

I am running on a local computer with conda. I have another conda environment with tf 2.2, and there the GPU works perfectly fine. aXeleRate seems to be doing something; I used tensorflow_gpu 1.15 but still no luck. Next I will see how much effort it takes to switch to tf 2.2 while still using aXeleRate. I will try to run it on Colab as well. I generally prefer local, but whatever gets the job done is fine by me.

AIWintermuteAI commented 4 years ago

If you do port it to tf 2.2, make a PR :) It's not complicated, but it is quite a lot of work. Try making a clean environment, then install aXeleRate with pip there, and after that run conda install tensorflow-gpu==1.15. That will make sure you have CUDA installed in the conda environment you've activated. aXeleRate by itself doesn't influence GPU/CUDA usage. But from what I understand, TensorFlow installed in a conda environment needs CUDA installed with the conda install command and will not use the system-wide CUDA.
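One way to check whether the activated conda environment has its own CUDA packages is to inspect the output of `conda list --json`. The sketch below assumes the packages are named `cudatoolkit` and `cudnn`, as they usually are in the default Anaconda channel. Both function names are hypothetical.

```python
import json
import subprocess


def find_cuda_packages(conda_list_json, names=("cudatoolkit", "cudnn")):
    """Return {name: version} for CUDA-related packages found in the JSON
    text printed by `conda list --json`.

    The default package names are an assumption; adjust them if your
    channel uses different ones.
    """
    packages = json.loads(conda_list_json)
    return {p["name"]: p["version"] for p in packages if p["name"] in names}


def conda_list_json():
    """Run `conda list --json` for the active environment (conda must be on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--json"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Running `find_cuda_packages(conda_list_json())` inside the activated environment returns an empty dict when no CUDA packages are installed there, which would be consistent with TensorFlow falling back to the CPU.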

AIWintermuteAI commented 4 years ago

Try this:

    git clone https://github.com/AIWintermuteAI/aXeleRate.git
    cd aXeleRate
    python setup.py install
    conda install tensorflow-gpu==1.15 keras

I just did this with a new, clean conda environment. After python setup.py install I wasn't able to use the GPU, but when I ran conda install tensorflow-gpu==1.15 keras, it automatically installed the necessary CUDA/cuDNN packages in the environment and GPU acceleration was back.
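To confirm that GPU acceleration is available after the install, something like the following sketch can be run inside the environment. It uses `device_lib.list_local_devices()` and `tf.test.is_built_with_cuda()`, which should be available in both TensorFlow 1.15 and 2.x. The function name `gpu_report` is hypothetical.

```python
def gpu_report():
    """Summarize whether TensorFlow is installed and which GPUs it can see."""
    report = {"tensorflow_version": None, "built_with_cuda": None, "gpu_devices": []}
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is not installed in this environment.
        return report

    report["tensorflow_version"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # list_local_devices() initializes the runtime and enumerates CPU/GPU devices.
    report["gpu_devices"] = [
        d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"
    ]
    return report


if __name__ == "__main__":
    print(gpu_report())
```

An empty `gpu_devices` list together with `built_with_cuda` set to True would suggest that the CUDA/cuDNN libraries are missing from the environment rather than a problem in aXeleRate itself.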

If this solution works for you, please do give feedback :)

pritamghanghas commented 4 years ago

That was strange. I had installed the same version using pip, but installing the same package with conda solved the problem. Thanks.