avisingh599 / reward-learning-rl

[RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering
https://sites.google.com/view/reward-learning-rl/

failed call to cuInit #15

Closed seivazi closed 5 years ago

seivazi commented 5 years ago

I have a problem running the code on the GPU. I'm quite sure the GPU is working on my system. The issue is the same for both the Docker and conda versions. Am I missing something?

When I run in debug mode, the GPU works:

softlearning run_example_debug examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --num-samples 1 --n_epochs 50 --active_query_frequency 10

When I run in local mode, only the CPU works:

softlearning run_example_local examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --num-samples 1 --n_epochs 50 --active_query_frequency 10

Error: (pid=3343) 2019-08-13 11:02:51.856630: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

avisingh599 commented 5 years ago

Hmm, I haven't seen this "failed call to cuInit" error message before. What GPU and GPU driver are you using?

Also, could you try adding the flags --trial-gpus 0.5 --trial-cpus 3 to see if it makes a difference?
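
One possible explanation, which is an assumption and was not confirmed in this thread: Ray usually sets CUDA_VISIBLE_DEVICES for each worker according to the GPUs that worker requested, so a trial that requests no GPU may see none, and TensorFlow then reports CUDA_ERROR_NO_DEVICE. A minimal Python sketch for checking this from inside the training process follows. The helper name describe_gpu_visibility is hypothetical and not part of softlearning, and the parsing is simplified (it does not model every CUDA rule for invalid device ids).

```python
import os


def describe_gpu_visibility(environ=None):
    """Summarize what CUDA_VISIBLE_DEVICES allows this process to see.

    `environ` is any mapping of environment variables; it defaults to
    os.environ. Hypothetical helper, for illustration only.
    """
    env = os.environ if environ is None else environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable not set: CUDA applies no filtering.
        return "unset: all GPUs on the machine are visible"
    # Split the comma-separated list and drop empty entries.
    ids = [d.strip() for d in value.split(",") if d.strip()]
    if not ids:
        # Set but empty: every GPU is hidden from this process.
        return "empty: no GPU is visible to this process"
    return "visible GPU ids: " + ", ".join(ids)


if __name__ == "__main__":
    print(describe_gpu_visibility())
```

If this reports an empty value inside a trial but not in a plain shell, the trial was probably started without a GPU allocation, which would be consistent with --trial-gpus making a difference.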

seivazi commented 5 years ago

Thank you! That helped, and the error is gone. I have an RTX 2070 with driver 418.88 and CUDA 10.

Unrelated to this issue: how do I visualize the results in the ray folder, e.g. for Image48SawyerPushForwardEnv-v0?

avisingh599 commented 5 years ago

Try running your code with the additional flag --video-save-frequency=1, and you should be able to see a videos folder in your experiment logs.
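
To locate those folders after a run, something like the following Python sketch may help. It assumes the logs live under ~/ray_results (Ray's usual default location, not confirmed in this thread) and that the folder is literally named videos; find_video_dirs is a hypothetical helper, not part of softlearning.

```python
from pathlib import Path


def find_video_dirs(log_root):
    """Return every directory named 'videos' under log_root, sorted by path."""
    root = Path(log_root).expanduser()
    if not root.is_dir():
        # Nothing to search if the log root does not exist.
        return []
    return sorted(p for p in root.rglob("videos") if p.is_dir())


if __name__ == "__main__":
    # Assumed log location; adjust to wherever your experiment logs are written.
    for video_dir in find_video_dirs("~/ray_results"):
        files = sorted(f.name for f in video_dir.iterdir() if f.is_file())
        print(video_dir, files)
```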