Jiayuan-Gu / hab-mobile-manipulation

Mobile manipulation in Habitat
https://sites.google.com/view/hab-m3
62 stars, 9 forks

CUDA error while running the evaluation script #8

Closed: wzjscut closed this issue 6 months ago

wzjscut commented 6 months ago

When I run any of the commands under "Evaluate a HAB (Home Assistant Benchmark) task" in README.md, they all fail with the same error:

[22:54:57:295573]:[Nav] PathFinder.cpp(386)::build : Building navmesh with 145 x 259 cells
[22:54:57:307122]:[Nav] PathFinder.cpp(656)::build : Created navmesh with 86 vertices 40 polygons
[22:54:57:307135]:[Sim] Simulator.cpp(920)::recomputeNavMesh : reconstruct navmesh successful
Skill begin.
Skill begin.
Traceback (most recent call last):
  File "mobile_manipulation/eval_composite.py", line 328, in <module>
    main()
  File "mobile_manipulation/eval_composite.py", line 220, in main
    step_action = policy.act(ob)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/methods/skill.py", line 127, in act
    action = self.current_skill.act(obs, **kwargs)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/methods/skill.py", line 127, in act
    action = self.current_skill.act(obs, **kwargs)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/methods/skills/rl_skills.py", line 67, in act
    outputs = self.actor_critic.act(step_batch, deterministic=True)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/ppo/policy.py", line 130, in act
    net_outputs: Dict[str, torch.Tensor] = self.net(batch)
  File "/home/ubuntu/anaconda3/envs/hab-mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/ppo/policies/cnn_policy.py", line 211, in forward
    perception_embed = self.visual_encoder(cnn_input)
  File "/home/ubuntu/anaconda3/envs/hab-mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/habitat/hab-mobile-manipulation/mobile_manipulation/ppo/policies/cnn_policy.py", line 46, in forward
    x = m(x)
  File "/home/ubuntu/anaconda3/envs/hab-mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/hab-mm/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda3/envs/hab-mm/lib/python3.7/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

My setup: RTX 3090 (CUDA compute capability 8.6), NVIDIA driver 535.154.05, PyTorch 1.5.1, and CUDA 10.2.
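One likely cause, not confirmed in this thread, is that compute capability 8.6 is an Ampere GPU, while CUDA 10.2 predates Ampere (sm_8x targets first appeared in CUDA 11.0). A PyTorch 1.5.1 build against CUDA 10.2 may therefore lack suitable GPU code or cuBLAS kernels for an RTX 3090. A minimal diagnostic sketch follows. It assumes PyTorch is importable. `arch_covers_device` and `report` are hypothetical helper names, and the compatibility rules in the docstring are paraphrased from NVIDIA's CUDA C++ Programming Guide. `torch.cuda.get_arch_list()` is assumed to be missing in older releases such as 1.5.1, so the script falls back to a coarse CUDA-version check in that case.

```python
"""Check whether the installed PyTorch build has GPU code usable on this device.

Compatibility rules (paraphrased from NVIDIA's CUDA C++ Programming Guide):
  * a cubin built for sm_XY runs on devices of capability X.Z with Z >= Y
    (same major version only);
  * PTX built for compute_XY can be JIT-compiled for any capability >= X.Y.
"""
from typing import Iterable, Optional, Tuple


def arch_covers_device(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """Return True if any entry such as "sm_75" or "compute_37" can run on `capability`."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit() or len(digits) < 2:
            continue  # skip entries this sketch does not understand, e.g. "sm_90a"
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible cubin
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX the driver can JIT-compile
    return False


def report() -> Optional[bool]:
    """Print the build/device details from the issue; return None if undecidable."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return None
    print("torch", torch.__version__, "| built with CUDA", torch.version.cuda)
    if not torch.cuda.is_available() or torch.version.cuda is None:
        print("No usable CUDA device for this PyTorch build.")
        return None
    cap = torch.cuda.get_device_capability(0)
    print("device:", torch.cuda.get_device_name(0), "| capability", cap)
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)  # missing in older releases
    if get_arch_list is not None:
        arches = get_arch_list()
        print("compiled for:", arches)
        return arch_covers_device(arches, cap)
    # Coarse fallback: only flags the Ampere case, since CUDA < 11.0 cannot target sm_8x.
    cuda_major = int(torch.version.cuda.split(".")[0])
    if cap[0] == 8 and cuda_major < 11:
        return False
    return None


if __name__ == "__main__":
    print("compatible:", report())
```

For example, an illustrative arch list with no sm_8x or compute_xx entry, such as `["sm_70", "sm_75"]`, does not cover `(8, 6)`. A list containing `"sm_80"` does, because sm_80 cubins are binary-compatible with 8.6 devices.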

Jiayuan-Gu commented 6 months ago

Hi @wzjscut, have you fixed the error?

cpezzato commented 2 months ago

I had the same error and, for reasons I don't understand, fixed it by uninstalling and reinstalling PyTorch inside the environment: first pip uninstall torch, then pip install torch.
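The thread does not say why reinstalling helped. One plausible but unconfirmed explanation is that a bare `pip install torch` resolves to a newer wheel built against CUDA 11.x rather than the 1.5.1/CUDA 10.2 build. In either case, rerunning the exact operation that crashed is a quick check. The sketch below assumes only that PyTorch is installed. `cublas_smoke_test` and its tensor sizes are hypothetical, chosen so the test follows the same `F.linear` to `torch.addmm` path seen in the traceback.

```python
from typing import Optional


def cublas_smoke_test(batch: int = 4, in_features: int = 512, out_features: int = 256) -> Optional[bool]:
    """Rerun the F.linear -> addmm (cublasSgemm) call that failed in the traceback.

    Returns None if PyTorch or a CUDA device is unavailable, False if the GPU call
    raises or disagrees with a CPU reference, and True otherwise.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # TF32 switch exists only in newer PyTorch; note this sets a process-wide flag.
    matmul = getattr(getattr(torch.backends, "cuda", None), "matmul", None)
    if matmul is not None:
        matmul.allow_tf32 = False  # compare true fp32 results against the CPU reference
    x = torch.randn(batch, in_features)
    w = torch.randn(out_features, in_features)
    b = torch.randn(out_features)
    try:
        y = F.linear(x.cuda(), w.cuda(), b.cuda())
        torch.cuda.synchronize()  # CUDA errors are asynchronous; surface them here
    except RuntimeError as err:
        print("GPU matmul failed:", err)
        return False
    return torch.allclose(y.cpu(), F.linear(x, w, b), rtol=1e-3, atol=1e-3)


if __name__ == "__main__":
    print("cuBLAS smoke test passed:", cublas_smoke_test())
```

Running this once after reinstalling can confirm the fix before you launch the full evaluation script. It exercises only the matmul path from the traceback, not the Habitat simulator.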