Alescontrela / AMP_for_hardware

Code for "Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions"

RuntimeError: CUDA error: an illegal memory access was encountered #11

Closed LeedsRamseyPeng closed 8 months ago

LeedsRamseyPeng commented 11 months ago

Describe the bug Training works fine on the CPU, but after switching to the GPU the following error is raised repeatedly:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
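As the message itself notes, the Python stack trace can point at the wrong call because CUDA errors are reported asynchronously. A minimal sketch of forcing synchronous kernel launches so the traceback lands on the real failure point (the variable must be set before torch initializes CUDA; prefixing the training command with CUDA_LAUNCH_BLOCKING=1 achieves the same thing):

# Sketch only: make CUDA kernel launches synchronous for debugging.
# This must run before any CUDA context exists, i.e. at the very top of
# the entry script, before importing torch/isaacgym; otherwise it has no effect.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the variable is set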

Steps to reproduce the behavior:

  1. Execute python3 legged_gym/scripts/train.py --task=a1_amp
  2. See error: ...

Expected behavior Training should run on the GPU without CUDA errors, just as it does on the CPU.

System: Python 3.8, PyTorch 2.0.0+cu118 (CUDA 11.8), Isaac Gym with the GPU PhysX pipeline (see the log below).

Additional notes

The full console output follows:

python3 legged_gym/scripts/train.py --task=a1_amp
Importing module 'gym_38' (/home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.0.0+cu118
Device count 1
/home/tianhu/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/tianhu/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /home/tianhu/.cache/torch_extensions/py38_cu118/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/tianhu/anaconda3/envs/amp_hw/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
AMPOnPolicyRunner
Actor MLP: Sequential(
  (0): Linear(in_features=42, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=12, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=48, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=1, bias=True)
)
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
Preloading 2000000 transitions
Finished preloading
PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!!
Result = 2
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4210
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3480
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3535
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
Traceback (most recent call last):
  File "legged_gym/scripts/train.py", line 47, in <module>
    train(args)
  File "legged_gym/scripts/train.py", line 42, in train
    ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args)
  File "/home/tianhu/AMP_for_hardware/legged_gym/utils/task_registry.py", line 149, in make_alg_runner
    runner = runner_class(env, train_cfg_dict, log_dir, device=args.rl_device)
  File "/home/tianhu/AMP_for_hardware/rsl_rl/rsl_rl/runners/amp_on_policy_runner.py", line 104, in __init__
    _, _ = self.env.reset()
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 99, in reset
    obs, privileged_obs, _, _, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 113, in step
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 431, in _compute_torques
    actions_scaled = actions * self.cfg.control.action_scale
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
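The first hard failure in the log is PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!!, i.e. PhysX could not allocate roughly 340 MB on the GPU; the illegal-memory-access errors that follow are plausibly downstream of that failed allocation. A common mitigation in legged_gym-style codebases is to run fewer parallel environments. A minimal sketch, assuming this fork keeps upstream legged_gym's task_registry API (attribute names here may differ):

# Sketch only: shrink the simulation to reduce GPU memory pressure.
# task_registry.get_cfgs/make_env and env_cfg.env.num_envs are taken
# from upstream legged_gym and may be named differently in this fork.
from legged_gym.utils import get_args, task_registry

args = get_args()
env_cfg, train_cfg = task_registry.get_cfgs(name="a1_amp")
env_cfg.env.num_envs = 1024  # try far fewer envs than the default
env, env_cfg = task_registry.make_env(name="a1_amp", args=args, env_cfg=env_cfg)

Upstream legged_gym also accepts a --num_envs command-line flag; if this fork kept it, that avoids editing any code.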

Alescontrela commented 8 months ago

Not really sure if this is an issue with the code, an issue with your CUDA install, or an issue with this repo being out of date. If you find a fix please let me know and I will merge!
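One quick way to separate a broken CUDA/PyTorch setup from a bug in this repo is a smoke test outside Isaac Gym entirely; a minimal sketch using only plain PyTorch:

# Sketch only: if this fails or hangs, the problem is the CUDA/PyTorch
# installation rather than AMP_for_hardware itself.
import torch

assert torch.cuda.is_available(), "CUDA is not visible to PyTorch"
print(torch.version.cuda, torch.cuda.get_device_name(0))
x = torch.randn(4096, 4096, device="cuda")
y = x @ x                  # exercise a real CUDA kernel
torch.cuda.synchronize()   # surface any asynchronous CUDA error
print("GPU smoke test passed:", tuple(y.shape))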