Alescontrela / AMP_for_hardware

Code for "Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions"

RuntimeError: CUDA error: an illegal memory access was encountered #11

Closed LeedsRamseyPeng closed 8 months ago

LeedsRamseyPeng commented 11 months ago

Describe the bug Training works fine on the CPU, but after switching to the GPU the following error is raised repeatedly:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
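As the message itself notes, the Python stack trace can point at the wrong call because CUDA errors are reported asynchronously. A minimal sketch of forcing synchronous kernel launches so the traceback lands on the real failure point (the variable must be set before torch initializes CUDA; prefixing the training command with CUDA_LAUNCH_BLOCKING=1 achieves the same thing):

# Sketch only: make CUDA kernel launches synchronous for debugging.
# This must run before any CUDA context exists, i.e. at the very top of
# the entry script, before importing torch/isaacgym; otherwise it has no effect.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the variable is set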

Steps to reproduce the behavior:

  1. Execute python3 legged_gym/scripts/train.py --task=a1_amp
  2. See error: ...

Expected behavior Training should run on the GPU without CUDA errors, just as it does on the CPU.

System: Python 3.8, PyTorch 2.0.0+cu118 (CUDA 11.8), Isaac Gym with the GPU PhysX pipeline (see the log below).

Additional notes

The full console output follows:

python3 legged_gym/scripts/train.py --task=a1_amp
Importing module 'gym_38' (/home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.0.0+cu118
Device count 1
/home/tianhu/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/tianhu/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /home/tianhu/.cache/torch_extensions/py38_cu118/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/tianhu/anaconda3/envs/amp_hw/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
AMPOnPolicyRunner
Actor MLP: Sequential(
  (0): Linear(in_features=42, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=12, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=48, out_features=512, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=512, out_features=256, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=256, out_features=128, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=128, out_features=1, bias=True)
)
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
Preloading 2000000 transitions
Finished preloading
PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!!
Result = 2
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4210
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3480
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3535
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
Traceback (most recent call last):
  File "legged_gym/scripts/train.py", line 47, in <module>
    train(args)
  File "legged_gym/scripts/train.py", line 42, in train
    ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args)
  File "/home/tianhu/AMP_for_hardware/legged_gym/utils/task_registry.py", line 149, in make_alg_runner
    runner = runner_class(env, train_cfg_dict, log_dir, device=args.rl_device)
  File "/home/tianhu/AMP_for_hardware/rsl_rl/rsl_rl/runners/amp_on_policy_runner.py", line 104, in __init__
    _, _ = self.env.reset()
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 99, in reset
    obs, privileged_obs, _, _, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 113, in step
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)
  File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 431, in _compute_torques
    actions_scaled = actions * self.cfg.control.action_scale
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
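The first hard failure in the log is PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!!, i.e. PhysX could not allocate roughly 340 MB on the GPU; the illegal-memory-access errors that follow are plausibly downstream of that failed allocation. A common mitigation in legged_gym-style codebases is to run fewer parallel environments. A minimal sketch, assuming this fork keeps upstream legged_gym's task_registry API (attribute names here may differ):

# Sketch only: shrink the simulation to reduce GPU memory pressure.
# task_registry.get_cfgs/make_env and env_cfg.env.num_envs are taken
# from upstream legged_gym and may be named differently in this fork.
from legged_gym.utils import get_args, task_registry

args = get_args()
env_cfg, train_cfg = task_registry.get_cfgs(name="a1_amp")
env_cfg.env.num_envs = 1024  # try far fewer envs than the default
env, env_cfg = task_registry.make_env(name="a1_amp", args=args, env_cfg=env_cfg)

Upstream legged_gym also accepts a --num_envs command-line flag; if this fork kept it, that avoids editing any code.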

Alescontrela commented 8 months ago

Not really sure if this is an issue with the code, an issue with your CUDA install, or an issue with this repo being out of date. If you find a fix please let me know and I will merge!
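One quick way to separate a broken CUDA/PyTorch setup from a bug in this repo is a smoke test outside Isaac Gym entirely; a minimal sketch using only plain PyTorch:

# Sketch only: if this fails or hangs, the problem is the CUDA/PyTorch
# installation rather than AMP_for_hardware itself.
import torch

assert torch.cuda.is_available(), "CUDA is not visible to PyTorch"
print(torch.version.cuda, torch.cuda.get_device_name(0))
x = torch.randn(4096, 4096, device="cuda")
y = x @ x                  # exercise a real CUDA kernel
torch.cuda.synchronize()   # surface any asynchronous CUDA error
print("GPU smoke test passed:", tuple(y.shape))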