Describe the bug
Training works fine on the CPU, but after switching to the GPU the following error is raised many times:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
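The message above suggests `CUDA_LAUNCH_BLOCKING=1`, which makes kernel launches synchronous so the Python stack trace points at the call that actually failed. Here is a minimal sketch of enabling it from Python, assuming the lines are placed at the very top of the entry script, before anything initializes CUDA. The helper name `launch_blocking_enabled` is made up for illustration.

```python
import os

# Set the variable before the first CUDA call. Doing it before importing
# torch (or isaacgym) is the safe choice, since CUDA reads it at initialization.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def launch_blocking_enabled() -> bool:
    """Return True when synchronous kernel launches were requested (hypothetical helper)."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print("CUDA_LAUNCH_BLOCKING enabled:", launch_blocking_enabled())
```

The same effect can be had from the shell without touching the code: `CUDA_LAUNCH_BLOCKING=1 python3 legged_gym/scripts/train.py --task=a1_amp`. Training will run slower in this mode, so it is only meant for debugging.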
System (please complete the following information):
Commit: 799ded43ed6fca725344f28eefc0fb97cb932e53
OS: Ubuntu 20.04
GPU: RTX4070
CUDA: 11.8
GPU Driver: 525.125.06
Additional Notice
The full output is shown below:
python3 legged_gym/scripts/train.py --task=a1_amp
Importing module 'gym_38' (/home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/tianhu/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.0.0+cu118
Device count 1
/home/tianhu/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/tianhu/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /home/tianhu/.cache/torch_extensions/py38_cu118/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/tianhu/anaconda3/envs/amp_hw/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
AMPOnPolicyRunner
Actor MLP: Sequential(
(0): Linear(in_features=42, out_features=512, bias=True)
(1): ELU(alpha=1.0)
(2): Linear(in_features=512, out_features=256, bias=True)
(3): ELU(alpha=1.0)
(4): Linear(in_features=256, out_features=128, bias=True)
(5): ELU(alpha=1.0)
(6): Linear(in_features=128, out_features=12, bias=True)
)
Critic MLP: Sequential(
(0): Linear(in_features=48, out_features=512, bias=True)
(1): ELU(alpha=1.0)
(2): Linear(in_features=512, out_features=256, bias=True)
(3): ELU(alpha=1.0)
(4): Linear(in_features=256, out_features=128, bias=True)
(5): ELU(alpha=1.0)
(6): Linear(in_features=128, out_features=1, bias=True)
)
Loaded 2.499s. motion from datasets/mocap_motions/rightturn0.txt.
Loaded 10.311s. motion from datasets/mocap_motions/pace1.txt.
Loaded 0.798s. motion from datasets/mocap_motions/pace0.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot1.txt.
Loaded 0.672s. motion from datasets/mocap_motions/trot0.txt.
Loaded 0.9450000000000001s. motion from datasets/mocap_motions/leftturn0.txt.
Preloading 2000000 transitions
Finished preloading
PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!! Result = 2
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4210
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3480
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3535
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
Traceback (most recent call last):
File "legged_gym/scripts/train.py", line 47, in <module>
train(args)
File "legged_gym/scripts/train.py", line 42, in train
ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args)
File "/home/tianhu/AMP_for_hardware/legged_gym/utils/task_registry.py", line 149, in make_alg_runner
runner = runner_class(env, train_cfg_dict, log_dir, device=args.rl_device)
File "/home/tianhu/AMP_for_hardware/rsl_rl/rsl_rl/runners/amp_on_policyrunner.py", line 104, in __init__
_, _ = self.env.reset()
File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 99, in reset
obs, privileged_obs, _, _, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 113, in step
self.torques = self._compute_torques(self.actions).view(self.torques.shape)
File "/home/tianhu/AMP_for_hardware/legged_gym/envs/base/legged_robot.py", line 431, in _compute_torques
actions_scaled = actions * self.cfg.control.action_scale
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
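The first error in the log is not the illegal memory access but the `PxgCudaDeviceMemoryAllocator` failure that precedes it, so the later errors may be a consequence of a failed GPU allocation rather than an independent fault. Here is a rough sketch for pulling the numbers out of that line and, where PyTorch and a GPU are available, comparing them with the memory that is currently free. The helper names are made up for illustration. Reading `Result = 2` as the CUDA runtime's `cudaErrorMemoryAllocation` (numeric value 2) is an assumption about what PhysX prints there.

```python
import re

# Pattern for the allocator failure line exactly as it appears in the log above.
ALLOC_FAIL = re.compile(
    r"PxgCudaDeviceMemoryAllocator fail to allocate memory (\d+) bytes!! Result = (\d+)"
)


def parse_alloc_failure(line):
    """Return (requested_bytes, result_code), or None if the line does not match."""
    match = ALLOC_FAIL.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def free_gpu_memory(device=0):
    """Return (free_bytes, total_bytes) for the device, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.mem_get_info(device)


line = "PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!! Result = 2"
requested, code = parse_alloc_failure(line)
print(f"requested {requested / 2**20:.0f} MiB, result code {code}")
```

For the line in this report, the request works out to 324 MiB. If `free_gpu_memory()` reports less than that just before the simulation is created, memory pressure is the likely cause. In that case, lowering the number of parallel environments may help, for example with `--num_envs`, assuming this fork keeps the upstream legged_gym command-line argument.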
Not really sure if this is an issue with the code, an issue with your CUDA install, or an issue with this repo being out of date. If you find a fix please let me know and I will merge!