I use an image_input size of 256x256. The code runs for one iteration, and the losses in that first iteration are computed normally. But before the next step, I get a "CUDA warning: an illegal memory access was encountered" error. I suspect this is caused by an out-of-bounds array access. Do you have any idea what could be wrong? The training log and traceback follow:
train step 0; loss = 0.105516; scale_loss = 0.000002; rgb_loss = 0.105514; depth_normal_loss = 0.000000; supervise_normal_loss = 0.000000; non_local_loss = 0.000000; context = [[96, 121], [122, 147]]; bound = [0.5; 100.0]; scene = ['0f93fdb52c6933cf', 'a3a5e373d876db0e'];
......
File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_combined.py", line 565, in backward
dx, ddt, dA, dB, dC, dD, dz, ddt_bias, dinitial_states = _mamba_chunk_scan_combined_bwd(dout, x, dt, A, B, C, out, ctx.chunk_size, D=D, z=z, dt_bias=dt_bias, initial_states=initial_states, dfinal_states=dfinal_states, seq_idx=seq_idx, dt_softplus=ctx.dt_softplus, dt_limit=ctx.dt_limit)
File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_combined.py", line 458, in _mamba_chunk_scan_combined_bwd
ddA = _chunk_scan_bwd_ddAcs_stable(x, dt, dA_cumsum, dout, CB)
File "/group/40034/ozhengchen/scene_gen/LGM/Gaussian_final/src/model/encoder/mamba2_model/mamba2/ssd_chunk_scan.py", line 1655, in _chunk_scan_bwd_ddAcs_stable
_chunk_scan_bwd_ddAcs_stable_kernel[grid_ddtcs](
File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in <dictcomp>
timings = {config: self._bench(*args, config=config, **kwargs)
File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 83, in _bench
return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/triton/testing.py", line 105, in do_bench
torch.cuda.synchronize()
File "/group/40034/ozhengchen/anaconda3/envs/dg/lib/python3.10/site-packages/torch/cuda/__init__.py", line 783, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered