spencer03172023 closed this issue 4 months ago.
Could you verify that your data (projections and volume) are all on GPU 0 and are contiguous float32 arrays?
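Here is a minimal sketch of the three checks Kyle asks about. It uses only standard PyTorch tensor methods. The helper names is_leap_ready and prepare_for_leap are hypothetical and not part of LEAP. The target device is passed in explicitly because torch.device("cuda") without an index does not compare equal to a tensor's cuda:0 device.

```python
import torch


def is_leap_ready(t: torch.Tensor, device: torch.device) -> bool:
    """True if t is a float32, contiguous tensor on `device` (hypothetical helper)."""
    # is_contiguous is a method: writing t.is_contiguous without () is always truthy.
    return t.device == device and t.dtype == torch.float32 and t.is_contiguous()


def prepare_for_leap(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Cast/move t to float32 on `device` and make it contiguous (hypothetical helper)."""
    # .to() returns t itself if dtype and device already match; otherwise a copy.
    t = t.to(device=device, dtype=torch.float32)
    # .contiguous() returns t itself if it is already contiguous.
    return t.contiguous()


if __name__ == "__main__":
    dev = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
    x = torch.rand(4, 3, 2, dtype=torch.float64).transpose(0, 2)  # wrong dtype, non-contiguous
    print(is_leap_ready(x, dev))  # False
    y = prepare_for_leap(x, dev)
    print(is_leap_ready(y, dev))  # True
```

Calling prepare_for_leap on a tensor that already passes all three checks returns the same tensor object, so it is cheap to apply defensively before every projector call.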
Thanks for your reply, Kyle. I checked the following in debug mode: data.dtype is torch.float32, data.is_contiguous() returns True, and all data is on GPU 0.
Thanks for checking. The next thing to check is the order of the data. Some CT software packages store their projection data in "sinogram order", while LEAP stores it in "projection order". In LEAP the order of the projection data is (numAngles, numRows, numCols). Is this consistent with your code?
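If the data arrive in sinogram order, a single axis swap converts them to LEAP's projection order. This sketch assumes the sinogram-order array is laid out as (numRows, numAngles, numCols). Check the convention of the software that produced your data before relying on it. The function name is hypothetical.

```python
import numpy as np


def sinogram_to_projection_order(sino: np.ndarray) -> np.ndarray:
    """Reorder (numRows, numAngles, numCols) -> (numAngles, numRows, numCols).

    Assumes `sino` uses that sinogram-order layout. Returns a C-contiguous
    float32 array, since an axis swap alone only produces a strided view.
    """
    if sino.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {sino.shape}")
    # Swap the row and angle axes, then materialize a contiguous float32 array.
    return np.ascontiguousarray(np.swapaxes(sino, 0, 1), dtype=np.float32)


if __name__ == "__main__":
    numRows, numAngles, numCols = 4, 360, 256  # arbitrary example sizes
    sino = np.random.rand(numRows, numAngles, numCols)
    g = sinogram_to_projection_order(sino)
    print(g.shape, g.dtype, g.flags["C_CONTIGUOUS"])  # (360, 4, 256) float32 True
```

The same reordering can be done on a torch tensor with torch.swapaxes(t, 0, 1).contiguous(). The important part is that the memory layout, not just the reported shape, matches what LEAP expects.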
@spencer03172023 How about checking the memory alignment? cudaMemcpy3D requires that the source and destination memory be aligned, so the source or destination memory must be allocated with cudaMallocPitch or cudaMalloc3D rather than cudaMalloc. Hope it helps.
Thanks, Kyle. I checked the sinogram data layout and changed it to match LEAP's requirement. It works now.
Thanks. A difference in the data array layout caused this.
Dear Kyle,
Thanks for your amazing work in the CT data processing domain.
I tried to use your projector module as part of my neural network for training, as shown below. With it, I get the illegal memory access error shown in the "ERROR PART" section below. When I replace this projector with one from a different library, it works. Would you mind helping me identify the root cause?
```python
import torch
import torch.nn as nn
from leaptorch import Projector


class projector(nn.Module):
    def __init__(self):
        super(projector, self).__init__()
        proj = Projector(forward_project=True, use_static=True, use_gpu=True,
                         gpu_device=torch.device("cuda:0"), batch_size=1)
        # Fan-beam geometry: 512 views over 360 degrees, one detector row of 736 columns
        numCols = 736
        numAngles = 512
        pixelSize = 1.2858
        numRows = 1
        proj.leapct.set_fanbeam(numAngles, numRows, numCols, pixelSize, pixelSize,
                                0.5*(numRows-1), 0.5*(numCols-1),
                                proj.leapct.setAngleArray(numAngles, 360.0), 595, 1085.6)
        # numCols x numCols slice, numRows slices thick
        proj.leapct.set_volume(numCols, numCols, numRows, voxelWidth=0.6641, voxelHeight=pixelSize)
        proj.leapct.set_flatDetector()
        proj.allocate_batch_data()
        proj.leapct.allocate_volume()
        self.pj = proj
```
ERROR PART:

```
Loaded model weights from the checkpoint at /home/midea/ai/LEAP-1.16/lightning_logs/version_123/epoch=14-step=9000.ckpt
Testing DataLoader 0: 0%| | 0/200 [00:00<?, ?it/s]cudaMemcpy3D Error: invalid argument
kernel failed!
error name: cudaErrorIllegalAddress
error msg: an illegal memory access was encountered
Traceback (most recent call last):
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 785, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1016, in _run_stage
    return self._evaluation_loop.run()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 376, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 294, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 403, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_main.py", line 120, in test_step
    out = self(x, p)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_main.py", line 70, in forward
    out = self.model(x, p)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_recon.py", line 360, in forward
    x = module(x, proj)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_recon.py", line 329, in forward
    tmp1 = self.block1(input_data, proj)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_recon.py", line 314, in forward
    intervening_res = self.projector_t(temp1, self.options)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_recon.py", line 53, in forward
    bp = self.bj(proj)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/leaptorch.py", line 489, in forward
    return BackProjectorFunctionGPU.apply(input, self.proj_data, self.vol_data, self.param_id_t)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/leaptorch.py", line 89, in forward
    lct.backproject_gpu(g, f, param_id.item()) # compute input (f) from proj (g)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/midea/ai/LEAP-1.16/unrolling/Optimization_main.py", line 197, in <module>
    trainer.test(network, test_loader, ckpt_path='/home/midea/ai/LEAP-1.16/lightning_logs/version_123/epoch=14-step=9000.ckpt')
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 742, in test
    return call._call_and_handle_interrupt(
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 67, in _call_and_handle_interrupt
    trainer._teardown()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1003, in _teardown
    self.strategy.teardown()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 498, in teardown
    self.lightning_module.cpu()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 79, in cpu
    return super().cpu()
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 798, in cpu
    return self._apply(lambda t: t.cpu())
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/midea/anaconda3/envs/leap/lib/python3.9/site-packages/torch/nn/modules/module.py", line 798, in <lambda>
    return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Testing DataLoader 0: 0%| | 0/200 [00:00<?, ?it/s]
```
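Both tracebacks end with PyTorch's suggestion to set CUDA_LAUNCH_BLOCKING=1. This makes kernel launches synchronous, so the error is reported at the call that actually failed rather than at a later API call. A minimal way to do that from inside the script is sketched below. The variable has to take effect before CUDA is initialized, so this sketch sets it before importing torch or leaptorch. It can equivalently be set in the shell when launching the script.

```python
import os

# Must be set before the CUDA context is created; setting it before importing
# torch or leaptorch is the simplest way to guarantee that.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch        # import GPU libraries only after this point
# import leaptorch
```

Synchronous launches slow everything down, so this is worth removing once the failing call has been identified.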