CERN / TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
BSD 3-Clause "New" or "Revised" License

Conflict with CUDA Context after modifying FDK filtering step with CuPy and calling tigre.Atb() #484

Closed: HBanjak closed this issue 9 months ago

HBanjak commented 10 months ago

I have integrated the CuPy library to modify the FDK filtering step to run on the GPU. After doing so, I've encountered an issue when calling the tigre.Atb() function.

Environment:

TIGRE version: 2.2
CUDA version: 11.1
GPU Model: NVIDIA GeForce GTX 1080 Ti
OS: Windows 10
Language: Python

Steps to Reproduce:

1. Modify the FDK filtering step using CuPy to run it on the GPU.
2. Run a loop where:
   - I use CuPy to perform GPU operations for the FDK filtering.
   - Call tigre.Atb() for backprojection.
   - Use CuPy again for other GPU operations.
3. On the second iteration, the program crashes with exit code -1073741819 (0xC0000005), which is related to a memory access violation error.
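The loop described above can be sketched roughly as follows. This is only an illustration, not the reporter's actual code: the helper names `make_gpu_steps` and `run_loop` are hypothetical, the ramp filter is a simplified stand-in for the real FDK filter, and `geo` and `angles` are assumed to be a TIGRE geometry and a float32 angle array prepared in the usual way.

```python
def make_gpu_steps(geo, angles):
    """Build the three per-iteration steps (hypothetical helper).

    CuPy and TIGRE are imported lazily, so this function needs a
    CUDA-capable GPU and both packages installed.
    """
    import numpy as np
    import cupy as cp
    import tigre

    def filter_step(proj):
        # Simplified ramp filter along the last (detector-u) axis, on the GPU.
        # Assumption: stands in for the modified FDK filtering step.
        d_proj = cp.asarray(proj, dtype=cp.float32)
        ramp = cp.abs(cp.fft.fftfreq(d_proj.shape[-1]))
        filtered = cp.fft.ifft(cp.fft.fft(d_proj, axis=-1) * ramp, axis=-1).real
        # TIGRE expects float32 NumPy arrays on the host.
        return np.ascontiguousarray(cp.asnumpy(filtered), dtype=np.float32)

    def backproject_step(filtered):
        # TIGRE backprojection; runs TIGRE's own CUDA code internally.
        return tigre.Atb(filtered, geo, angles)

    def post_step(volume):
        # Any further CuPy work after Atb(); here just a sum on the GPU.
        return float(cp.asarray(volume).sum())

    return filter_step, backproject_step, post_step


def run_loop(proj, steps, n_iter=2):
    """Run filter -> backprojection -> post-processing n_iter times."""
    filter_step, backproject_step, post_step = steps
    results = []
    for _ in range(n_iter):
        filtered = filter_step(proj)         # CuPy
        volume = backproject_step(filtered)  # tigre.Atb()
        results.append(post_step(volume))    # CuPy again
    return results
```

On a GPU machine this would be called as `run_loop(proj, make_gpu_steps(geo, angles))`. Per the report, the first iteration completes and the crash appears once CuPy is used again in the second iteration.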

The issue seems to arise after calling tigre.Atb(), suggesting that something within this function or its dependencies might be altering or destroying the CUDA context in a way that prevents subsequent CuPy GPU operations from proceeding.

I'd appreciate any insights or solutions to this problem. It seems like there's a conflict between TIGRE's internal CUDA operations and the GPU operations performed using CuPy.

AnderBiguri commented 10 months ago

Hi @HBanjak, apologies, I was on annual leave.

First, I would suggest you have a look at https://github.com/CERN/TIGRE/pull/423, as it's basically the filtering on GPU. It's not finished yet, but I think it's working nevertheless.

Probably this line is causing the issues: https://github.com/CERN/TIGRE/blob/master/Common/CUDA/voxel_backprojection.cu#L616

In standard TIGRE use, the calls to Ax/Atb are modular and independent, so destroying (or not) the context is irrelevant. It seems that line got left there at some point. AFAIK there should be no problem with you removing it, and hopefully that will fix your issue.

HBanjak commented 9 months ago

Hi @AnderBiguri,

Thanks for your reply.

I followed your recommendation to remove the line at: https://github.com/CERN/TIGRE/blob/master/Common/CUDA/voxel_backprojection.cu#L616 and it fixed the issue.

I appreciate the time you took to help me fix this problem.

AnderBiguri commented 9 months ago

Fantastic! Good to hear!