wwwe1ty opened 2 years ago
@coreylammie CUDA error in memtorch/cu/tile_matmul_kernels.cu(183): an illegal memory access was encountered
This is the traceback I get when I run the exemplar files as a Python script.
Hi @wwwe1ty,
This may be related to #120. The following logic is used to configure and launch CUDA kernels when active crossbars are simulated: https://github.com/coreylammie/MemTorch/blob/master/memtorch/cu/tile_matmul_kernels.cu#L124-L156. While logic is used to determine whether multiple blocks are required, grid-stride loops have not been used, and it is assumed that the maximum number of threads per block and the maximum grid size are not exceeded.
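For illustration, the effect of a missing grid-stride loop can be sketched in Python (a toy model of CUDA indexing, not the actual kernel code): when the grid is capped below the number of blocks that a one-element-per-thread mapping needs, a flat index either skips elements or, if indexing is unchecked, accesses out of bounds, while a grid-stride loop covers every element with the same capped grid.

```python
# Toy Python model of CUDA thread indexing (illustration only, not the
# actual kernel). Each "thread" is identified by (block, thread) and its
# flat global index is block * block_size + thread, mirroring
# blockIdx.x * blockDim.x + threadIdx.x in CUDA.

def flat_coverage(n, grid_size, block_size):
    """Indices touched when each thread handles exactly one element."""
    return {b * block_size + t
            for b in range(grid_size)
            for t in range(block_size)
            if b * block_size + t < n}

def grid_stride_coverage(n, grid_size, block_size):
    """Indices touched when each thread loops with stride grid * block,
    as in a CUDA grid-stride loop."""
    stride = grid_size * block_size
    covered = set()
    for b in range(grid_size):
        for t in range(block_size):
            i = b * block_size + t
            while i < n:
                covered.add(i)
                i += stride
    return covered

n = 10_000        # hypothetical number of elements to process
block_size = 128  # threads per block
max_grid = 16     # pretend the device caps the grid at 16 blocks

print(len(flat_coverage(n, max_grid, block_size)))         # 2048: elements 2048..9999 are never processed
print(len(grid_stride_coverage(n, max_grid, block_size)))  # 10000: every element is covered
```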
Could you please specify which GPU you used to run this notebook and for which layer you received this error? From cudaDeviceProp (https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDART__DEVICE_g5aa4f47938af8276f08074d09b7d520c.html), the maximum number of threads per block and the maximum grid sizes can be determined.
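One way to read those limits programmatically is sketched below. Using Numba's CUDA bindings is my assumption (the thread itself only references the CUDA runtime's cudaDeviceProp); when no GPU or Numba is available, the sketch falls back to the values guaranteed for all modern (compute capability >= 3.0) NVIDIA GPUs.

```python
# Hedged sketch: reading the CUDA launch limits that the kernel-launch
# logic assumes are never exceeded.
DEFAULT_LIMITS = {
    "MAX_THREADS_PER_BLOCK": 1024,  # 1024 on every device since CC 2.0
    "MAX_GRID_DIM_X": 2**31 - 1,    # 2^31 - 1 on every device since CC 3.0
}

def launch_limits():
    """Query the current CUDA device's launch limits, falling back to
    the defaults above when no GPU (or no Numba) is present."""
    try:
        from numba import cuda
        dev = cuda.get_current_device()
        return {k: getattr(dev, k) for k in DEFAULT_LIMITS}
    except Exception:
        return dict(DEFAULT_LIMITS)

limits = launch_limits()
print(limits["MAX_THREADS_PER_BLOCK"], limits["MAX_GRID_DIM_X"])
```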
Kind Regards,
Corey.
Hi @coreylammie ,
Thanks for your reply and your awesome MemTorch! The GPU is an NVIDIA RTX 3090; nvidia-smi shows the following:
NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6
The error happened at:
Tuned bh.Conv2d(in_channels=320, out_channels=1280, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)). Coefficient of determination: 0.747785 [8063.644043, 0.007728]
(the 57th tuned layer)
CUDA error in memtorch/cu/tile_matmul_kernels.cu(183): an illegal memory access was encountered
The last layer shown is the 57th, so I think the error occurred while tuning the 58th layer.
I am sorry, I do not know how to use cudaDeviceProp, so I am not sure whether nvidia-smi provides enough information about the GPU for you. If not, could you please let me know how to use that function? Then I can provide more details.
Best, Welty
Hi @coreylammie ,
I have the same problem when running Tutorial.ipynb and also patching the linear layers (module_parameters_to_patch=[torch.nn.Conv2d, torch.nn.Linear]).
The error occurs during tuning of the linear layers.
The resulting error message is also: CUDA error in memtorch/cu/tile_matmul_kernels.cu(183): an illegal memory access was encountered
torch.cuda.get_device_properties(device)
returns the following:
_CudaDeviceProperties(name='NVIDIA A100-PCIE-40GB', major=8, minor=0, total_memory=40354MB, multi_processor_count=108)
On top of that, I found that switching to the CPU for tuning is not possible: if the CUDA-enabled version of MemTorch is used and CUDA is available, tuning is always done with CUDA. Switching to memtorch-cpu circumvents the problem, but then training and inference also have to be executed on the CPU.
Below is the full log:
_CudaDeviceProperties(name='NVIDIA A100-PCIE-40GB', major=8, minor=0, total_memory=40354MB, multi_processor_count=108)
Patched Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1)) -> bh.Conv2d(in_channels=1, out_channels=20, kernel_size=(5, 5), stride=(1, 1), padding=(0, 0))
Patched Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1)) -> bh.Conv2d(in_channels=20, out_channels=50, kernel_size=(5, 5), stride=(1, 1), padding=(0, 0))
Patched Linear(in_features=800, out_features=500, bias=True) -> bh.Linear(in_features=800, out_features=500, bias=True)
Patched Linear(in_features=500, out_features=10, bias=True) -> bh.Linear(in_features=500, out_features=10, bias=True)
Tuned bh.Conv2d(in_channels=1, out_channels=20, kernel_size=(5, 5), stride=(1, 1), padding=(0, 0)). Coefficient of determination: 0.717874 [222.638397, -0.013010]
Tuned bh.Conv2d(in_channels=20, out_channels=50, kernel_size=(5, 5), stride=(1, 1), padding=(0, 0)). Coefficient of determination: 0.957314 [201.858917, -0.070494]
CUDA error in memtorch/cu/tile_matmul_kernels.cu(183): an illegal memory access was encountered
make: *** [patch] Error 1
Thank you for your help
Hi @wwwe1ty and @dgr-b,
Thank you for your patience, and for letting me know about the CPU tuning bug. I'm actively working on this when I have time; however, my time is quite limited at the moment due to various ongoing commitments! I'll try my best to get to this in the next couple of weeks.
40GB of VRAM should be more than sufficient to tune and perform inference for such a network. There is likely a memory leak in one of the CUDA kernels, which is not releasing memory properly. When I do get an opportunity to look into this further, I'll add, alongside the fix, a fallback argument so that the CPU can be used for tuning.
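The fallback argument mentioned above could take a shape like the following (a hypothetical sketch only; this is not MemTorch's actual API, and the parameter name is invented for illustration):

```python
# Hypothetical sketch of a CPU-fallback switch for tuning.
# 'use_cpu_for_tuning' is an invented parameter name, not MemTorch's API.
def tuning_device(use_cpu_for_tuning: bool, cuda_available: bool) -> str:
    """Select the device string a tuner should run on: prefer CUDA,
    but allow forcing the CPU even when a GPU is visible."""
    if use_cpu_for_tuning or not cuda_available:
        return "cpu"
    return "cuda"

# Even with CUDA available, tuning can be forced onto the CPU:
print(tuning_device(use_cpu_for_tuning=True, cuda_available=True))   # cpu
print(tuning_device(use_cpu_for_tuning=False, cuda_available=True))  # cuda
```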
Kind Regards,
Corey.
I also have this issue when I try to tune both the convolution and linear layers on the GPU. I have no problems when tuning only the convolution layers, on either the CPU or the GPU. As soon as I tune the linear layers on the GPU, the problem appears.
Any pointers on how to fix this would be greatly appreciated, even just a direction to look into would be very helpful.
Hi, I ran the Exemplar_Simulations.ipynb file in Jupyter Notebook and found that the whole process could not complete because the kernel died without any error message, so I ran it again to try to find the reason. The error occurred in the 'trial' function, during patching or tuning. I looked through the issues and think the possible cause is the GPU running out of memory, but that would be related to Conv2d. Could you please give me some suggestions about it? Thanks.
Welty