Has anyone encountered this problem? It occurs randomly at different epochs. When it happens, training stops, but the GPU memory is not released automatically.
My Environment:
CUDA 11.2
cudatoolkit 11.1
torch 1.9.1+cu111
pytorch-lightning 1.6.0
python 3.7.13
mmdet3d 1.0.0rc4
mmcv 1.6.0
mmcv-full 1.6.1
mmsegmentation 0.27.0
The full error message is:
python: /opt/conda/conda-bld/magma-cuda111_1605822518874/work/interface_cuda/interface.cpp:901: void magma_queue_create_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dCarray__ != __null' failed.
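For reference, this is roughly how I collect the version numbers listed above from inside the same conda environment that runs the training job (a minimal sketch; the import names are just the standard ones for these packages, not taken from my training code):

```python
# Print the versions the failing process actually sees, to rule out an
# environment mismatch. mmsegmentation is imported as mmseg.
import sys

import torch
import pytorch_lightning
import mmcv
import mmdet3d
import mmseg

print("python             :", sys.version.split()[0])
print("torch              :", torch.__version__)
print("torch built w/ CUDA:", torch.version.cuda)  # 11.1 for the +cu111 wheel
print("cuDNN              :", torch.backends.cudnn.version())
print("CUDA available     :", torch.cuda.is_available())
print("pytorch-lightning  :", pytorch_lightning.__version__)
print("mmcv               :", mmcv.__version__)
print("mmdet3d            :", mmdet3d.__version__)
print("mmsegmentation     :", mmseg.__version__)
```

The CUDA 11.2 figure above is the system/driver version; torch itself reports the 11.1 toolkit it was built against.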