Jingkang50 / OpenOOD

Benchmarking Generalized Out-of-Distribution Detection
MIT License

Weird CUDA OOM error #133

Closed zjysteven closed 1 year ago

zjysteven commented 1 year ago

Hi,

When I'm running the following provided CIFAR-10 script on one Quadro RTX 6000 24GB GPU,

CUDA_VISIBLE_DEVICES="7" python main.py \
    --config configs/datasets/cifar10/cifar10.yml \
    configs/preprocessors/base_preprocessor.yml \
    configs/networks/resnet18_32x32.yml \
    configs/pipelines/train/baseline.yml

I got the following CUDA out-of-memory error, which is pretty weird.

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 23.65 GiB total capacity; 412.68 MiB already allocated; 8.56 MiB free; 442.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
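The error message itself suggests trying `max_split_size_mb` to reduce allocator fragmentation. A minimal sketch of passing that hint (the value 128 is an assumption to tune; this must run before torch initializes CUDA, and it will not help if the real cause is elsewhere):

```python
import os

# Allocator hint from the error message: cap the split size to limit
# fragmentation. Must be set before the first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```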

There should be nothing wrong with my openood conda environment, since I can successfully run my own scripts in it. So I have no idea what's going on here and would appreciate some help!

Thanks

zjysteven commented 1 year ago

The issue goes away if I install PyTorch built against the exact CUDA toolkit version that matches my system.
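The mismatch described above can be sanity-checked by comparing `torch.version.cuda` against the CUDA version reported by `nvidia-smi`. A minimal sketch of such a check (the helper name and the compatibility rule are assumptions for illustration, not OpenOOD code):

```python
# Rule of thumb sketched here: a PyTorch build for CUDA X.Y generally needs
# a driver that supports CUDA >= X.Y.
def cuda_versions_compatible(torch_cuda: str, driver_cuda: str) -> bool:
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(driver_cuda) >= parse(torch_cuda)

# e.g. torch.version.cuda == "10.2" while nvidia-smi reports "11.4"
print(cuda_versions_compatible("10.2", "11.4"))  # True: driver is new enough
print(cuda_versions_compatible("11.6", "11.3"))  # False: likely runtime errors
```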