facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

CUDA Out of memory error while running inference. #147

Open tamil-phy opened 11 months ago

tamil-phy commented 11 months ago

Error Message:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 98.00 MiB. GPU 0 has a total capacty of 47.45 GiB of which 55.38 MiB is free. Including non-PyTorch memory, this process has 47.40 GiB memory in use. Of the allocated memory 22.44 GiB is allocated by PyTorch, and 24.20 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
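The last sentence of that error points at allocator fragmentation (24.20 GiB reserved but unallocated). One thing worth trying before anything else, as the message itself suggests, is setting `PYTORCH_CUDA_ALLOC_CONF`. This is a sketch, not an official nougat fix, and the 128 MiB split size is an arbitrary starting value:

```python
import os

# Must be set before `import torch` triggers CUDA initialization.
# 128 is an illustrative split size in MiB; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the shell before running the `nougat` CLI.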

Even when multiple GPUs are configured, nougat does not seem to utilize them:

$ export CUDA_VISIBLE_DEVICES=0,1,2,3

I did a fresh install of nougat-ocr using pip and am running it on a file.
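Note that `CUDA_VISIBLE_DEVICES` only controls which devices the process can see; it does not by itself spread inference across them. If the CLI really does run on a single device, one workaround is to shard your PDFs manually, one process per GPU. A minimal sketch (the helper name and batch size are mine, not part of nougat):

```python
import os

def plan_per_gpu(pdfs, out_dir, gpus=(0, 1, 2, 3), batch_size=4):
    """Pair each PDF with one GPU and build one `nougat` invocation per pair.

    Returns (cmd, env) tuples; pass each to subprocess.Popen(cmd, env=env)
    to actually launch the processes in parallel."""
    jobs = []
    for gpu, pdf in zip(gpus, pdfs):
        # Pin each child process to a single device.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        cmd = ["nougat", pdf, "--out", out_dir, "--batchsize", str(batch_size)]
        jobs.append((cmd, env))
    return jobs
```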

vanangamudi commented 11 months ago

I am facing exactly the same issue. While the first page is being processed, 27GB is allocated; when moving on to the second page, the allocation shoots up to 48GB and the process is killed with an out-of-memory error. It runs very well in CPU mode, though. Not sure why this happens on GPU, and for just the second page at that.

Hansimov commented 11 months ago

You can try decreasing the batch size. For example, --batchsize 8 works well for my Quadro RTX 8000 (with 48GB memory).

Here is the full command I use:

nougat "input.pdf" --out "<output_directory>" --recompute --no-skipping --markdown --model 0.1.0-small --batchsize 8

brando90 commented 11 months ago

> You can try decreasing the batch size. For example, --batchsize 8 works well for my Quadro RTX 8000 (with 48GB memory).
>
> Here is the full command I use:
>
> nougat "input.pdf" --out "<output_directory>" --recompute --no-skipping --markdown --model 0.1.0-small --batchsize 8

@Hansimov what is the default batch size?

brando90 commented 11 months ago

@Hansimov do you know why this worked? In a nutshell, how are sequences & batches created in nougat for a PDF, such that reducing the batch size solves the OOM issue?
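For intuition (my understanding, with purely illustrative numbers, not measured from nougat): the model weights are a fixed cost, while activations scale with the number of rasterized pages pushed through the encoder-decoder at once, so peak memory is roughly linear in batch size:

```python
def peak_mem_gib(batch_size, fixed_gib=3, per_page_gib=1):
    # Toy linear model: fixed weight cost plus per-page activation cost.
    # The 3 GiB / 1 GiB figures are made up for illustration only.
    return fixed_gib + batch_size * per_page_gib

print(peak_mem_gib(32))  # 35
print(peak_mem_gib(8))   # 11
```

Under this toy model, dropping the batch size from 32 to 8 roughly quarters the variable part of the footprint, which is why it can turn an OOM run into a working one.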

sidharthrajaram commented 10 months ago

Even with a reduced batch size, each run of nougat inference leads to an increase in GPU memory usage, eventually leading to torch.cuda.OutOfMemoryError: CUDA out of memory.

Is there a memory leak somewhere? I also occasionally saw warnings from pypdfium2 about memory leaks, as mentioned in this issue: https://github.com/facebookresearch/nougat/issues/162
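Until the leak is tracked down, one pragmatic workaround (a sketch; the wrapper functions are mine, not part of nougat) is to isolate each run in its own process, so the driver reclaims all GPU memory when the process exits instead of letting usage creep up across runs:

```python
import subprocess

def nougat_cmd(pdf, out_dir, batch_size=4):
    # Build one CLI invocation; flags mirror those used earlier in the thread.
    return ["nougat", pdf, "--out", out_dir, "--batchsize", str(batch_size)]

def run_each_isolated(pdfs, out_dir):
    # One fresh process per PDF: any leaked CUDA allocations die with it.
    for pdf in pdfs:
        subprocess.run(nougat_cmd(pdf, out_dir), check=True)
```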