CosmiQ / solaris

CosmiQ Works Geospatial Machine Learning Analysis Toolkit
https://solaris.readthedocs.io
Apache License 2.0
414 stars 112 forks

CUDA out of memory for large image inference #361

Open avanetten opened 4 years ago

avanetten commented 4 years ago

Summary of the bug

When running inference on images larger than ~1200x1200, CUDA often runs out of memory. This looks to be because the tiler puts all subwindows of a large image into a single batch (https://github.com/CosmiQ/solaris/blob/master/solaris/nets/infer.py#L75). This large batch can then be too large to fit into memory.
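Until the tiler is changed upstream, one workaround is to split the stacked subwindows into smaller chunks before the forward pass so no single batch exceeds GPU memory. A minimal sketch (the `model`, `infer_in_chunks` helper, and `max_batch` value are illustrative placeholders, not part of the solaris API):

```python
import numpy as np

def infer_in_chunks(model, windows, max_batch=8):
    """Run `model` over `windows` in chunks of at most `max_batch`,
    instead of one giant batch, then stitch the outputs back together."""
    outputs = []
    for start in range(0, len(windows), max_batch):
        chunk = windows[start:start + max_batch]
        outputs.append(model(chunk))  # each forward pass sees <= max_batch windows
    return np.concatenate(outputs, axis=0)

# Dummy example: 37 fake 3x16x16 subwindows and an identity "model"
windows = np.zeros((37, 3, 16, 16), dtype=np.float32)
result = infer_in_chunks(lambda x: x, windows, max_batch=8)
print(result.shape)  # (37, 3, 16, 16)
```

With a real PyTorch model the same loop applies; wrapping the forward pass in `torch.no_grad()` further reduces memory since no gradients are needed at inference time.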

Steps to reproduce the bug

# In this case eg.yml points to images of size 2048x2048
import solaris as sol
config_path = 'eg.yml'
config = sol.utils.config.parse(config_path)
print('Config:')
print(config)
inferer = sol.nets.infer.Inferer(config)
inferer()

Buggy behavior and/or error message

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 11.93 GiB total capacity; 8.75 GiB already allocated; 763.06 MiB free; 10.74 GiB reserved in total by PyTorch)

Expected behavior

Inference should run smoothly on large images

Servando1990 commented 4 years ago

Hi!

I'm getting the same error: RuntimeError: CUDA out of memory. Tried to allocate 1.56 GiB (GPU 0; 14.73 GiB total capacity; 12.90 GiB already allocated; 947.88 MiB free; 12.92 GiB reserved in total by PyTorch)

I wonder whether inference and training work independently of each other, because otherwise I won't bother starting training, since I'd hit the same error there.

Could you share your yml file?