Closed amooose closed 2 years ago
I used torch 1.8 to convert the torchscript model; maybe you can try converting it using 1.9 or 1.10. https://github.com/saic-mdal/lama/pull/63
Thanks for the tip. I used your script (had to modify it a bit to run):
image = torch.rand(1, 3, 120, 120).to(device)
mask = torch.rand(1, 1, 120, 120).to(device)
output = jit_model_wrapper(image, mask)
and had an output successfully generated on 1.10 with "diff: 0.0".
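For reference, the consistency check behind that "diff" number can be sketched like this. The `ToyModel` below is a hypothetical stand-in for the LaMa generator, just to show the trace-then-compare pattern; in the real script, `jit_model_wrapper` wraps the actual traced model.

```python
import torch

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the inpainting generator."""
    def forward(self, image, mask):
        # trivial "inpainting": zero out the masked region
        return image * (1 - mask)

device = "cuda" if torch.cuda.is_available() else "cpu"
eager_model = ToyModel().to(device).eval()

# trace with example inputs, same shapes as in the snippet above
example = (torch.rand(1, 3, 120, 120).to(device),
           torch.rand(1, 1, 120, 120).to(device))
jit_model_wrapper = torch.jit.trace(eager_model, example)

image = torch.rand(1, 3, 120, 120).to(device)
mask = torch.rand(1, 1, 120, 120).to(device)
with torch.no_grad():
    out_eager = eager_model(image, mask)
    out_jit = jit_model_wrapper(image, mask)

# "diff: 0.0" means the traced model matches the eager model exactly
print("diff:", (out_eager - out_jit).abs().max().item())
```

A diff of exactly 0.0 only tells you the traced graph reproduces the eager outputs on these inputs; it says nothing about memory behavior at larger resolutions.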
Unfortunately with this new model installed, the same error occurs. Would there be anything else that I could try?
You seem to be running into memory issues (RAM, not GPU). I get the same error on some servers with low memory. Please note that images higher than 2k will use up to, or even more than, 32GB of RAM.
I would recommend at least 64GB if often working with 4k and higher images.
Depending on your system, consuming all available RAM will result in a CUDA error just like yours, or in the process being killed. I mainly noticed these errors on Linux systems; Windows seems to try to free as much memory as possible to avoid crashing the process.
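One way to rule this out before blaming CUDA is to check free system RAM just before inference. A minimal, Linux-only sketch (it parses `/proc/meminfo`; on other platforms you would use a library such as psutil instead):

```python
def available_ram_gb(meminfo_path="/proc/meminfo"):
    """Return available system RAM in GB, or None if it cannot be read."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    kb = int(line.split()[1])  # value is reported in kB
                    return kb / 1024 / 1024
    except OSError:
        pass
    return None

free = available_ram_gb()
if free is not None and free < 32:
    print(f"Only {free:.1f} GB RAM available; images above 2k may fail or be killed.")
```

The 32 GB threshold here just mirrors the figure quoted above for 2k+ images; adjust it for your workload.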
Why does your image take up so much memory?
# rgb image
np_img = np.ones((2000,2000,3), dtype=np.uint8)
np_img.nbytes / 1024 / 1024 # ~= 11MB
That's a good question. All I know is that LaMa uses a lot of memory with big images on my setup, regardless of the system or whether it runs on CPU or GPU.
I tried it on three different OSes (Windows, Debian, Ubuntu) with various configurations, and it has always used a lot of memory with high-resolution input (starting from 2k).
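The gap between the ~11 MB input and tens of GB of RAM is plausible once you account for activations: the network works in float32 with many feature channels and keeps several feature maps alive at once. A rough, illustrative estimate (the channel count and number of retained maps below are assumptions, not LaMa's actual architecture):

```python
import numpy as np

h, w = 2000, 2000

# the raw RGB input really is tiny
np_img = np.ones((h, w, 3), dtype=np.uint8)
print(np_img.nbytes / 1024 / 1024)  # ~11 MB

# but one float32 feature map at full resolution is far bigger
channels = 64          # assumed channel count per layer
layers_alive = 20      # assumed number of maps retained simultaneously
activation_mb = h * w * channels * 4 / 1024 / 1024
print(activation_mb)                          # ~977 MB per 64-channel map
print(activation_mb * layers_alive / 1024)    # ~19 GB across retained maps
```

So the input array size says very little about peak memory; it is the full-resolution intermediate activations that dominate.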
Why does your image take up so much memory?

# rgb image
np_img = np.ones((2000,2000,3), dtype=np.uint8)
np_img.nbytes / 1024 / 1024 # ~= 11MB
lazy coding
I can't reproduce this in my environment (RTX 3090 + PyTorch 1.8.2).
This is the result of the benchmark script:
Added a crop mode for very large images; this helps with memory issues.
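A minimal sketch of the crop idea: split the large image into tiles, inpaint each tile independently, and paste the results back. `inpaint` below is a hypothetical placeholder for the actual model call, and the tile/overlap sizes are arbitrary assumptions:

```python
import numpy as np

def inpaint(tile, mask_tile):
    """Hypothetical stand-in for the model call; returns the tile unchanged."""
    return tile

def inpaint_tiled(image, mask, tile=1024, overlap=64):
    """Process a large image as overlapping tiles to bound peak memory."""
    out = image.copy()
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            out[y:y1, x:x1] = inpaint(image[y:y1, x:x1], mask[y:y1, x:x1])
    return out

big = np.zeros((3840, 2880, 3), dtype=np.uint8)
m = np.zeros((3840, 2880), dtype=np.uint8)
result = inpaint_tiled(big, m)
```

Note that this naive version simply overwrites the overlap region with the later tile; a real implementation would blend the overlaps to avoid visible seams, and masked regions larger than one tile would still be a problem.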
It seems to error out on larger images when editing at the original size: 3840x2880, 3648x5472, 2299x3065.
I've attached the error outputs: err2.txt err1.txt
I'm on CUDA 11.1 (I've tried 11.3/11.5 too) and have also tried torch 1.9-1.10, with the same errors occurring on my RTX 3080. I'm using the Docker setup, but with torch replaced by the CUDA 11.1+ build. Selecting 2k or 1080p usually works properly (I believe 2k sometimes throws an error at first, but then works).