NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more
https://nvlabs.github.io/instant-ngp
16.03k stars · 1.93k forks

CUDA_ERROR_OUT_OF_MEMORY #197

Closed · Hopperpop closed this issue 2 years ago

Hopperpop commented 2 years ago

Hi,

Thanks for sharing this great work. I'm trying to run the samples on a smaller GPU: a GTX 1060 6 GB. The "Einstein" example runs fine, but when I run the fox example I get:

13:49:05 SUCCESS  Loaded 50 images of size 1080x1920 after 1s
13:49:05 INFO       cam_aabb=[min=[1.0229,-1.33309,-0.378748], max=[2.46175,1.00721,1.41295]]
13:49:05 INFO     Loading network config from: configs\nerf\base.json
13:49:05 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
13:49:05 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
13:49:05 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
13:49:05 INFO       total_encoding_params=13074912 total_network_params=9728
13:49:06 ERROR    Uncaught exception: ***\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:531 cuMemSetAccess(m_base_address + m_size, n_bytes_to_allocate, &access_desc, 1) failed with error CUDA_ERROR_OUT_OF_MEMORY
Could not free memory: ***\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:452 cuMemAddressFree(m_base_address, m_max_size) failed with error CUDA_ERROR_INVALID_VALUE

Is it still possible to run this example with some modified parameters on GPUs with less memory, or should I give up?

Small note: atomicAdd(__half2) is also not supported on my architecture (=61). I needed to disable it in "common_device.cuh".

Tom94 commented 2 years ago

Hi there,

You might be able to squeeze memory usage down further by reducing the resolution (--width 1280 --height 720), but I'm not sure that will be enough.

Regarding atomicAdd(__half2): I'm surprised actually. How does this error manifest? I'd like to make this codebase work on as wide a range of GPUs as possible and both the CUDA documentation and CI suggest it should work on compute capability 61.

Hopperpop commented 2 years ago

When building I get:

D:***\include\neural-graphics-primitives/common_device.cuh(127): error : no instance of overloaded function "atomicAdd" matches the argument list [D:\***\build\ngp.vcxproj]
              argument types are: (__half2 *, {...})
            detected during instantiation of "void ngp::deposit_image_gradient(const Eigen::Matrix<float, N_DIMS, 1, <expression>, N_DIMS, 1> &, T *, T *, const Eigen::Vector2i &, const
   Eigen::Vector2f &) [with N_DIMS=2U, T=float]"
  D:\***\src\testbed_nerf.cu(1512): here

D:\***\include\neural-graphics-primitives/common_device.cuh(128): error : no instance of overloaded function "atomicAdd" matches the argument list [D:***\build\ngp.vcxproj]
              argument types are: (__half2 *, {...})
            detected during instantiation of "void ngp::deposit_image_gradient(const Eigen::Matrix<float, N_DIMS, 1, <expression>, N_DIMS, 1> &, T *, T *, const Eigen::Vector2i &, const
   Eigen::Vector2f &) [with N_DIMS=2U, T=float]"
  D:\***\src\testbed_nerf.cu(1512): here

Maybe it's a dependency problem rather than the hardware not supporting it. My setup:

- Windows 10
- CUDA compilation tools, release 11.6, V11.6.55 (build cuda_11.6.r11.6/compiler.30794723_0)
- CMake 3.23.0-rc1
- Python 3.8.10

Running the following command still gives me the same error: ./build/testbed.exe --scene data/nerf/fox --width 10 --height 10

But reducing the number of photos to 20 makes it possible to run it.
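As a rough sanity check on why the photo count matters so much, here is a back-of-envelope estimate (assuming the images are held on the GPU as 4-channel half-precision data; I haven't verified the internal format against the codebase):

```python
# Back-of-envelope VRAM estimate for the training images alone.
# Assumption (not verified): each image is stored on the GPU as
# 4 channels x 2 bytes (half precision) per pixel.
def image_vram_bytes(n_images, width=1080, height=1920,
                     channels=4, bytes_per_channel=2):
    return n_images * width * height * channels * bytes_per_channel

print(f"50 images: {image_vram_bytes(50) / 2**30:.2f} GiB")  # ~0.77 GiB
print(f"20 images: {image_vram_bytes(20) / 2**30:.2f} GiB")  # ~0.31 GiB
```

On a 6 GB card that difference alone isn't decisive, but trimming the dataset frees a few hundred MB on top of whatever the training buffers need.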

Krulknul commented 2 years ago

I just got the same error, running on a GTX 1080. The fox example does work for me, but when I try to run a dataset I prepared myself it gives this error. That was a very large dataset, though, so I tried shrinking it down, and it still gives the same error (smaller than the fox dataset in MBs at this point).

Edit: I forgot to add that adding --width 10 --height 10 also does nothing for me.

JustASquid commented 2 years ago

I'm also getting the same error regarding atomicAdd (running on a GTX 1080 Ti).

chris-aeviator commented 2 years ago

What are the VRAM requirements for the provided examples after all?

Tom94 commented 2 years ago

The VRAM requirements vary with architecture; older GPUs unfortunately require more memory because they need fp32 for efficiency and can't run fully fused neural networks.

In general, it seems that 8 GB is enough to run fox in all cases -- so only a little push would be needed to make it fit into OP's 6 GB card.
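To put rough numbers on the fp32-vs-fp16 difference, here's a quick sketch using the parameter counts from the log earlier in this thread (the factor-of-four buffer count is my assumption for Adam-style optimizer state, not the exact allocation layout):

```python
# Parameter counts reported in the log above.
encoding_params = 13_074_912
network_params = 9_728
total_params = encoding_params + network_params

def param_vram_mib(bytes_per_param, n_buffers=4):
    # n_buffers = params + gradients + two optimizer moment buffers
    # (assumed Adam-style layout; the real allocation may differ).
    return total_params * bytes_per_param * n_buffers / 2**20

print(f"fp16: {param_vram_mib(2):.0f} MiB")  # ~100 MiB
print(f"fp32: {param_vram_mib(4):.0f} MiB")  # ~200 MiB
```

The parameters themselves are a small slice of VRAM either way; the point is that every buffer an older GPU has to keep in fp32 is twice the size of its fp16 counterpart.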