Got cutlass error: Error Internal at: 363

JustASquid commented 2 years ago

Here are my logs from running the fox example in the testbed (after building successfully by following the guidance here):

16:07:43 INFO     Loading NeRF dataset from
16:07:43 INFO       data\nerf\fox\transforms.json
16:07:43 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
16:07:43 INFO       cam_aabb=[min=[1.0229,-1.33309,-0.378748], max=[2.46175,1.00721,1.41295]]
16:07:43 INFO     Loading network config from: configs\nerf\base.json
16:07:43 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
16:07:43 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
16:07:43 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
16:07:43 INFO       total_encoding_params=13074912 total_network_params=9728
Got cutlass error: Error Internal at: 363
Could not free memory: C:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:444 cudaDeviceSynchronize() failed with error operation not permitted when stream is capturing

Information:
OS: Windows 10
CUDA version: cuda_11.6.r11.6/compiler.30794723_0
MSVC version: 19.29.30140
GPU: GTX 1080 Ti
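The two fallback warnings above suggest a simple rule. Targets below architecture 75 fall back from FullyFusedMLP to CutlassMLP. That is why both the GTX 1080 Ti (61) and the V100 (70) reports in this thread show the warning. This reading comes only from the warning text, not from tiny-cuda-nn's source. Below is a minimal sketch of that rule. expected_mlp is a hypothetical helper, and the threshold of 75 is taken from the warning:

```shell
# Hypothetical helper: which MLP implementation the fallback warning implies
# for a given target GPU architecture (e.g. 61 for a GTX 1080 Ti).
# Assumption: the threshold of 75 is read off the warning text
# ("raise the target GPU architecture to 75+"), not from tiny-cuda-nn itself.
expected_mlp() {
  arch="$1"
  case "$arch" in
    ''|*[!0-9]*) echo "not a numeric architecture: $arch" >&2; return 1 ;;
  esac
  if [ "$arch" -ge 75 ]; then
    echo "FullyFusedMLP"
  else
    echo "CutlassMLP"   # the fallback reported in these logs
  fi
}
```

For example, expected_mlp 61 prints CutlassMLP, which matches the warning in the log above.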

satyajit-ink commented 2 years ago

Same here:

root@thor:/opt/src# ./build/testbed --scene data/nerf/fox
05:46:17 WARNING  Insufficient compute capability 52 detected.
05:46:17 WARNING  This program was compiled for >=61 and may thus behave unexpectedly.
05:46:17 INFO     Loading NeRF dataset from
05:46:17 INFO       data/nerf/fox/transforms.json
05:46:17 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
05:46:17 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
05:46:17 INFO     Loading network config from: configs/nerf/base.json
05:46:17 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
05:46:17 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
05:46:17 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
05:46:17 INFO       total_encoding_params=13074912 total_network_params=9728
Got cutlass error: Error Internal at: 363
Could not free memory: /opt/src/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:444 cudaDeviceSynchronize() failed with error operation not permitted when stream is capturing

Tom94 commented 2 years ago

Based on the warnings,

05:46:17 WARNING  Insufficient compute capability 52 detected.
05:46:17 WARNING  This program was compiled for >=61 and may thus behave unexpectedly.

I suspect/hope the problem can be fixed by setting the environment variable TCNN_CUDA_ARCHITECTURES=52, deleting the build folder, and compiling again from scratch.
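The two warnings compare two numbers: the compute capability the program detects (52 here) and the lowest architecture it was compiled for (61). Below is a minimal sketch of that check. It mirrors only the wording of the warnings. suggest_arch is a hypothetical name, and it simply proposes the detected value for TCNN_CUDA_ARCHITECTURES, as suggested above:

```shell
# Hypothetical check mirroring the two warnings in the log:
#   "Insufficient compute capability 52 detected."
#   "This program was compiled for >=61 ..."
# Prints "ok" when the GPU meets the compiled minimum, otherwise the
# TCNN_CUDA_ARCHITECTURES value to rebuild with (the detected capability).
suggest_arch() {
  detected="$1"      # compute capability reported at startup, e.g. 52
  compiled_min="$2"  # minimum architecture the binary was built for, e.g. 61
  if [ "$detected" -ge "$compiled_min" ]; then
    echo "ok"
  else
    echo "TCNN_CUDA_ARCHITECTURES=$detected"
  fi
}
```

For the Titan X above, suggest_arch 52 61 prints TCNN_CUDA_ARCHITECTURES=52. The build folder still has to be deleted before reconfiguring.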

Tom94 commented 2 years ago

Based on https://github.com/NVlabs/tiny-cuda-nn/issues/47, I'm wondering whether building in Release mode instead of RelWithDebInfo could fix this problem on your end.

Could you try clearing your build folder and then building via the following commands?

instant-ngp$ cmake . -B build
instant-ngp$ cmake --build build --config Release -j 16

Thanks!
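The clear-and-rebuild steps suggested in this thread can be wrapped in one shell function: delete the build folder, set TCNN_CUDA_ARCHITECTURES, configure, then build. Below is a minimal sketch, assuming it is run from the instant-ngp root directory. The function name clean_rebuild and its DRY_RUN switch are hypothetical conveniences, not part of instant-ngp:

```shell
# Hypothetical helper: clean rebuild for a given target architecture.
# Usage: clean_rebuild ARCH [CONFIG]   (CONFIG defaults to Release)
# Set DRY_RUN=1 to print the commands instead of running them.
clean_rebuild() {
  if [ -z "$1" ]; then
    echo "usage: clean_rebuild ARCH [CONFIG]" >&2
    return 2
  fi
  arch="$1"
  config="${2:-Release}"
  run="${DRY_RUN:+echo}"   # "echo" when DRY_RUN is set, empty otherwise
  $run rm -rf build &&
    $run env TCNN_CUDA_ARCHITECTURES="$arch" cmake . -B build &&
    $run cmake --build build --config "$config" -j 16
}
```

DRY_RUN=1 clean_rebuild 52 prints the three commands for a Maxwell Titan X without running them. clean_rebuild 70 RelWithDebInfo would actually rebuild for a V100 with debug info.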

satyajit-ink commented 2 years ago

I had selected the wrong GPU architecture: my card is a Titan X, architecture 52. Deleting the build folder and rebuilding fixed it.

JustASquid commented 2 years ago

@Tom94 I tried building in Release mode only, as you suggested, and it fails with the same error.

jeffi commented 2 years ago

I encountered this issue as well. It disappeared when I went back to 2dcdc892904de145c570e5726b5d3662e7f3af85

JustASquid commented 2 years ago

@jeffi I don't know what compiler you're running, but going back to that commit resulted in a raft of compiler errors for me.

jeffi commented 2 years ago

I got a raft of compiler warnings, but the build completed. It looked like the same warnings that I was getting on HEAD though. When I have more time I can help bisect until I find the breaking commit.

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc (Build cuda_11.1.TC455_06.29190527_0)

Tom94 commented 2 years ago

@jeffi a bisect would be incredibly helpful for me to troubleshoot this, thanks a bunch for offering!
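A bisect between the known-good commit 2dcdc892904de145c570e5726b5d3662e7f3af85 and HEAD can be automated with git bisect run. That command reads a script's exit status as follows: 0 means good, 125 means "skip this commit", and other values from 1 to 127 mean bad. Below is a minimal sketch of the classification step, assuming a wrapper has already built the commit and captured a short testbed run into a log file. classify_commit is hypothetical. Note that a run which fails for some other reason without printing the cutlass error would be counted as good:

```shell
# Hypothetical classification step for "git bisect run".
# Exit-status convention used by git bisect run:
#   0   -> good commit
#   125 -> skip (commit cannot be tested, e.g. it does not build)
#   1   -> bad commit
classify_commit() {
  build_status="$1"  # exit status of the build for this commit
  log_file="$2"      # captured output of a short testbed run
  if [ "$build_status" -ne 0 ]; then
    return 125       # build failed: ask git to skip this commit
  fi
  if grep -q "Got cutlass error" "$log_file"; then
    return 1         # error reproduced: bad commit
  fi
  return 0           # no cutlass error in the log: good commit
}
```

The wrapper (a hypothetical script) would:
1. Run git submodule update --init --recursive, because tiny-cuda-nn lives under dependencies/ as a submodule.
2. Rebuild and run the testbed.
3. Exit with whatever status classify_commit returns.

The bisect itself starts with git bisect start HEAD 2dcdc892904de145c570e5726b5d3662e7f3af85, followed by git bisect run with that wrapper. Afterwards, git bisect reset returns to the original checkout.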

eternaldolphin commented 2 years ago

Hi, I get a similar error. I'm very sure it is not caused by memory, as I have a 32 GB Tesla V100.

07:52:36 INFO     Loading NeRF dataset from
07:52:36 INFO       data/nerf/fox/transforms.json
07:52:36 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
07:52:36 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
07:52:36 INFO     Loading network config from: /host/home/rd/dupenghui/instant-ngp/configs/nerf/base.json
07:52:36 INFO     GridEncoding:  Nmin=16 b=1.38191 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
07:52:36 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
07:52:36 INFO     Color model:   3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
07:52:36 INFO       total_encoding_params=12196240 total_network_params=9728
Training:   0%|          | 0/100000 [00:00<?, ?step/s]
Got cutlass error: Error Internal at: 363
Could not free memory: /host/home/rd/xxx/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:456 cudaDeviceSynchronize() failed with error operation not permitted when stream is capturing

satyajit-ink commented 2 years ago

@eternaldolphin there's an environment variable for it

$ TCNN_CUDA_ARCHITECTURES=52 cmake . -B build
$ cmake --build build --config Release -j 16

(also clear your build folder before running cmake again)

The architecture number 52 comes from https://en.wikipedia.org/wiki/CUDA#GPUs_supported. I have a GeForce GTX Titan X (Maxwell), whose compute capability is 5.2. A Tesla V100 has compute capability 7.0, so you should run TCNN_CUDA_ARCHITECTURES=70 cmake . -B build instead. You have to clear the build folder and rebuild whenever you change the target CUDA architecture.
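The conversion described above is just dropping the dot: compute capability 5.2 becomes 52, and 7.0 becomes 70. Below is a minimal sketch; cc_to_arch is a hypothetical name. On recent NVIDIA drivers, nvidia-smi --query-gpu=compute_cap --format=csv,noheader can report the capability directly, though older drivers may not support that field.

```shell
# Hypothetical helper: turn a compute capability such as "5.2" or "7.0"
# into the TCNN_CUDA_ARCHITECTURES value ("52", "70") by dropping the dot.
cc_to_arch() {
  case "$1" in
    [0-9].[0-9]|[0-9][0-9].[0-9])
      printf '%s\n' "$1" | tr -d '.'
      ;;
    *)
      echo "unexpected compute capability: $1" >&2
      return 1
      ;;
  esac
}
```

For a V100, the configure step would then be TCNN_CUDA_ARCHITECTURES="$(cc_to_arch 7.0)" cmake . -B build, run on a freshly cleared build folder.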

eternaldolphin commented 2 years ago

@satyajit-ink Thank you, but I cleared the build folder and rebuilt with TCNN_CUDA_ARCHITECTURES=70, and it still does not work.

I also noticed that you got these warnings:

05:46:17 WARNING  Insufficient compute capability 52 detected.
05:46:17 WARNING  This program was compiled for >=61 and may thus behave unexpectedly.

However, @JustASquid and I do not get them, and rebuilding with TCNN_CUDA_ARCHITECTURES does not help. I still get the same error.

eternaldolphin commented 2 years ago

@jeffi wrote: "I encountered this issue as well. It disappeared when I went back to 2dcdc89."

@jeffi How did you go back to https://github.com/NVlabs/instant-ngp/commit/2dcdc892904de145c570e5726b5d3662e7f3af85? I ran "git reset 2dcdc89", cleaned the build folder, and rebuilt. I also got a raft of compiler errors (QAQ), and the build did not complete.
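A note on the command above: git reset 2dcdc89 without --hard is a mixed reset. It moves HEAD and the index to that commit but leaves the files in the working tree as they were, so what gets compiled is not the old commit's source. Submodules such as dependencies/tiny-cuda-nn also stay at their current revision unless updated. That may contribute to the compiler errors, although this thread does not confirm it. Below is a minimal sketch of a cleaner way to go back, using git checkout plus a submodule update. checkout_commit and its DRY_RUN switch are hypothetical:

```shell
# Hypothetical helper: move the whole tree (including submodules) to a given
# commit and drop the stale build folder. Set DRY_RUN=1 to only print commands.
checkout_commit() {
  commit="$1"
  run="${DRY_RUN:+echo}"   # "echo" when DRY_RUN is set, empty otherwise
  $run git checkout "$commit" &&
    $run git submodule update --init --recursive &&
    $run rm -rf build
}
```

Run it from the instant-ngp root, then configure and build again. Checking out whichever branch you started from, followed by another submodule update, returns to the latest code.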

shehrum commented 2 years ago

I'm facing the same issue. Has anybody had any luck resolving this for the headless version?

shehrum commented 2 years ago

11:04:24 INFO     Loading NeRF dataset from
11:04:24 INFO       data/nerf/fox/transforms.json
11:04:24 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
11:04:24 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
11:04:24 INFO     Loading network config from: configs/nerf/base.json
11:04:24 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
11:04:24 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
11:04:24 INFO     Color model:   3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
11:04:24 INFO       total_encoding_params=13074912 total_network_params=9728
Got cutlass error: Error Internal at: 363
Could not free memory: /home/ubuntu/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:456 cudaDeviceSynchronize() failed with error operation not permitted when stream is capturing

Built with:

TCNN_CUDA_ARCHITECTURES=70 cmake . -B build -DNGP_BUILD_WITH_GUI=OFF
cmake --build build --config RelWithDebInfo -j 16

I have also tried building with:

cmake --build build --config Release -j 16

gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
Cuda compilation tools, release 10.2, V10.2.89
cmake version 3.22.2
V100 GPU on EC2 P3.2xlarge

msollami-sf commented 2 years ago

I'm seeing the same problem on my p3 instance (V100):

/build/testbed --scene data/nerf/fox --width 50 --height 50
01:56:44 INFO     Loading NeRF dataset from
01:56:44 INFO       data/nerf/fox/transforms.json
01:56:44 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
01:56:44 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
01:56:44 INFO     Loading network config from: configs/nerf/base.json
01:56:44 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
01:56:44 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
01:56:44 INFO     Color model:   3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
01:56:44 INFO       total_encoding_params=13074912 total_network_params=9728
Got cutlass error: Error Internal at: 363
Could not free memory: /home/ubuntu/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:458 cudaDeviceSynchronize() failed with error operation not permitted when stream is capturing

Any update on this?

youkunanyanlys commented 2 years ago

I had the same problem when I used the following command to retrain on the fox images:

data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --video_in --video_fps 2 --run_colmap --aabb_scale 16

Solved: changing the value of --aabb_scale to 4 solved the problem. I'm researching the cause; I hope this helps.
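The fix above retries with smaller --aabb_scale values (16 down to 4). The instant-ngp dataset tips describe aabb_scale as a power of two. The allowed maximum has changed between versions, so this sketch checks only the power-of-two part. is_power_of_two is a hypothetical helper:

```shell
# Hypothetical helper: succeed only for positive powers of two
# (1, 2, 4, 8, 16, ...), assumed here to be valid --aabb_scale values.
is_power_of_two() {
  n="$1"
  case "$n" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  # A power of two has exactly one bit set, so n & (n - 1) is zero.
  [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}
```

is_power_of_two 4 succeeds while is_power_of_two 12 fails, so a retry loop over candidate scales can skip values that are not powers of two.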

Tom94 commented 2 years ago

These should be resolved now. Please feel free to re-open the issue if not.

ricshaw commented 2 years ago

This is still an error for me, although it manages to train:

Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.

gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
Cuda compilation tools, release 10.2, V10.2.89
cmake version 3.22.2
V100 GPU

Saoyu99 commented 2 years ago

TCNN_CUDA_ARCHITECTURES=70 cmake . -B build -DNGP_BUILD_WITH_GUI=OFF
cmake --build build --config RelWithDebInfo -j 16

This helped when I built this project on a headless (no-GUI) server with a V100 GPU.

jelmerS2 commented 2 years ago

I just got this error. I suspect it has to do with scene size and memory: a small scene like the fox works fine, but my large scene doesn't. That scene used 90 GB of RAM during COLMAP. I'm not sure how that relates to GPU memory usage in testbed.exe, but that scene does give this error. I have 2x P6000 (24 GB each), but they don't max out when testbed.exe starts up...
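To test the memory theory, it can help to watch per-GPU memory while testbed.exe starts up on the large scene. nvidia-smi can report this with --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits. That prints one comma-separated line per GPU, with values in MiB. Below is a minimal sketch that turns those lines into free-memory figures; gpu_free_mib is a hypothetical helper:

```shell
# Hypothetical helper: read lines like "0, 1000, 24576" (index, used MiB,
# total MiB) as printed by
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# and print the free memory for each GPU.
gpu_free_mib() {
  awk -F', *' '{ printf "GPU %s: %d MiB free of %d MiB\n", $1, $3 - $2, $3 }'
}
```

Repeating that nvidia-smi command, piped into gpu_free_mib, while the testbed loads would show whether either P6000 approaches its 24 GB.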

NVlabs / instant-ngp, issue #219: Got cutlass error: Error Internal at: 363