ShijieZhou-UCLA / feature-3dgs

[CVPR 2024 Highlight] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Question for render feature? #7

Closed SYSUykLin closed 7 months ago

SYSUykLin commented 8 months ago

Hello, I tried to render features with 256 dimensions, but CUDA reported insufficient shared memory, so I could render at most 40 dimensions. What changes did you make to render 256 dimensions?

41xu commented 7 months ago

As far as I understand, the rasterizer uses shared memory to hold the collected features/colors during rendering and during gradient calculation. The amount of shared memory is limited by the specific GPU. In this paper, they instead dynamically allocate a CUDA array as a cache for the collected features, which avoids using shared memory (of course, this is a trade-off between the feature dimension you need and the shared-memory limit). You can see the implementation here: https://github.com/ShijieZhou-UCLA/feature-3dgs/blob/9e714ff841dd90b2d3e9d29bf3d1a74b90d414a7/submodules/diff-gaussian-rasterization/cuda_rasterizer/rasterizer_impl.cu#L398 If I have misunderstood, please correct me.
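To make the shared-memory limit concrete, here is a rough back-of-the-envelope sketch. It assumes things not stated in this thread: a block of 256 threads, a 48 KB static shared-memory budget per block, and per-thread staging buffers of one int, one float2, and one float4 alongside one float per feature channel. The function names are hypothetical. Under those assumptions the ceiling lands close to the 40 dimensions reported above.

```python
# Rough shared-memory budget estimate. All sizes below are assumptions,
# not values read from the feature-3dgs source.

def max_channels(shared_bytes=48 * 1024,
                 block_size=256,
                 fixed_bytes_per_thread=4 + 8 + 16,  # int + float2 + float4 (assumed)
                 bytes_per_channel=4):               # one float per channel
    """Largest channel count whose staging buffers fit in shared memory."""
    fixed = block_size * fixed_bytes_per_thread
    per_channel = block_size * bytes_per_channel
    return (shared_bytes - fixed) // per_channel


def shared_bytes_needed(channels,
                        block_size=256,
                        fixed_bytes_per_thread=4 + 8 + 16,
                        bytes_per_channel=4):
    """Shared memory one block would need for the given channel count."""
    return block_size * (fixed_bytes_per_thread + channels * bytes_per_channel)


if __name__ == "__main__":
    print(max_channels())            # channels that fit in the assumed 48 KB
    print(shared_bytes_needed(256))  # bytes a 256-channel buffer would need
```

With these assumed sizes, a 256-channel staging buffer would need several times the budget, which is why moving the collected features into a separately allocated array sidesteps the limit.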

JrMeng0312 commented 7 months ago

You can try the fix from https://github.com/graphdeco-inria/gaussian-splatting/issues/41#issuecomment-1805179311: adding the "-Xcompiler -fno-gnu-unique" option at line 29 of submodules/diff-gaussian-rasterization/setup.py resolves the illegal memory access error in training.

extra_compile_args={"nvcc": ["-Xcompiler", "-fno-gnu-unique", "-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})
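For readability, the same argument list can be built in a small helper. This is only a sketch: `nvcc_args` is a hypothetical name, and it assumes the list is then passed as `extra_compile_args={"nvcc": ...}` to the extension definition in setup.py, as in the line above.

```python
import os


def nvcc_args(setup_dir):
    """Build the nvcc flags from the suggestion above (hypothetical helper)."""
    return [
        # Forward -fno-gnu-unique to the host compiler via nvcc.
        "-Xcompiler", "-fno-gnu-unique",
        # Add the bundled glm headers to the include path.
        "-I" + os.path.join(setup_dir, "third_party/glm/"),
    ]


if __name__ == "__main__":
    # In setup.py, setup_dir would be os.path.dirname(os.path.abspath(__file__)).
    print(nvcc_args(os.getcwd()))
```

After editing setup.py, the submodule has to be rebuilt and reinstalled for the new flag to take effect.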

SYSUykLin commented 7 months ago

Thanks very very very much.

SYSUykLin commented 7 months ago

Thanks