LvisRoot opened 1 year ago
After taking a look at the current main branch, I found out that there's a recent CUDA implementation of the coordinate-gradient computation.
I modified the backward function so it would return the `grad_coord`,
and I now get non-zero gradients for the camera extrinsics.
I ran a few experiments on the Replica dataset, adding some noise to the camera poses to check whether pose optimization works, and tried to do something similar to BARF by adding LOD annealing and trying different learning rates for the poses:
https://github.com/chenhsuanlin/bundle-adjusting-NeRF
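By LOD annealing I mean a coarse-to-fine schedule roughly like this (a minimal sketch of BARF-style weights; my own illustration, not wisp code):

```python
import math

def lod_weights(num_lods: int, progress: float):
    """BARF-style coarse-to-fine weights for each LOD.

    progress in [0, 1] is the fraction of training completed.
    Early in training only the coarse LODs contribute; finer LODs
    are smoothly blended in as training progresses.
    """
    alpha = progress * num_lods
    weights = []
    for k in range(num_lods):
        if alpha <= k:
            w = 0.0
        elif alpha >= k + 1:
            w = 1.0
        else:
            # BARF-style cosine easing between 0 and 1
            w = 0.5 * (1.0 - math.cos((alpha - k) * math.pi))
        weights.append(w)
    return weights

# Example: halfway through training with 8 LODs,
# coarse LODs are at 1.0 while finer ones are still fading in
print(lod_weights(8, 0.5))
```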
However, so far I only get even blurrier renders when I optimize the extrinsics.
Has anyone already tried something similar, and do you have some suggestions on what I could be missing in the optimization process?
Also, I'm not too familiar with `kaolin.Cameras`' optimization-friendly `matrix_6dof_rotation` representation, and I'm tempted to change to an `se3` representation instead, since it's what I've always used in SLAM systems so far. Do you have any insights into which representation could work better?
Thanks in advance :)
Hi @LvisRoot!
The 6DoF representation is from Zhou et al. 2019: https://arxiv.org/abs/1812.07035. The SE3 representation is less suitable for differentiation (you have to force the matrix to be orthogonal, but that may not be guaranteed - see the paper above).
The kaolin docs further elaborate on the difference:
```python
class _MatrixSE3Rep(ExtrinsicsRep):
    """
    4x4 matrix form of rigid transformations from SE(3), the special Euclidean group.
    Uses the identity mapping from representation space to transformation space,
    and thus simple and quick for non-differentiable camera operations.
    However, without additional constraints, the over-parameterized nature of this representation
    makes it unsuitable for optimization (e.g: transformations are not guaranteed to remain in SE(3)
    during backpropagation).
    """
```
whereas:
```python
class _Matrix6DofRotationRep(ExtrinsicsRep):
    """ A representation space which supports differentiability in the space of rigid transformations.
    That is, the view-matrix is guaranteed to represent a valid rigid transformation.
    Under the hood, this representation keeps 6 DoF for rotation, and 3 additional ones for translation.
    For conversion to view-matrix form, a single Gram–Schmidt step is required.
    See: On the Continuity of Rotation Representations in Neural Networks, Zhou et al. 2019
    """
```
You can modify the following example in kaolin to see the difference between the two: https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/camera/cameras_differentiable.py
The backend representation can be picked with `switch_backend`:
https://kaolin.readthedocs.io/en/latest/modules/kaolin.render.camera.camera_extrinsics.html#kaolin.render.camera.CameraExtrinsics.switch_backend
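For example, switching roughly looks like this (a sketch only - argument names may vary across kaolin versions, check the docs above):

```python
import torch
from kaolin.render.camera import Camera

# Build a camera from a view matrix (values here are purely illustrative)
camera = Camera.from_args(
    view_matrix=torch.eye(4),
    width=800, height=800, fov=0.6,
)

# Switch the extrinsics to the optimization-friendly backend before training
camera.extrinsics.switch_backend('matrix_6dof_rotation')
camera.extrinsics.requires_grad = True
```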
I'd actually suspect the hashgrid interpolation by coords.
The first thing we can do is validate whether there is a potential bug there (the other grids in wisp, e.g. `OctreeGrid`, use a different trilinear interpolation logic).
You could also try `tinycudann` (it's easily compatible with wisp); see the thread here: https://github.com/NVIDIAGameWorks/kaolin-wisp/issues/41

Hi @orperel, thanks for your answer!
I saw the reference to the Matrix6DofRotation paper in kaolin's documentation, but hadn't had the time to look at it in depth, so I fell back to using Lie `se3` in the tangent space (not the `SE3` group), which I know is suitable for optimization as well. I'm transforming the tangent vectors to `SE3` to rotate the rays before tracing.
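For context, this is the kind of tangent-space parameterization I mean (a minimal sketch of the se(3) exponential map; the helper name is mine, not from any library):

```python
import torch

def se3_exp(xi: torch.Tensor) -> torch.Tensor:
    """Exponential map from an se(3) tangent vector to a 4x4 SE(3) matrix.

    xi = (rho, phi): rho is the translational part, phi the rotational part.
    Any xi maps to a valid rigid transform, so gradient steps on xi
    never leave the manifold.
    """
    rho, phi = xi[:3], xi[3:]
    theta = phi.norm().clamp(min=1e-8)
    px, py, pz = phi[0], phi[1], phi[2]
    zero = torch.zeros((), dtype=xi.dtype, device=xi.device)
    # Skew-symmetric matrix of phi (built differentiably)
    K = torch.stack([
        torch.stack([zero, -pz, py]),
        torch.stack([pz, zero, -px]),
        torch.stack([-py, px, zero]),
    ])
    I = torch.eye(3, dtype=xi.dtype, device=xi.device)
    # Rodrigues' formula for the rotation
    R = I + torch.sin(theta) / theta * K \
          + (1 - torch.cos(theta)) / theta**2 * (K @ K)
    # Left Jacobian V maps rho to the translation
    V = I + (1 - torch.cos(theta)) / theta**2 * K \
          + (theta - torch.sin(theta)) / theta**3 * (K @ K)
    T = torch.eye(4, dtype=xi.dtype, device=xi.device)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T
```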
I've been trying the following grids lately:

- wisp-hashgrid
- tinycudann-hashgrid
- wisp-triplanar

I ran a bunch of experiments changing the pose noise strength (see the noise sketch below), `lr`, and `extrinsic_opt_lr`.
The only thing that worked for me was:

- wisp-triplanar grid (lr=0.01)
- extrinsics_lr=0.001
- std=4.5deg
- std=0.05
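For reference, this is roughly how I perturb the poses (a sketch using the numbers above; the helper is my own, not wisp API):

```python
import torch

def perturb_pose(T: torch.Tensor, rot_std_deg: float = 4.5,
                 trans_std: float = 0.05) -> torch.Tensor:
    """Perturb a 4x4 camera pose with random rotation/translation noise."""
    # Random rotation: unit axis scaled by an angle ~ N(0, rot_std_deg)
    angle = torch.randn(()) * rot_std_deg * torch.pi / 180.0
    axis = torch.nn.functional.normalize(torch.randn(3), dim=0)
    px, py, pz = (axis * angle).tolist()
    K = torch.tensor([[0., -pz,  py],
                      [ pz,  0., -px],
                      [-py,  px,  0.]])
    R_noise = torch.matrix_exp(K)  # exp of a skew matrix is a rotation
    T_out = T.clone()
    T_out[:3, :3] = R_noise @ T[:3, :3]
    # Translation: i.i.d. Gaussian noise with std trans_std
    T_out[:3, 3] += torch.randn(3) * trans_std
    return T_out
```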
With that pose noise the representation gets pretty cloudy and noisy, but it cleans up a lot when pose optimization is on.
With hash-grids I spent even more time tuning parameters, and I always ended up seeing the poses diverge and move into weird positions, getting a super cloudy or very smoothed-out representation.
Do you have any insight into why this could be?
I haven't tested with `Matrix6DofRotationRep` since finding the first issues; I might do it in the upcoming days so I can drop my `se3` pipeline.
The only thing I found regarding pose estimation with hash-grids that works is a comment in the Instant-NGP repo, where some details about how it is tackled are explained: https://github.com/NVlabs/instant-ngp/issues/69#issuecomment-1018345113
AFAIU, it's not just gradient propagation for the rotation: the ray directions are cross-multiplied with the direction gradients, and the result is used to rotate the extrinsics orientation. https://github.com/NVlabs/instant-ngp/blob/00754afc1fbb933c6cefc020f6c4efbb4e1c9a1b/src/testbed_nerf.cu#L1765-L1776
So that's different from the method I'm currently using. I wonder if adding this would have a big impact on the results.
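If I understand that snippet correctly, the core of the trick looks roughly like this (a hedged sketch in PyTorch, not the actual instant-ngp code):

```python
import torch

def rotation_grad_from_ray_dirs(dirs: torch.Tensor,
                                grad_dirs: torch.Tensor) -> torch.Tensor:
    """Accumulate a rotation gradient (axis-angle) from ray-direction gradients.

    For a small rotation omega applied to the camera, each direction moves
    by omega x dir, so dL/d(omega) = sum_i dir_i x (dL/d dir_i).
    dirs, grad_dirs: (N, 3) ray directions and their loss gradients.
    """
    return torch.cross(dirs, grad_dirs, dim=-1).sum(dim=0)
```

The resulting axis-angle gradient can then be applied to the camera rotation, e.g. via an exponential-map update, instead of naively backpropagating through a rotation parameterization.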
I ran some more experiments using the `triplanar-grid` with Kaolin extrinsics in `Matrix6DofRotationRep` only. In this case pose optimization works just as well as with `se3` vectors for the extrinsics representation.
Here are some low-res renders without and with pose optimization, trained for 150 epochs (it usually takes ~400 epochs to get a good PSNR).
So in my case, for the data I'm using (the Replica dataset), the issue is in the hash-grids.
I'll keep going with the triplanar grids for now, but it would be great if someone were able to make pose refinement work with hash-grids and could share some tips on how to run it. I'm still interested in using hash-grids as the underlying representation.
@LvisRoot Seems like the backward function is indeed bugged; the gradient wasn't returned to the Python side 🤷
I've started a quick PR to fix this: https://github.com/NVIDIAGameWorks/kaolin-wisp/pull/145
I still need to test it more before we can merge it, but you're welcome to give it a try meanwhile (don't forget to run `python setup.py develop` to rebuild the kernel).
Hi @orperel, thanks for following up on this.
You're right, the gradients for the coordinates were not returned by the CUDA backend. For my experiments I had changed this from the beginning to test it out (but did not open a PR or anything):

> I modified the backward function so it would return the `grad_coord` and now get non-zero gradients for the camera extrinsics.

However, that didn't fix the pose opt issue.
Moreover, `tinycudann` always returned the coordinate gradients, but it didn't work for me either.
That's why I'm thinking there's an underlying issue with using the plain coordinate gradients of hash grids for pose optimization :thinking: . It would be great to have some insight into why.
I'm wondering if this would also be the case for datasets with all cameras looking at a single object, in contrast to Replica, where you reconstruct rooms with cameras looking "from the inside".
Hello, sorry to bother you. I have some questions about camera pose optimization in a NeRF system: https://github.com/Totoro97/f2-nerf/issues/84. I want to add pose optimization to f2-nerf, but I encountered a problem similar to what you mentioned. Can you give me some advice?
Hi @Bin-ze, I ended up not using hash grids, as I wasn't able to implement or find an implementation of the gradients that wouldn't blow up.
I met some people from NVIDIA two months ago at a conference who had used their hash grid implementation for pose-opt, but they said some work had to be done to make pose-opt work. I'm not sure if they pushed those changes to their open-source repo though.
For me, planar-based grid approaches worked just fine for pose-opt: triplanar, and TensoRF (not implemented in wisp, but it's easy to derive from triplanar - see the sketch below).
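To illustrate what I mean by deriving it from triplanar (a rough sketch under my own naming, not wisp code): triplanar sums three plane lookups, while a TensoRF-style VM decomposition multiplies each plane lookup with a line lookup along the remaining axis.

```python
import torch
import torch.nn.functional as F

def plane_lookup(plane: torch.Tensor, uv: torch.Tensor) -> torch.Tensor:
    """Bilinear lookup into a (C, H, W) feature plane at uv in [-1, 1]^2."""
    grid = uv.view(1, -1, 1, 2)
    out = F.grid_sample(plane[None], grid, align_corners=True)  # (1, C, N, 1)
    return out[0, :, :, 0].t()  # (N, C)

def line_lookup(line: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Linear lookup into a (C, L) feature line at w in [-1, 1]."""
    grid = torch.stack([torch.zeros_like(w), w], dim=-1).view(1, -1, 1, 2)
    out = F.grid_sample(line[None, :, :, None], grid, align_corners=True)
    return out[0, :, :, 0].t()  # (N, C)

def tensorf_vm_features(planes, lines, xyz):
    """TensoRF-style VM lookup: plane(x_i, x_j) * line(x_k), per axis pair.

    A plain triplanar grid would instead just sum the plane lookups.
    planes: three (C, H, W) tensors; lines: three (C, L) tensors;
    xyz: (N, 3) coordinates in [-1, 1]^3.
    """
    axes = [((0, 1), 2), ((0, 2), 1), ((1, 2), 0)]
    feats = []
    for plane, line, ((i, j), k) in zip(planes, lines, axes):
        feats.append(plane_lookup(plane, xyz[:, (i, j)]) * line_lookup(line, xyz[:, k]))
    return torch.cat(feats, dim=-1)  # (N, 3C)
```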
A nice hash-based approach that worked for me OOTB for pose-opt was https://github.com/RaduAlexandru/permutohedral_encoding, which uses permutohedral grids instead of cubic ones, making it faster (fewer interpolations) and more memory-efficient in higher dimensions.
In terms of pose representations, both tangent-space `se3` and `matrix_6dof_rotation` worked fine for me, but I'm sticking with `matrix_6dof_rotation` since that's already implemented in Kaolin.
Hope this helps.
Best,
Claucho
Thank you for your reply! I still have some questions to ask:
Best,
Bin-ze
Hi there. First of all, thank you for open-sourcing this super useful repo.
I wanted to do pose optimization within a wisp pipeline, leveraging the `kaolin.Camera` class, which is differentiable OOTB. I created a pipeline that transforms rays on each training step with updated extrinsics, but the gradients to the extrinsics parameters weren't propagating properly.
After some debugging, I found that when using a hash grid, the CUDA backward implementation of `interpolate` only computes the gradients for the `codebook` parameters: https://github.com/NVIDIAGameWorks/kaolin-wisp/blob/cb47e10f376e5ac8b6965c650d8a6b85b9bc968e/wisp/csrc/ops/hashgrid_interpolate.cpp#L96
Would it be possible to add the gradient computation for the coordinates as well? It would be a great enhancement, making codebook-based pipelines fully differentiable up to the camera poses.
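To illustrate the structure of the fix, here's a runnable toy analogue in pure PyTorch (1D linear interpolation instead of wisp's multi-level trilinear CUDA kernel): the backward returns a gradient for the coordinates as well as the codebook.

```python
import torch

class LerpFeatures(torch.autograd.Function):
    """Toy 1D analogue of the grid interpolation op, differentiable
    w.r.t. both the codebook and the query coordinates."""

    @staticmethod
    def forward(ctx, coords, codebook):
        # coords: (N,) in [0, L-1); codebook: (L, C)
        lo = coords.floor().long()
        w = (coords - lo.float()).unsqueeze(-1)  # (N, 1)
        ctx.save_for_backward(coords, codebook)
        return (1 - w) * codebook[lo] + w * codebook[lo + 1]

    @staticmethod
    def backward(ctx, grad_out):
        coords, codebook = ctx.saved_tensors
        lo = coords.floor().long()
        w = (coords - lo.float()).unsqueeze(-1)
        # Gradient w.r.t. the codebook entries (the part wisp already has)
        grad_codebook = torch.zeros_like(codebook)
        grad_codebook.index_add_(0, lo, (1 - w) * grad_out)
        grad_codebook.index_add_(0, lo + 1, w * grad_out)
        # Gradient w.r.t. the coordinates (the missing piece): the local
        # feature slope, which lets gradients flow back to camera poses
        grad_coords = ((codebook[lo + 1] - codebook[lo]) * grad_out).sum(-1)
        return grad_coords, grad_codebook
```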