facebookresearch / hyperreel

Code release for HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
MIT License

training my own video dataset #20

Open geekdreamer04 opened 1 year ago

geekdreamer04 commented 1 year ago

I am trying to train my own inward-facing video dataset with 30 cameras, but I am unable to understand the coordinate system used by the HyperReel code. My poses are all in the OpenCV coordinate system (x right, y down, z forward). How can I translate or use my transform matrices with your code? I tried building on the immersive.py code base and immersive_sphere.yml, but my model doesn't seem to learn the volume: I am getting blurred/colorful novel views, which is really weird. Do you have any suggestions? Please tell me how to train the model using my OpenCV-convention transforms and intrinsics. Also, what is the most suitable model for my case (inward-facing scene, video data, OpenCV coordinates)? Please respond.

benattal commented 1 year ago

HyperReel's convention is +x right, +y up, -z forward. To convert camera-to-world poses from OpenCV to HyperReel's convention, you can do something like this:

```python
import numpy as np

# pose is an OpenCV camera-to-world matrix (x right, y down, z forward);
# flipping the y and z axes converts it to HyperReel's convention (x right, y up, -z forward)
pose_pre = np.eye(4)
pose_pre[1, 1] *= -1
pose_pre[2, 2] *= -1
pose = pose_pre @ pose @ pose_pre
```

The first step, `pose @ pose_pre`, makes our ray-generation methods produce rays consistent with OpenCV's. The second step, `pose_pre @ pose @ pose_pre`, is not strictly necessary, but it also transforms world space and keeps the determinant of the upper-left 3x3 of the pose equal to 1.
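
For reference, a minimal sketch of applying this flip to a whole stack of OpenCV camera-to-world matrices (the function and array names here are just for illustration, not part of the HyperReel codebase), including a check that each rotation block stays a proper rotation:

```python
import numpy as np

def opencv_to_hyperreel(opencv_c2w: np.ndarray) -> np.ndarray:
    """Flip y and z of (N, 4, 4) OpenCV camera-to-world poses into HyperReel's convention."""
    pose_pre = np.eye(4)
    pose_pre[1, 1] *= -1
    pose_pre[2, 2] *= -1
    converted = pose_pre[None] @ opencv_c2w @ pose_pre[None]

    # Sanity check: the upper-left 3x3 of each pose should remain a rotation (det ~ +1).
    dets = np.linalg.det(converted[:, :3, :3])
    assert np.allclose(dets, 1.0, atol=1e-4), dets
    return converted
```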

I recommend playing around with different pose transformations if the one above does not work. There might also be an issue with your scene bounds, but it's hard to say without seeing the results.
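
As a rough back-of-the-envelope check on bounds (not HyperReel's actual bound computation; variable names are hypothetical), you can look at how far the cameras sit from the rig center and make sure the near/far planes in your config comfortably bracket the subject:

```python
import numpy as np

# c2w: (N, 4, 4) camera-to-world poses after conversion (hypothetical variable name)
centers = c2w[:, :3, 3]
dist = np.linalg.norm(centers - centers.mean(axis=0), axis=-1)
print("camera distance from rig centroid (min/max):", dist.min(), dist.max())

# For an inward-facing rig centered on the subject, near should be well below
# dist.min() and far a few times dist.max(), otherwise geometry can be clipped.
```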

I would say that starting with the immersive configs (which are also used for inward facing scenes) makes sense. If you'd like to share your data, I can also spend some time trying to get things to work.

ZhenhuiL1n commented 1 year ago

Hi, I am also trying to train it on my own dataset. My validation images look good, but the validation video outputs are also blurred/colorful. I tried both commenting out this pose transformation and leaving it in, and both produce the same weird RGB results for me. I am converting the instant-ngp format to this immersive format but don't know which part is wrong. Could you please give some hints? Thanks a lot!
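
One generic way to sanity-check the converted poses, independent of HyperReel's loaders (the transforms.json keys below are assumed from the usual instant-ngp/NeRF layout, and variable names are just for this snippet), is to confirm the cameras actually look inward at the rig center:

```python
import json
import numpy as np

# Load camera-to-world matrices from an instant-ngp style transforms.json (assumed layout).
with open("transforms.json") as f:
    meta = json.load(f)

c2w = np.stack([np.array(fr["transform_matrix"]) for fr in meta["frames"]])
centers = c2w[:, :3, 3]
# In the +x right, +y up, -z forward convention the camera looks along -z,
# i.e. along the negated third column of the rotation block.
forward = -c2w[:, :3, 2]

to_center = centers.mean(axis=0) - centers
to_center /= np.linalg.norm(to_center, axis=-1, keepdims=True)
print("mean cos(angle to rig center):", (forward * to_center).sum(-1).mean())  # ~1 if inward-facing
```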