facebookresearch / mvp

Training and Evaluation Code for "Mixture of Volumetric Primitives for Efficient Neural Rendering"

What's basetransf matrix used for? #6

Closed Qingcsai closed 2 years ago

Qingcsai commented 2 years ago

Hi, I have a small question about applying this code to my own data. What is the self.basetransf matrix in multiviewvideo.py used for? I see this 3x4 matrix is applied to all camera poses and to all the frametransf, but what is its purpose? :)

https://github.com/facebookresearch/mvp/blob/d758f53662e79d7fec885f4dd1a3ee457f7c4b00/data/multiviewvideo.py#L410-L415

https://github.com/facebookresearch/mvp/blob/d758f53662e79d7fec885f4dd1a3ee457f7c4b00/data/multiviewvideo.py#L385-L387

Besides, I found it necessary to apply this basetransf: when I changed it to an identity matrix, training didn't converge. So how do I get the basetransf for my own data?

Your answer will help me a lot! Thank you!

stephenlombardi commented 2 years ago

Hi,

The purpose of basetransf is to center the object of interest. We need it because the coordinate frame of the cameras (i.e., the extrinsics) may not place the object at the origin of the world. For example, in our camera calibration process one of the cameras is selected as the world origin (0, 0, 0) after calibration (in other words, one of the cameras defines the coordinate frame). However, the raymarching code shoots rays from the cameras and intersects them with the axis-aligned box [-1, 1]^3. For that reason we need basetransf, which transforms the camera locations/orientations into a new coordinate frame in which the object the cameras are pointed at sits at the world origin (0, 0, 0). There is also a parameter called "volradius", which is the radius of this axis-aligned bounding box [-1, 1]^3 in world space; in other words, it scales the coordinate frame down so that the object fits in [-1, 1]^3.
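A minimal sketch of that recentering idea (the numbers, the `recenter_campos` helper, and the exact way the [R | t] matrix is composed with the extrinsics are illustrative assumptions here, not the repo's exact code):

```python
import numpy as np

# Illustrative values only; in the repo these come from the dataset
# config / calibration files, not from this snippet.
volradius = 256.0

# basetransf as a 3x4 [R | t] matrix: a rotation plus the position of
# the object of interest in the original (camera-calibration) frame.
basetransf = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, -0.2],
    [0.0, 0.0, 1.0, 3.0],
], dtype=np.float32)

def recenter_campos(campos):
    """Map a camera position into the centered, scaled frame used by the
    raymarcher, where the object of interest fits inside [-1, 1]^3."""
    R, t = basetransf[:, :3], basetransf[:, 3]
    # Subtract the object position, rotate into the new frame, then scale
    # so that a world-space distance of `volradius` maps to 1 unit.
    return R.T @ (campos - t) / volradius
```

With this convention, a camera sitting exactly at the object position maps to the origin, and a camera `volradius` units away ends up on the boundary of the unit box.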

To determine basetransf for your own data, there are a few things you can try. The simplest is to average the camera positions and put that average in the last column of basetransf, with the identity matrix along the diagonal (i.e., [1., 0., 0., cam_average[0]], [0., 1., 0., cam_average[1]], [0., 0., 1., cam_average[2]]). Note that this only works well if your cameras lie on a sphere and all point inward toward the same point. In general, you want the last column of basetransf to be the object position in the camera coordinate frame. Let me know if this helps.
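The suggestion above can be sketched as follows (the camera positions are toy values standing in for your calibration data, and `campositions` is an assumed name, not something the repo provides):

```python
import numpy as np

# Toy camera positions on a sphere around a common look-at point; in
# practice these come from your calibration (e.g., a KRT file).
campositions = np.array([
    [1.0, 0.0, 2.0],
    [-1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [0.0, -1.0, 2.0],
])

# If the cameras lie on a sphere and point inward, their average
# approximates the object position.
cam_average = campositions.mean(axis=0)

# Identity rotation, average camera position in the last column, as
# described above. This recenters the scene but does not reorient it.
basetransf = np.concatenate([np.eye(3), cam_average[:, None]], axis=1)
```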

Qingcsai commented 2 years ago

Thanks! That helps a lot. My cameras are located on a sphere with 100+ views. After I translated the scene/object to the world origin (0, 0, 0), training still doesn't converge, for some other reason; I'm looking into it.

Qingcsai commented 2 years ago

Hi @stephenlombardi, I'm still having trouble applying your code to the InterHand dataset; could you help me? The cameras are located on a sphere with 100+ views. As you suggested, I set basetransf to the identity matrix along the diagonal, with the translation taken from one of the cameras so that the cameras point at the world origin (0, 0, 0).

To keep things simple, I'm using the Neural Volumes setting rather than the MVP setting.

However, I find it converges very slowly during training.

Here is the rendered result after training for 5000 epochs, with batchsize=16 and volradius=512:

https://user-images.githubusercontent.com/41203342/167774401-57738525-1df8-4198-b5a9-0b8568b3c74b.mp4

As you can see, the volume is spread out everywhere rather than concentrated in a tight region, which I think is abnormal.

The images are 512×334; is volradius=512 the right setting? I ask because the dryice1 data is 667×1024 and uses volradius=256.

Qingcsai commented 2 years ago

Oh, it's training normally with the NV setting now, after setting the correct camera parameters. Closing as resolved.

TinBacon commented 1 year ago

> Oh, it's training normally with the NV setting now, after setting the correct camera parameters. Closing as resolved.

Hello, I have the same problem as you. Could you tell me which camera parameters you set to make everything work?

Thanks very much!