Closed Qingcsai closed 2 years ago
Hi,
The purpose of basetransf is to center the object of interest. The reason we need it is that the coordinate frame (i.e., the extrinsics) of the cameras may not have the object at the origin of the world. For example, in our camera calibration process one of the cameras is selected as the world origin (0,0,0) after calibration (in other words, one of the cameras is the origin of the coordinate frame). However, the raymarching code shoots rays from the camera and intersects them with the axis-aligned box [-1,1]^3. For that reason we need basetransf, which transforms the camera locations/orientations into a new coordinate frame in which the object the cameras are pointed at sits at the world origin (0,0,0). There's also a parameter called "volradius", which is the radius of this axis-aligned bounding box [-1,1]^3 in world space. In other words, it scales the coordinate frame down so that the object fits in [-1,1]^3.
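To make the geometry concrete, here is a minimal sketch of that re-centering step. This is not the repo's code; the function name and the exact convention (basetransf as a 3x4 [R | t] rigid transform, camera position as a point in world space) are assumptions for illustration:

```python
import numpy as np

def normalize_camera(campos, camrot, basetransf, volradius):
    """Map a camera position/orientation from the original world frame into
    the normalized frame where the object sits at the origin and fits in
    [-1, 1]^3. Sketch only; names and conventions are assumed."""
    R = basetransf[:, :3]  # 3x3 rotation part of basetransf
    t = basetransf[:, 3]   # 3x1 translation part (object position in world)
    # Invert the rigid transform to recenter the camera, then scale the
    # world down by volradius so the object fits in the unit box.
    campos_n = R.T @ (campos - t) / volradius
    camrot_n = camrot @ R  # rotate the camera orientation into the new frame
    return campos_n, camrot_n
```

With this convention, a camera sitting exactly at the object center maps to the origin of the normalized frame, and a camera one volradius away maps to the surface of the [-1,1]^3 box.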
To determine basetransf for your own data there are a few things you can try. The simplest would be to average the camera positions and put the average into the last column of basetransf, with the identity matrix along the diagonal (i.e., [1., 0., 0., cam_average[0]], [0., 1., 0., cam_average[1]], [0., 0., 1., cam_average[2]]). Note that this will only work well if your cameras are located on a sphere and all pointed inward toward the same point. In general, you want the last column of basetransf to be the object position in the camera coordinate frame. Let me know if this helps.
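The averaging suggestion above can be sketched as follows. The helper name is hypothetical, and it assumes you have the camera centers in world coordinates and that the cameras are on a sphere looking inward:

```python
import numpy as np

def basetransf_from_cameras(campositions):
    """campositions: (N, 3) array-like of camera centers in world coordinates.
    Returns a 3x4 [I | cam_average] matrix to use as basetransf.
    Sketch only; this just recenters and does not handle rotation."""
    cam_average = np.asarray(campositions, dtype=np.float32).mean(axis=0)
    # Identity rotation, with the averaged camera center as the translation.
    return np.concatenate([np.eye(3, dtype=np.float32),
                           cam_average[:, None]], axis=1)
```

volradius would then be chosen so that the object (not the image resolution) fits inside [-1,1]^3 after dividing world coordinates by it.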
Thanks! It helps a lot. My cameras are located on a sphere with 100+ views, and after I translate the scene/object to the world origin (0,0,0), I find it does not converge for some other reason; I am looking into it.
Hi @stephenlombardi, I am still having some trouble applying your code to the InterHand dataset; could you help me? The cameras are located on a sphere with 100+ views. I set basetransf to the identity matrix along the diagonal, as you said, with the translation set to one of the camera positions so that the cameras point at the world origin (0,0,0).
And for the simplest case I use the Neural Volumes setting rather than the MVP setting.
However, I find it converges very slowly during training.
Here is the rendering result after training for 5000 epochs, with batchsize=16 and volradius=512:
As you can see, the volume is spread everywhere rather than concentrated in a tight region, which I think is abnormal.
The images are 512*334 in size; am I setting the right volradius=512? I ask because the dryice1 data is 667*1024 in size and uses volradius=256.
Oh, it's training normally for the NV setting now, after setting the right camera parameters. Closing as resolved.
Hello, I have the same problem as you. Could you tell me which camera parameters you set to make everything work?
Thanks very much!
Hi, I have a little question about applying this code to my own data. What does the self.basetransf matrix in multiviewvideo.py do? I see this 3x4 matrix is applied to all camera poses and to all the frametransf values, but what is its purpose? :)
https://github.com/facebookresearch/mvp/blob/d758f53662e79d7fec885f4dd1a3ee457f7c4b00/data/multiviewvideo.py#L410-L415
https://github.com/facebookresearch/mvp/blob/d758f53662e79d7fec885f4dd1a3ee457f7c4b00/data/multiviewvideo.py#L385-L387
Besides, I find it necessary to apply this basetransf, because when I change it to an identity matrix, training does not converge. So how do I get my own basetransf? Your answer will help me a lot! Thank you!