facebookresearch/mvp

Training and Evaluation Code for "Mixture of Volumetric Primitives for Efficient Neural Rendering"

Hi, what role do modelmatrix and modelmatrixinv play? #10

Closed Luh1124 closed 1 year ago

stephenlombardi commented 1 year ago

modelmatrix is like a model matrix in OpenGL (for more info: https://jsantell.com/model-view-projection/ -- in this scheme, the view matrix and the projection matrix can be thought of as the camera extrinsics and intrinsics). The model matrix is the rigid transformation of the object in space. Because Mixture of Volumetric Primitives works on dynamic scenes, you can provide rigid tracking information (in the form of per-frame model matrices) to help the model learn in a canonical model space. If you don't have a dynamic scene or if you don't have per-frame tracking information, you can just make this the identity matrix. In the code it is primarily used to transform camera rays from world space to model space. modelmatrixinv is the inverse of this matrix.
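A minimal sketch of that ray transform, assuming (N, 3) row-vector ray arrays (the function and argument names are illustrative, not the repo's exact code):

```python
import numpy as np

def rays_world_to_model(raypos, raydir, modelmatrixinv):
    """Transform camera rays from world space to model space.

    raypos:         (N, 3) ray origins in world space
    raydir:         (N, 3) ray directions in world space
    modelmatrixinv: (4, 4) inverse of the rigid per-frame model matrix
    """
    R, t = modelmatrixinv[:3, :3], modelmatrixinv[:3, 3]
    # Points get the full rigid transform; directions are rotated only.
    return raypos @ R.T + t, raydir @ R.T
```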

Luh1124 commented 1 year ago

[screenshot of the camera-transform line in the code]

Thanks, I still have some questions. What is the physical meaning of right-multiplying by the inverse of the model matrix?

Luh1124 commented 1 year ago

I want to confirm my understanding of spatial coordinate conversion in the code:

  1. In NV, we convert the camera's position to world space through the camera's extrinsic matrix, then use baseposition.txt to convert the camera's world coordinates into the base-position space (can I understand this as the model space, or as the average position of the cameras?). This puts the camera and the normalized model volume in the same basepose space, after which we compute the origin, direction, Tmin, and Tmax of the rays.
  2. In MVP, we first apply the inverse of the matrix recorded in transform.txt and save the obj file as a bin file. Using MeshLab, I find that each bin file's face is oriented along the positive Z axis and located at the origin of the world coordinate system. At this point, should we consider the bin file to be at the world center or in model space? Second, the camera transformation follows NV: we use the camera extrinsics read from KRT to convert the camera to world space, then use the transform.txt of camera 400002 as the transform matrix of the base-position space to convert the camera from world space to basepose space? According to the code, does modelmatrix then perform the transformation from basepose space to model space?
  3. Finally, I don't understand what basepose does, since it seems that cameras in the world coordinate system could be converted to model space directly through the transform.txt matrix corresponding to each obj, just like the obj-to-bin conversion? Also, for the obj file, should we understand its vertex coordinates to be in world space?
Luh1124 commented 1 year ago

Perhaps basepose plays no role for the multiface dataset used by MVP? We would only need the transform corresponding to each obj file to map the obj to the center of the model space, and when computing the view position we would only need to transform the camera after applying the extrinsic matrix? For general datasets, we may need to give an average position center as the basepose? Looking forward to your reply~

stephenlombardi commented 1 year ago

> Thanks, I still have some questions. What is the physical meaning of right-multiplying by the inverse of the model matrix?

This line takes campos/camrot, the position and orientation of the camera in world space, and converts them to model space.
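For concreteness, a hedged reconstruction of what such a line does, assuming camrot maps world-space directions into the camera frame and modelmatrix is the rigid model-to-world transform (variable names are assumptions in the style of the Neural Volumes loaders):

```python
import numpy as np

def camera_world_to_model(campos, camrot, modelmatrix):
    """campos: (3,) camera position in world space.
    camrot: (3, 3), maps world-space directions into the camera frame.
    modelmatrix: (4, 4) rigid model-to-world transform."""
    R, t = modelmatrix[:3, :3], modelmatrix[:3, 3]
    # Position: undo model-to-world, p_model = R^T (p_world - t).
    campos_model = R.T @ (campos - t)
    # Rotation: camrot @ R first lifts a model-space direction to world
    # space (R @ v), then into the camera frame (camrot @ ...) -- this is
    # the physical meaning of the right multiplication.
    camrot_model = camrot @ R
    return campos_model, camrot_model
```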

stephenlombardi commented 1 year ago

> I want to confirm my understanding of spatial coordinate conversion in the code:
>
> 1. In NV, we convert the camera's position to world space through the camera's extrinsic matrix, then use baseposition.txt to convert the camera's world coordinates into the base-position space (can I understand this as the model space, or as the average position of the cameras?). This puts the camera and the normalized model volume in the same basepose space, after which we compute the origin, direction, Tmin, and Tmax of the rays.
> 2. In MVP, we first apply the inverse of the matrix recorded in transform.txt and save the obj file as a bin file. Using MeshLab, I find that each bin file's face is oriented along the positive Z axis and located at the origin of the world coordinate system. At this point, should we consider the bin file to be at the world center or in model space? Second, the camera transformation follows NV: we use the camera extrinsics read from KRT to convert the camera to world space, then use the transform.txt of camera 400002 as the transform matrix of the base-position space to convert the camera from world space to basepose space? According to the code, does modelmatrix then perform the transformation from basepose space to model space?
> 3. Finally, I don't understand what basepose does, since it seems that cameras in the world coordinate system could be converted to model space directly through the transform.txt matrix corresponding to each obj, just like the obj-to-bin conversion? Also, for the obj file, should we understand its vertex coordinates to be in world space?
  1. Basepose converts world space to model space.
  2. The .bin vertices are in model space.
  3. The modelmatrix is a per-frame model matrix relative to basepose, so if you don't have per-frame rigid transforms you can leave it as the identity. You still need basepose, which transforms the coordinate frame so that the object is centered at [0, 0, 0] (the rotation matters less); see the sketch after this list.
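A minimal sketch of that centering property, assuming transform.txt stores a 4x4 model-to-world matrix and the obj vertices are in world space (file names, layout, and the tiny obj reader are assumptions, not the repo's code):

```python
import numpy as np

def load_obj_vertices(path):
    # Minimal .obj reader: keep only vertex position lines ("v x y z").
    with open(path) as f:
        return np.array([[float(c) for c in line.split()[1:4]]
                         for line in f if line.startswith("v ")])

verts = load_obj_vertices("frame.obj")    # world-space vertices (N, 3)
basepose = np.loadtxt("transform.txt")    # assumed 4x4 model-to-world

R, t = basepose[:3, :3], basepose[:3, 3]
verts_model = (verts - t) @ R             # row-vector form of R^T (v - t)
print(verts_model.mean(axis=0))           # expect roughly [0, 0, 0]
```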
stephenlombardi commented 1 year ago

> Perhaps basepose plays no role for the multiface dataset used by MVP? We would only need the transform corresponding to each obj file to map the obj to the center of the model space, and when computing the view position we would only need to transform the camera after applying the extrinsic matrix? For general datasets, we may need to give an average position center as the basepose? Looking forward to your reply~

For the multiface dataset it should work the same way if you are training on one subject: you can simply pick any transform.txt from that subject as the basepose.

If you're training one model on multiple subjects, then you just have to make sure you pick one basepose per subject (you'd need to modify the DataSet class a bit); see the sketch below.
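A hedged sketch of that per-subject change; the class name, path layout, and method below are illustrative, not the repo's actual DataSet class:

```python
import numpy as np

class MultiSubjectBasepose:
    """One fixed basepose per subject, chosen once up front."""

    def __init__(self, subjects):
        # Any single transform.txt from a subject can serve as its basepose;
        # the per-subject path layout here is an assumption.
        self.basepose = {s: np.loadtxt(f"{s}/transform.txt") for s in subjects}

    def world_to_model(self, subject):
        # The inverse basepose carries world-space cameras into that
        # subject's canonical model space.
        return np.linalg.inv(self.basepose[subject])
```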

Luh1124 commented 1 year ago

Thanks, there should be no problem with the camera transform! I will close this issue.