facebookresearch / banmo

BANMo Building Animatable 3D Neural Models from Many Casual Videos

Questions about output results #73

Closed hibt29 closed 5 months ago

hibt29 commented 10 months ago

Hi! Thank you for the great work. I have a question on the contents of the output npy files.

For example, the [DAVIS/Pixels/sequence-name/1_00000/0000.npy] file contains the data shown in Figures 1 and 2. Is it correct to interpret these, in order, as image, mask, visualization to 2D, optical flow, occlusion, depth, and depth feat resampling?

I would like to use the output results and the input image to obtain "the position of the camera, the distance between the object and the camera, and the position of each part obtained from Bone". Is it possible to obtain these using npy files or text files? Thank you.

Figure 1: Contents of DAVIS/Pixels/sequence-name/1_00000/0000.npy

Figure 2: Contents of DAVIS/Pixels/sequence-name/1_00000/0000.npy

gengshan-y commented 10 months ago

Hi, please note that [DAVIS/Pixels/sequence-name/1_00000/0000.npy] etc. are pre-processed input data, not output.

To get cameras, parts, etc, from a checkpoint, you might want to check the "pre-optimized models" in the readme, specifically

bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth "0 5" 64

will extract camera parameters and bone locations and save them in the log directory.

hibt29 commented 10 months ago

Thank you for your quick response! I will try the points you have noted.

In addition, I would like to ask three more questions.

・What do the 4x4 numbers in the file "$seqname-{0}-bne-ctrajs-00000.txt" in logdir/$seqname-e120-b256-ft2 represent? (I think it may be the trajectory of the bones, but I don't know how to read the data.) (See Figure 1)
・What do the 4x4 numbers in the txt file $sequence-00000.txt in init-cam represent? (See Figure 2)
・Is it possible to determine which input video each camera in "mesh_cam-XXX.obj" corresponds to?

Thank you for your continuous support.

Figure 1: Contents of logdir/$seqname-e120-b256-ft2/$seqname-{0}-bne-ctrajs-00000.txt

Figure 2: Contents of logdir/$seqname-e120-b256-ft2/init-cam/$seqname-00002.txt

gengshan-y commented 10 months ago

hibt29 commented 10 months ago

Thank you for your detailed answer. I have a question about the bones. Is it possible to get the coordinates in the input image that correspond to the bone data? (For example, I want to get the image region corresponding to a bone, as shown in Figure 1.) I appreciate you answering so many times.

Figure 1: Ear parts circled in red with rectangles (dog-{2}-bne-mrender02000020)

gengshan-y commented 9 months ago

Could you elaborate on the question? What do you mean by getting an image corresponding to a bone?

hibt29 commented 9 months ago

I'm sorry for the confusion. I have an input image, dog001_00150.jpg, and I want to extract the part of that image showing the dog's right ear, using the coordinates of the bone for the right ear.

For example, is it possible to get the coordinates of each bone (ear coordinates, tail coordinates and etc...) in dog-{0}-bne-mrender000.jpg?

gengshan-y commented 9 months ago

No worries. Unfortunately, there is no way to extract the 2D coordinates of bones out of the box.

There is a hacky way to do it and it requires some knowledge of coordinate transforms.

This gives you the 3D bones in the format of (center, orientation, scale), as a Bx10 matrix. You need to grab the first 3 columns to get the centers of the bones, a Bx3 matrix; let's call it X. Then project X to 2D.

The projection formula is x = K (R X + t), where K, R, t can be obtained from rtk.

rtk has shape Nx4x4, where each 4x4 matrix has the form

[[R_3x3, t_3x1],
 [fx, fy, cx, cy]]
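
The projection described above could be sketched in NumPy roughly as follows. This is a minimal sketch, not code from the BANMo repository: the function names `split_rtk` and `project_bones` are hypothetical, K is assumed to be the standard pinhole intrinsics matrix built from (fx, fy, cx, cy), and the final division by depth is the usual perspective divide, which the formula above leaves implicit. It also assumes the bone centers are already expressed in the coordinate frame that R and t expect for the chosen video frame.

```python
import numpy as np


def split_rtk(rtk):
    """Split one 4x4 rtk matrix, laid out as [[R | t], [fx, fy, cx, cy]],
    into K (3x3), R (3x3) and t (3,)."""
    rtk = np.asarray(rtk, dtype=np.float64)
    R = rtk[:3, :3]          # rotation
    t = rtk[:3, 3]           # translation
    fx, fy, cx, cy = rtk[3]  # intrinsics stored in the last row
    # Assumption: standard pinhole intrinsics matrix.
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return K, R, t


def project_bones(bones, rtk):
    """Project bone centers to pixel coordinates.

    bones: Bx10 array of (center, orientation, scale); only the first
           3 columns (the centers) are used.
    rtk:   one 4x4 matrix for the frame of interest.
    Returns a Bx2 array of (u, v) pixel coordinates.
    """
    bones = np.asarray(bones, dtype=np.float64)
    X = bones[:, :3]             # Bx3 bone centers
    K, R, t = split_rtk(rtk)
    cam = X @ R.T + t            # R X + t for every bone, Bx3
    uvw = cam @ K.T              # K (R X + t), Bx3
    # Perspective divide; assumes every bone lies in front of the camera.
    return uvw[:, :2] / uvw[:, 2:3]
```

With an Nx4x4 rtk array, `project_bones(bones, rtk[i])` would give the 2D bone centers for frame i, and a region of the input image could then be cropped around the bone of interest.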

hibt29 commented 9 months ago

Thanks for your reply!! I will try it. I will let you know if I have any questions!