Questions about output results

hibt29 commented 10 months ago

Hi! Thank you for the great work. I have a question on the contents of the output npy files.

For example, the contents of the [DAVIS/Pixels/sequence-name/1_00000/0000.npy] file contain data named A (see Figure 1&2). Is it correct to interpret this as image, mask, visualization to 2D, Optical Flow, Occlusion, Depth, depth feat resampling respectively?

I would like to use the output results and the input image to obtain "the position of the camera, the distance between the object and the camera, and the position of each part obtained from Bone". Is it possible to obtain these using npy files or text files? Thank you.

Figure 1: Contents of DAVIS/Pixels/sequence-name/1_00000/0000.npy

Figure 2: Contents of DAVIS/Pixels/sequence-name/1_00000/0000.npy

gengshan-y commented 10 months ago

Hi, please note that [DAVIS/Pixels/sequence-name/1_00000/0000.npy] etc. are pre-processed input data, not the output.

To get cameras, parts, etc, from a checkpoint, you might want to check the "pre-optimized models" in the readme, specifically

bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth "0 5" 64

will extract camera parameters and bone locations and save them in the log directory.

hibt29 commented 10 months ago

Thank you for your quick response! I will try the points you have noted.

In addition, I would like to ask three more questions.

・What does the 4x4 block of numbers in the file "$seqname-{0}-bne-ctrajs-00000.txt" in logdir/$seqname-e120-b256-ft2 represent? (I think it may be the trajectory of the bone, but I don't know how to read the data...) (see Figure 1)
・Also, please tell me about the 4x4 block of numbers in the txt file $sequence-00000.txt in init-cam. (see Figure 2)
・Is it possible to determine which camera corresponds to which input video for each camera that appears in "mesh_cam-XXX.obj"?

Thank you for your continuous support.

Figure 1: Contents of logdir/$seqname-e120-b256-ft2/$seqname-{0}-bne-ctrajs-00000.txt

Figure 2: Contents of logdir/$seqname-e120-b256-ft2/init-cam/$seqname-00002.txt

gengshan-y commented 10 months ago

ctrajs-%05d.txt stores 4x4 camera matrices: the first 3 rows are [R|T], which transforms points from object to camera coordinates, and the last row is [fx,fy,cx,cy]. Code is here.
$sequence-00000.txt stores the camera [R|T] etc. in the same format as the previous point. The difference is that it stores the initial camera before optimization.
Determining the video number is non-trivial, although it can be done. mesh_cam-XXX.obj subsamples 9 frames (if I remember correctly) evenly from all cameras.
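
The layout described above could be unpacked roughly as follows. This is only a sketch: it assumes one 4x4 block has already been loaded into a NumPy array (how the blocks are arranged in the text file is not stated in this thread), the helper name split_rtk is made up for illustration, and the numbers are invented rather than taken from a real log.

```python
import numpy as np

def split_rtk(rtk):
    """Split one 4x4 camera block into R, t and K.

    Assumed layout (from the description in this thread):
    rows 0-2 hold [R|T], row 3 holds [fx, fy, cx, cy].
    """
    rtk = np.asarray(rtk, dtype=float).reshape(4, 4)
    R = rtk[:3, :3]            # 3x3 rotation, object -> camera
    t = rtk[:3, 3]             # translation, object -> camera
    fx, fy, cx, cy = rtk[3]    # focal lengths and principal point
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return R, t, K

# Invented values, for illustration only.
example = np.array([
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, -0.2],
    [0.0, 0.0, 1.0, 3.0],
    [500.0, 500.0, 320.0, 240.0],
])
R, t, K = split_rtk(example)
```

Since [R|T] maps object points into the camera frame, the norm of t is the distance from the camera to the object origin under this convention.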

hibt29 commented 10 months ago

Thank you for your detailed answer. I have a question about bones. Is it possible to get the coordinates corresponding to the input image from the bone data? (For example, I want to get the image region corresponding to a bone, as shown in Figure 1.) I appreciate you answering so many times.

Figure 1: Ear parts circled in red with rectangles (dog-{2}-bne-mrender020 00020)

gengshan-y commented 9 months ago

Could you elaborate on the question? What do you mean by getting an image corresponding to a bone?

hibt29 commented 9 months ago

I'm so sorry... I have an input image dog001_00150.jpg and I want to extract a picture of the right ear of a dog in the dog001_00150.jpg image. I would like to extract the picture of the right ear in dog001_00150.jpg using the coordinates of the bone's right ear.

For example, is it possible to get the coordinates of each bone (ear coordinates, tail coordinates, etc.) in dog-{0}-bne-mrender000.jpg?

gengshan-y commented 9 months ago

No worries. Unfortunately, there is no way to extract the 2D coordinates of bones out of the box.

There is a hacky way to do it and it requires some knowledge of coordinate transforms.

This gives you the 3D bones in the format of (center, orientation, scale), as a Bx10 matrix. You need to grab the first 3 columns to get the centers of the bones, a Bx3 matrix; let's call it X. Then project X to 2D.

The projection formula is x = K (R X + t), where K, R, t can be obtained from rtk.

rtk has shape Nx4x4, where each 4x4 block is in the form of

[[R_3x3, t_3x1]
[fx, fy, cx, cy]]
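
The projection above could be sketched as follows. The function name project_bones and all numeric values are hypothetical, and the final division by depth is the usual pinhole-camera step rather than something spelled out in this thread. The sketch also assumes the intrinsics are expressed in the pixel coordinates of the image you want to index; if the frames were resized or cropped during pre-processing, the result would need to be mapped back.

```python
import numpy as np

def project_bones(bones, rtk_frame):
    """Project bone centers to 2D with x = K (R X + t).

    bones:     Bx10 array, first 3 columns assumed to be the centers.
    rtk_frame: one 4x4 block, rows 0-2 = [R|t], row 3 = [fx, fy, cx, cy].
    Returns (uv, depth): Bx2 pixel coordinates and B camera-frame depths.
    """
    bones = np.asarray(bones, dtype=float)
    rtk_frame = np.asarray(rtk_frame, dtype=float)

    X = bones[:, :3]                     # Bx3 bone centers (object frame)
    R = rtk_frame[:3, :3]
    t = rtk_frame[:3, 3]
    fx, fy, cx, cy = rtk_frame[3]
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])

    Xc = X @ R.T + t                     # R X + t for every bone
    x = Xc @ K.T                         # homogeneous image coordinates
    uv = x[:, :2] / x[:, 2:3]            # perspective division by depth
    return uv, Xc[:, 2]

# Invented inputs: two bones, identity rotation, camera 3 units away.
bones = np.zeros((2, 10))
bones[1, :3] = [0.3, 0.0, 0.0]
rtk_frame = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [500.0, 500.0, 320.0, 240.0],
])
uv, depth = project_bones(bones, rtk_frame)
```

With these invented inputs the first bone lands on the principal point (320, 240) and the second at (370, 240). The returned depth is the z coordinate in the camera frame; bones with non-positive depth lie behind the camera and should be discarded before cropping a patch around uv.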

hibt29 commented 9 months ago

Thanks for your reply!! I will try it. I will let you know if I have any questions!

facebookresearch / banmo

Questions about output results #73