CSE embedding for other object categories

facebookresearch / banmo

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

CSE embedding for other object categories #26

Closed zc-alexfan closed 2 years ago

zc-alexfan commented 2 years ago

Hi, thanks for sharing the code. I really like your work. I want to try BANMo on objects that are neither humans nor four-legged animals, but BANMo seems to assume one of those two categories. Would the results degrade if the CSE embeddings are not pre-trained? I noticed that Tab. 5 mentions "pre-trained embedding" and says that pre-training is not too important as long as the initial pose is good. Is that correct?

gengshan-y commented 2 years ago

Hi, based on the results for the few categories we tried (e.g., penguin, Laikago robot, eagle, hands; see the results on this page), DensePose is not that important given a good enough initial root pose.

Also note that those cases do not contain challenging self-occlusions caused by articulation. I think some form of correspondence, such as DensePose-CSE, is still needed for more challenging cases.

zc-alexfan commented 2 years ago

Thanks for the quick response!

One last question: how well does the method perform when part of the video contains heavy occlusion? For example, suppose the object is only slightly occluded in most frames but heavily occluded in a few. I am curious how the method handles those heavily occluded frames.

gengshan-y commented 2 years ago

There are two cases I can think of. In the first, if the target is occluded by other objects or the background, the reconstructed shape will be (mistakenly) penalized by the silhouette reconstruction loss, because the off-the-shelf segmentation is wrong. In this case, banmo would likely fail.
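To make the first case concrete, here is a toy sketch (not banmo's actual loss code) of how an occluded segmentation mask penalizes a correct shape. The function `silhouette_loss` and the 4x4 masks are made up for illustration, and the loss is assumed to be a plain mean squared difference between masks.

```python
def silhouette_loss(rendered, observed):
    """Mean squared difference between two same-sized 2D masks.

    Hypothetical stand-in for a silhouette reconstruction loss;
    masks are lists of rows holding 0/1 values.
    """
    total = 0.0
    count = 0
    for rendered_row, observed_row in zip(rendered, observed):
        for r, o in zip(rendered_row, observed_row):
            total += (r - o) ** 2
            count += 1
    return total / count


# Silhouette rendered from a (hypothetically) correct shape:
# the object covers the middle two columns of a 4x4 image.
rendered_mask = [[0, 1, 1, 0] for _ in range(4)]

# Segmentation when the object is fully visible.
clean_mask = [[0, 1, 1, 0] for _ in range(4)]

# Segmentation when an occluder hides the bottom two rows:
# the segmenter labels the hidden object pixels as background.
occluded_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

loss_clean = silhouette_loss(rendered_mask, clean_mask)
loss_occluded = silhouette_loss(rendered_mask, occluded_mask)
print(loss_clean, loss_occluded)
```

Even though the rendered shape is right, the loss against the occluded mask is non-zero, so the optimizer is pushed to shrink the shape to match the wrong segmentation.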

In the second case, if the object extends beyond the image boundary, banmo should work, since out-of-frame pixels are not sampled to compute losses. In practice, however, this may still produce worse results, since off-the-shelf segmentation methods are usually not robust to partially out-of-frame objects.
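As a rough illustration of "out-of-frame pixels are not sampled", the sketch below clips an object bounding box to the image before drawing pixel samples. This is an assumption about one way to do it, not banmo's actual sampling code; `sample_in_frame_pixels` and its parameters are hypothetical.

```python
import random


def sample_in_frame_pixels(bbox, width, height, n, rng):
    """Sample n integer pixel coordinates from bbox, restricted to the image.

    bbox is (x0, y0, x1, y1) with exclusive upper bounds and may extend
    outside the image. Returns an empty list if the box lies entirely
    outside the frame.
    """
    x0, y0, x1, y1 = bbox
    # Clip the box to the image so no out-of-frame pixel can be drawn.
    cx0, cy0 = max(x0, 0), max(y0, 0)
    cx1, cy1 = min(x1, width), min(y1, height)
    if cx0 >= cx1 or cy0 >= cy1:
        return []
    return [
        (rng.randrange(cx0, cx1), rng.randrange(cy0, cy1))
        for _ in range(n)
    ]


rng = random.Random(0)
# Object box sticks out past the left and bottom edges of a 32x32 image.
samples = sample_in_frame_pixels((-10, 5, 20, 40), 32, 32, 100, rng)
# Box entirely outside the image: nothing to sample.
no_samples = sample_in_frame_pixels((40, 40, 50, 50), 32, 32, 100, rng)
```

Losses computed only on `samples` never touch pixels outside the frame, so the unseen part of the object is neither rewarded nor penalized. The segmentation mask inside the frame can still be wrong, which is the practical caveat above.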