DLR-RM / AugmentedAutoencoder

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
MIT License

Number of sampled views for T-LESS for AAE and MP-Encoder #84

Closed · fylwen closed this 4 years ago

fylwen commented 4 years ago

Hi!

In the earlier Implicit 3D Orientation Learning for 6D Object Detection from RGB Images, 20,000 views are sampled on SO(3) to train the AugmentedAutoencoder for a T-LESS model, whereas in Multi-path Learning for Object Pose Estimation Across Domains the number of sampled views per T-LESS mesh appears to be reduced to 8,000.
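
(For context: uniform view sampling on SO(3) is commonly implemented by drawing uniformly random unit quaternions. Below is a minimal numpy sketch of such a sampler, not the repository's actual code; `sample_views_so3` is a hypothetical name.)

```python
import numpy as np

def sample_views_so3(num_views, seed=0):
    """Draw uniformly distributed rotations on SO(3) as unit quaternions
    (x, y, z, w), via Shoemake's subgroup algorithm."""
    rng = np.random.default_rng(seed)
    u1, u2, u3 = rng.random((3, num_views))
    return np.stack([
        np.sqrt(1.0 - u1) * np.sin(2.0 * np.pi * u2),
        np.sqrt(1.0 - u1) * np.cos(2.0 * np.pi * u2),
        np.sqrt(u1) * np.sin(2.0 * np.pi * u3),
        np.sqrt(u1) * np.cos(2.0 * np.pi * u3),
    ], axis=1)  # shape (num_views, 4)

quats = sample_views_so3(20000)  # one rotation per rendered training view
```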

Could you help explain the motivation behind this difference? Why not train the multi-path Encoder with 20,000 sampled views per mesh?

Thanks!

MartinSmeyer commented 4 years ago

Hi @fylwen,

good question! Since we generate and cache the sampled views before training, the motivation was purely memory constraints. For the MP-Encoder you need to generate the views for every object, while for the AAE you only need to create them for one object. In my experiments, I could reduce the number of views per object without significant performance loss, and the embedding visualizations still showed clear viewpoint sensitivity. Intuitively, I think the larger number of objects also has a regularizing effect.
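
For intuition, a rough back-of-envelope estimate of the view cache size, assuming 128×128 RGB uint8 crops and the 30 T-LESS objects (the actual cache format in this repo may differ):

```python
# Illustrative cache-size estimate; the crop size and storage layout
# are assumptions, not measured from the repository's cache files.
bytes_per_view = 128 * 128 * 3  # 128x128 RGB, uint8
n_objects = 30                  # T-LESS object count

for views_per_object in (20000, 8000):
    gb = views_per_object * n_objects * bytes_per_view / 1e9
    print(f"{views_per_object} views/object -> ~{gb:.1f} GB")
# 20000 views/object -> ~29.5 GB
# 8000 views/object -> ~11.8 GB
```

So once views must be cached for all 30 objects instead of one, reducing the per-object count from 20,000 to 8,000 shrinks the cache by the same 2.5x factor.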