Closed fylwen closed 4 years ago
Hi @fylwen,
good question! Since we generate and cache the sampled views before training, the motivation was purely memory constraints. In the MP Encoder you need to generate views for every object, while in the AAE you only need to create them for one object. In my experiments, I could reduce the number of views per object without significant performance loss, and the embedding visualizations still showed clear viewpoint sensitivity. Intuitively, I think the larger number of objects also has a regularizing effect.
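To make the memory argument concrete, here is a rough back-of-the-envelope sketch. The crop resolution and dtype are assumptions for illustration (128x128 RGB uint8 crops are not stated in this thread); the object count 30 is the number of T-LESS objects:

```python
# Rough cache-size comparison for precomputed training views.
# Assumption (not from the thread): views are cached as 128x128 RGB uint8 crops.
def cache_size_gib(num_objects, views_per_object, h=128, w=128, channels=3):
    """Bytes needed to cache uint8 views for all objects, in GiB."""
    return num_objects * views_per_object * h * w * channels / 2**30

# AAE: one encoder per object -> views for a single object at a time.
aae = cache_size_gib(num_objects=1, views_per_object=20000)
# MP Encoder: one shared encoder -> views for all 30 T-LESS objects at once.
mp = cache_size_gib(num_objects=30, views_per_object=8000)

print(f"AAE (1 object x 20k views): {aae:.2f} GiB")   # ~0.92 GiB
print(f"MP (30 objects x 8k views): {mp:.2f} GiB")    # ~10.99 GiB
```

Even with the reduced 8,000 views per mesh, the multi-path cache is an order of magnitude larger than a single-object AAE cache, which is why the per-object view count was cut rather than kept at 20,000.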
Hi!
The earlier paper, Implicit 3D Orientation Learning for 6D Object Detection from RGB Images, samples 20,000 views on SO(3) to train the AugmentedAutoencoder for a T-LESS model. In Multi-path Learning for Object Pose Estimation Across Domains, however, the number of sampled views per T-LESS mesh appears to be only 8,000.
Could you explain the motivation behind this difference? Why not train the multi-path Encoder with 20,000 sampled views per mesh?
Thanks!