DLR-RM / AugmentedAutoencoder

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
MIT License

Training parameters and other questions #92

aaronsng closed this issue 3 years ago

aaronsng commented 3 years ago

Hi, I've been using the AAE and I must say it is a pretty remarkable algorithm you have developed here. I've recently attempted to deploy it in a real-world situation, to identify an object from a distance in an uncontrolled environment. Due to an NDA I'm not allowed to disclose the CAD file or what the object specifically is, but I can say that it is something like the photo that follows.

[image: photo of a similar object]

The object is placed within 10 metres of the camera. After deploying and testing the AAE on a live video feed, it is consistently unable to provide a steady orientation. I'm not sure whether this is due to the AAE's inability to operate in an uncontrolled environment or due to my misunderstanding of the training parameters. Hence, I would like to clarify the following (apologies, I don't have much experience with OpenGL or CAD tools in general):

  1. Vertex scale -> I understand that this is the scale applied to the vertices in the CAD file, but how will it affect inference? Specifically, in Blender I changed the scale of the vertices so that the ruler tool reflects the actual size of the object. Will this have any effect on the training parameters?
  2. Object origin in the CAD file -> What is the difference between the global and local origin, and how should I set the origin of the object?
  3. What coordinate system is the rotation matrix based on: is it with respect to the camera or to the object?

Here are the notable changes I have made to the training parameters:

  - Bootstrap Ratio: 8
  - Radius: 1000
  - Iterations: 50,000
  - Vertex Scale: 10.8 (as described above)
  - Batch Size: 64

After performing inference, I take the inferred rotation matrix and convert it to its equivalent quaternion representation, but it has occurred to me that this might lose information such as the order of multiplication. I used the pysixd library that you included in the repo for the conversion.
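For reference, the conversion step looks roughly like this (a minimal sketch using SciPy instead of pysixd, just to illustrate the convention question; `R_est` is a placeholder for the matrix returned by the codebook lookup):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# R_est: 3x3 rotation matrix from the AAE (placeholder value here)
R_est = np.eye(3)

# SciPy returns quaternions in (x, y, z, w) order, scalar last.
q_xyzw = Rotation.from_matrix(R_est).as_quat()

# Many other libraries (including transformations.py-style APIs, on
# which pysixd's transform module appears to be based) use
# (w, x, y, z), scalar first.
q_wxyz = np.roll(q_xyzw, 1)

# The conversion is lossless up to sign: q and -q encode the same
# rotation, so no multiplication-order information is dropped.
assert np.allclose(Rotation.from_quat(q_xyzw).as_matrix(), R_est)
```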

MartinSmeyer commented 3 years ago

Thank you!

You can check the rendered training images with:

ae_train your_group/your_ae -d

The object should be clear and fully visible in them, otherwise you will get garbage in, garbage out.
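For the parameters you listed, this is roughly where they live in the train config (a sketch from memory of the sample cfg; double-check the section and key names against the config shipped with the repo):

```
[Dataset]
# camera distance in mm
RADIUS: 1000
# factor that converts the CAD file's vertices into mm
VERTEX_SCALE: 10.8

[Network]
BATCH_SIZE: 64
BOOTSTRAP_RATIO: 8

[Training]
NUM_ITER: 50000
```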

Otherwise, your parameters look alright, and if the model has texture it should work. At a distance of 10 m an RGB image might not be sufficient to estimate the distance accurately, so you might look into pose refinement methods that use depth data, such as ICP. But if it is for navigation, it should be accurate enough.
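If a depth sensor is available, the refinement could look something like this (a minimal sketch with Open3D, which is not part of this repo; the file paths, the 20 mm correspondence threshold, and the variable names are placeholders):

```python
import numpy as np
import open3d as o3d

# Point clouds in mm: one sampled from the CAD model, one
# back-projected from the depth image (hypothetical files).
model_pcd = o3d.io.read_point_cloud("model.ply")
scene_pcd = o3d.io.read_point_cloud("scene.ply")

# Initial 4x4 pose: rotation from the AAE codebook, translation
# from the estimated depth (identity here as a placeholder).
T_init = np.eye(4)

result = o3d.pipelines.registration.registration_icp(
    model_pcd,                # source
    scene_pcd,                # target
    20.0,                     # max correspondence distance in mm
    T_init,                   # initial guess from the AAE
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
T_refined = result.transformation  # refined 6D pose
```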

aaronsng commented 3 years ago

Thanks MartinSmeyer!

Just one last clarification: if the vertex scale is 1000, will the radius be in metres? And likewise, with a value of 1, will the radius be in millimetres?

MartinSmeyer commented 3 years ago

The radius will always be in mm!
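To spell out the unit bookkeeping (my own illustration, not code from the repo): the vertex scale only brings the CAD model into millimetres; after that, every distance, including the radius, is interpreted in mm.

```python
# Unit bookkeeping sketch (illustrative values, not repo code).
# VERTEX_SCALE converts the CAD file's native unit into millimetres;
# RADIUS is then always interpreted in mm, independent of that scale.
vertex_in_cad_units = 0.25                       # e.g. a coordinate stored in metres
VERTEX_SCALE = 1000                              # metres -> millimetres
vertex_mm = vertex_in_cad_units * VERTEX_SCALE   # 250 mm

RADIUS = 1000                                    # camera orbit distance: 1000 mm = 1 m
```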