DLR-RM / AugmentedAutoencoder

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
MIT License

Weird results on these objects #87

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Hello, I read in your paper that the AAE can fail for long and thin objects, but I am also getting strange results on these kinds of objects.

[images: frame0_poseAAE]

Is it possible to use the AAE on these kinds of objects at all? Thanks in advance.

MartinSmeyer commented 3 years ago

Hi @Zaxorn, even though it can be hard, I have had better results on thin objects than this. Can you post the training_images (in a folder next to your checkpoint)? It can happen that thin objects are not reconstructed well; in that case you can increase the BOOTSTRAP_RATIO parameter in the config to something like 16 and retrain.

ghost commented 3 years ago

[image: training_images_39999]

Here are the training images after setting BOOTSTRAP_RATIO to 16 (I don't know why the embedding is so wrong).

This is the model.ply I am using for training; it looks fine, like the others I have already successfully used: cifarelli_1.zip

MartinSmeyer commented 3 years ago

After how many iterations? I suppose this is from the very beginning of training.

MartinSmeyer commented 3 years ago

You can also remove the occlusion augmentation to facilitate convergence, but it should work anyway. What batch size did you use?

ghost commented 3 years ago

50000 iterations with a batch size of 8.

Here is the cfg file used for training. chiave_candela.zip

ghost commented 3 years ago

The picture is of the last iteration.

MartinSmeyer commented 3 years ago

You need to set the BOOTSTRAP_RATIO to 16 and retrain the networks from scratch. If you only use a batch size of 8, you might need to decrease the learning rate as well.
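As a rough rule of thumb (this is just the common linear-scaling heuristic, not something the training code enforces), you can scale the learning rate with the batch size. A minimal sketch, assuming the template defaults of `BATCH_SIZE: 64` and `LEARNING_RATE: 2e-4` (check the values in your own cfg):

```python
# Hypothetical helper, not part of the repository: linear learning-rate
# scaling with batch size. The defaults below are assumptions taken from
# a typical train template (BATCH_SIZE 64, LEARNING_RATE 2e-4).
def scaled_learning_rate(batch_size, base_batch_size=64, base_lr=2e-4):
    return base_lr * batch_size / base_batch_size

print(scaled_learning_rate(8))  # 2.5e-05, i.e. roughly an 8x smaller learning rate
```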

ghost commented 3 years ago

Results look better on this object: [image: frame0_poseAAE]

but still wrong on the other one, even though the training images look good: [image: training_images_49999]

Do I need to increase the BOOTSTRAP_RATIO further? And what does this parameter actually do?

MartinSmeyer commented 3 years ago

The bootstrap ratio means that the reconstruction loss is only computed on the 1/BOOTSTRAP_RATIO fraction of pixels with the highest error. This forces the network not to simply learn a constant background that is correct in most places.
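A minimal NumPy sketch of that idea (illustrative only, not the repository's actual TensorFlow implementation; the function name and array shapes are made up):

```python
import numpy as np

def bootstrapped_l2_loss(reconstruction, target, bootstrap_ratio=4):
    """Mean squared error over only the 1/bootstrap_ratio fraction of
    pixels with the highest error, computed per sample."""
    # per-pixel squared error, flattened to shape (batch, num_pixels)
    sq_err = ((reconstruction - target) ** 2).reshape(len(reconstruction), -1)
    k = sq_err.shape[1] // bootstrap_ratio
    # keep only the k largest errors per sample (the hardest pixels)
    hardest = np.sort(sq_err, axis=1)[:, -k:]
    return hardest.mean()

# bootstrap_ratio=1 reduces to the plain L2 loss; a larger ratio (e.g. 16)
# focuses the loss on the object pixels instead of the dominant background.
```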

Yes, you can increase this ratio further, but not by too much. As I said, the occlusion augmentation can hide important features in the input here, so commenting out this line in the config should help: `CoarseDropout( p=0.2, size_percent=0.05) ),`
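For reference, a hedged sketch of how that change could look in the imgaug augmentation pipeline that the train cfg uses; the surrounding augmenters here are only an example and may not match your config exactly, and the `Sometimes(0.5, ...)` wrapper is an assumption. The point is just that the CoarseDropout entry gets commented out:

```python
import numpy as np
from imgaug.augmenters import (Sequential, Sometimes, Affine,
                               CoarseDropout, GaussianBlur, Add, Multiply)

# Illustrative imgaug pipeline; in the .cfg file itself only the
# Sequential(...) expression appears, the imports just make this
# snippet self-contained.
aug = Sequential([
    Sometimes(0.5, Affine(scale=(1.0, 1.2))),
    # Sometimes(0.5, CoarseDropout(p=0.2, size_percent=0.05)),  # occlusion augmentation disabled
    Sometimes(0.5, GaussianBlur(1.2 * np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
], random_order=False)
```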