DLR-RM / AugmentedAutoencoder

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
MIT License

[question] Report of evaluating AAE with ground truth bounding boxes is weird #61

Closed: DateBro closed this issue 4 years ago

DateBro commented 4 years ago

Describe the bug
I installed some LaTeX-related libraries and tried to generate the report.pdf for evaluating the AAE with ground truth bounding boxes, but the result is weird. I retrained an Augmented Autoencoder; some of its results are similar to yours, while others are a little different. The translation errors are far larger than in your report in the docs directory. @MartinSmeyer, do you know where the problem is and what I should do?

System Info
Describe the characteristics of your environment:

Both reports were generated with the same configuration. Train config:

[Paths]
MODEL_PATH: /home/zhiyong/server/data/t-less/t-less_v2/models/models_reconst/obj_05.ply
BACKGROUND_IMAGES_GLOB: /home/zhiyong/server/data/VOCdevkit/VOC2012/JPEGImages/*.jpg

[Dataset]
MODEL: reconst
H: 128
W: 128
C: 3
RADIUS: 700
RENDER_DIMS: (720, 540)
K: [1075.65, 0, 720/2, 0, 1073.90, 540/2, 0, 0, 1]
# Scale vertices to mm
VERTEX_SCALE: 1
ANTIALIASING: 1
PAD_FACTOR: 1.2
CLIP_NEAR: 10
CLIP_FAR: 10000
NOOF_TRAINING_IMGS: 20000
NOOF_BG_IMGS: 15000

[Augmentation]
REALISTIC_OCCLUSION: False
SQUARE_OCCLUSION: False
MAX_REL_OFFSET: 0.20
CODE: Sequential([
    #Sometimes(0.5, PerspectiveTransform(0.05)),
    #Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
    Sometimes(0.5, Affine(scale=(1.0, 1.2))),
    Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),
    Sometimes(0.5, GaussianBlur(1.2*np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.3, Invert(0.2, per_channel=True)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
    Sometimes(0.5, Multiply((0.6, 1.4))),
    Sometimes(0.5, ContrastNormalization((0.5, 2.2), per_channel=0.3))
    ], random_order=False)

[Embedding]
EMBED_BB: True
MIN_N_VIEWS: 2562
NUM_CYCLO: 36

[Network]
BATCH_NORMALIZATION: False
AUXILIARY_MASK: False
VARIATIONAL: 0
LOSS: L2
BOOTSTRAP_RATIO: 4
NORM_REGULARIZE: 0
LATENT_SPACE_SIZE: 128
NUM_FILTER: [128, 256, 512, 512]
STRIDES: [2, 2, 2, 2]
KERNEL_SIZE_ENCODER: 5
KERNEL_SIZE_DECODER: 5

[Training]
OPTIMIZER: Adam
NUM_ITER: 30000
BATCH_SIZE: 64
LEARNING_RATE: 2e-4
SAVE_INTERVAL: 10000

[Queue]
# OPENGL_RENDER_QUEUE_SIZE: 500
NUM_THREADS: 10
QUEUE_SIZE: 50
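For reference, the CODE entry above is plain imgaug code that the training pipeline evaluates. Below is a minimal, self-contained sketch (my own, not code from the repository) of how the same augmentation chain can be built and visually sanity-checked on a crop; sample_crop.png stands for a hypothetical 128x128 training image.

import cv2
import numpy as np
# The augmenter names mirror the CODE entry above; they come from imgaug.
from imgaug.augmenters import (Sequential, Sometimes, Affine, CoarseDropout,
                               GaussianBlur, Add, Invert, Multiply,
                               ContrastNormalization)

seq = Sequential([
    Sometimes(0.5, Affine(scale=(1.0, 1.2))),
    Sometimes(0.5, CoarseDropout(p=0.2, size_percent=0.05)),
    Sometimes(0.5, GaussianBlur(1.2 * np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.3, Invert(0.2, per_channel=True)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
    Sometimes(0.5, Multiply((0.6, 1.4))),
    Sometimes(0.5, ContrastNormalization((0.5, 2.2), per_channel=0.3)),
], random_order=False)

# Apply the chain a few times to one crop and write the results out,
# so the augmentation strength can be inspected by eye.
img = cv2.imread("sample_crop.png")            # hypothetical 128x128x3 uint8 crop
augmented = seq.augment_images([img] * 8)
for i, aug in enumerate(augmented):
    cv2.imwrite("aug_%02d.png" % i, aug)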

Evaluate config:

[METHOD]
method = ae

[DATA]
dataset = tless
cam_type = primesense
scenes = [] #empty means all scenes
obj_id = 5

[BBOXES]
estimate_bbs = False
ckpt = /path/to/detector/checkpoints/freezed_graph.pb
external = /home/zhiyong/server/data/t-less/t-less_v2/test_primesense/test_predicts
pad_factor = 1.2
single_instance = True

[EVALUATION]
icp = False
compute_errors = False
evaluate_errors = False
top_n_eval = 0

[METRIC]
error_thres = {'vsd':0.3,'cou':0.5,'te':5.0,'re':5.0}
error_thresh_fact = {'add':0.1,'adi':0.1}
error_type = ['vsd','re','te']
top_n = 1
vsd_delta = 15
vsd_tau = 20
vsd_cost = step

[PLOT]
nearest_neighbors = False
scene_with_estimate = False
reconstruction = False
cum_t_error_hist = True
cum_r_error_hist = True
cum_vsd_error_hist = True
vsd_occlusion = True
r_error_occlusion = True
embedding_pca = True
animate_embedding_pca = False
viewsphere = True
reconstruction_test_batch = True
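For context on the [METRIC] section: as far as I understand, te is the Euclidean distance between the estimated and ground-truth translations, re is the angle of the relative rotation, and an estimate counts as correct when its error falls below the corresponding error_thres entry. A minimal sketch of these two measures, following the usual SIXD/BOP definitions (my own illustration, not code from this repository):

import numpy as np

def te(t_est, t_gt):
    # Translation error: Euclidean distance between the two translation
    # vectors (in the model units, i.e. mm for T-LESS).
    return np.linalg.norm(np.asarray(t_gt, float) - np.asarray(t_est, float))

def re(R_est, R_gt):
    # Rotation error: angle (in degrees) of the relative rotation R_est * R_gt^T.
    cos_angle = (np.trace(np.dot(R_est, R_gt.T)) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)   # guard against numerical drift
    return np.degrees(np.arccos(cos_angle))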

Additional context

Translation error of the first report: [image]

Rotation error of the first report: [image]

PCA embedding view sphere of the first report: [image]

Recall vs. translation error is similar for both reports: [image]

Translation error of the second report: [image]

Rotation error of the second report: [image]

PCA embedding view sphere of the second report: [image]

DateBro commented 4 years ago

The report generated using RetinaNet is also weird. Its config is almost the same as the one above, except that estimate_bbs = True. While evaluating the AAE with RetinaNet, it printed a lot of 'no detection' messages. At first, I thought this was because some images don't contain obj_05. However, I found that the mAP of RetinaNet is lower than the one in your paper, which is also mentioned in #60. I'd appreciate it if you could give some tips about the evaluation. 🙏

Translation error histogram: [image]

Rotation error histogram: [image]

Embedding PCA view sphere: [image]

DateBro commented 4 years ago

I found that I had missed a configuration entry in my train config. After I added it and retrained the AAE, the result is almost the same as before.

Missing configuration:

ANTIALIASING: 8
CROP_OFFSET_SIGMA: 20

The translation and rotation errors are similar to the former run. The PCA embedding view sphere is similar to the one in your report in the docs directory: [image]

I contacted @qyz55 from issue #51. He told me he just followed the README and got a result similar to your paper. I re-downloaded the repository and retrained the AAE but still didn't get a satisfactory result. Could you share your training and evaluation configuration for obj_05, or tell me about anything that may not be explicitly listed in the README? 🙏

qyz55 commented 4 years ago

I trained on obj_05 twice, with no changes to train_template.cfg except the model file, and got 60/66 VSD recall with ground-truth bounding boxes. Maybe you need to delete all the files in $AE_WORKSPACE_PATH/tmp_datasets to generate new images for training; otherwise, repeated training runs won't show any difference.
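A minimal sketch of that cleanup, assuming the usual $AE_WORKSPACE_PATH layout (illustrative only):

import glob
import os

# Remove the cached, pre-generated training data so the next training run
# regenerates the images instead of silently reusing the old ones.
workspace = os.environ["AE_WORKSPACE_PATH"]
for path in glob.glob(os.path.join(workspace, "tmp_datasets", "*")):
    print("removing", path)
    os.remove(path)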

MartinSmeyer commented 4 years ago

I just glanced over it, but I think the configs seem ok and the translation errors without the bounding box detector also seem similar to the paper?

DateBro commented 4 years ago

Sorry, I had not read the arXiv version of the paper, which adds results such as translation and rotation error. I also forgot that the report in the docs directory shows results with ICP. The improvement in translation error from ICP is impressive. Thanks for your help!