DLR-RM / AugmentedAutoencoder

Official Code: Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
MIT License

How to customize a template config file for real-time 6d object detection? #113

Closed AminSeffo closed 2 years ago

AminSeffo commented 2 years ago

Hey everyone,

First of all, I would like to thank you again for this great repo and especially @MartinSmeyer for his contribution. I am using the AAE pipeline together with Mask R-CNN (from Detectron2) as the 2D object detection step.

In order to run a real-time 6D object detection system, I took a close look at test_m3.py, but I don't know how to configure my trained model from exp_group/my_autoencoder in m3_template.cfg.

  1. How can I include the config file of the model trained on my .ply in test_m3.py or m3_template.cfg, and which parameters should be adapted?

  2. How can I make use of the predicted mask in the AAE pipeline? It can easily be obtained from Detectron2 as a bitmask (RLE or polygon, COCO format).

Kind regards

MartinSmeyer commented 2 years ago

Hi @AminSeffo,

Sounds good! :)

  1. The training config is retrieved automatically via the exp_group/my_autoencoder you define in m3_template.cfg, in particular here, where you define the mapping from detection class to the experiment group and name of your AAE model: https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/auto_pose/cfg_m3vision/m3_template.cfg#L10
    You can most likely keep the default parameters; just make sure your input images are in BGR format. The maskrcnn parameters are irrelevant because you are using your own detector.

  2. You can simply convert them to a binary mask and apply the mask to the image before cropping and prediction, as done here: https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/auto_pose/m3_interface/compute_bop_results_m3.py#L164
    I once wrote a converter from RLE to binary masks: https://github.com/thodan/bop_toolkit/blob/af380d7a028b5c44903913e39d652c83a4bc2bdd/bop_toolkit_lib/pycoco_utils.py#L202 (a rough sketch of the idea is below).
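
A rough sketch of that idea with pycocotools (not the exact code from the files linked above; Detectron2 can also give you boolean masks directly):

import numpy as np
from pycocotools import mask as mask_utils  # assuming pycocotools is available

def apply_rle_mask(img, rle):
    # decode the COCO RLE into an HxW binary mask and zero out the background
    m = np.squeeze(mask_utils.decode(rle)).astype(bool)
    masked = np.zeros_like(img)
    masked[m] = img[m]  # keep only the object pixels before cropping/prediction
    return masked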

AminSeffo commented 2 years ago

Hi @MartinSmeyer,

Thanks again for your reply. I changed the class_2_encoder parameter (see below) and defined class 1 in the Python script, but I still get this error: 1 not contained in config class_names dict_keys([1])

By the way, I trained the model following the AAE pipeline instructions; I only changed the paths of the .ply model and the VOC background images.

Additional information

[methods]
object_detector = mask_rcnn
object_pose_estimator = auto_pose

[auto_pose]
gpu_memory_fraction = 0.5
color_format = bgr
color_data_type = np.float32
depth_data_type = np.float32
class_2_encoder = {1:"exp_group/my_autoencoder"}
camPose = False
upright = False
topk = 1
pose_visualization = False

[mask_rcnn]
path_to_masks =
inference_time = 0.15

# from test_m3.py
# gt boxes and classes (replace with your favorite detector)

classes =  [1]
bboxes = [[860, 511, 929, 667]]
MartinSmeyer commented 2 years ago

@AminSeffo So you did not rename your experiment group / name when training the AAE? Normally you would put some descriptive names there, but it shouldn't matter.

The problem might be that here the class key is transformed into a string: https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/auto_pose/m3_interface/test_m3.py#L39

Can you try to remove the str()?
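
To illustrate the mismatch (a simplified sketch, not the exact code from test_m3.py):

# class_2_encoder as parsed from m3_template.cfg has an int key
class_2_encoder = {1: "exp_group/my_autoencoder"}

# test_m3.py wraps the detected class in str(), so the lookup key becomes '1'
print(str(1) in class_2_encoder)  # False -> "1 not contained in config class_names dict_keys([1])"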

AminSeffo commented 2 years ago

Hello, could you tell me how to train the 2D detector? Thank you so much~

I used Detectron2 for that. Here is a Colab notebook where you can define a dataset and start the training.

AminSeffo commented 2 years ago

Hello, could you tell me how to train the 2D detector? Thank you so much~

I used Detectron2 for that. Here is a Colab notebook where you can define a dataset and start the training.

Hello, I want to know how to train a 2D detector with the T-LESS training dataset? I would appreciate a reply!

Hey, maybe I can help you with that, but can you please open a new issue for it?

AminSeffo commented 2 years ago

Hey @MartinSmeyer, thank you again. I removed str() and it works, but I am not able to visualize the result using the -vis flag because of this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte. I saw this issue https://github.com/DLR-RM/AugmentedAutoencoder/issues/88; however, in my case the training already worked.

AminSeffo commented 2 years ago

@MartinSmeyer

Maybe this helps: here is the detailed traceback when running with -vis:

This message will be only logged once.
INFO - 2022-08-01 16:25:09,531 - acceleratesupport - OpenGL_accelerate module loaded
INFO - 2022-08-01 16:25:09,536 - arraydatatype - Using accelerated ArrayDatatype
using egl
('renderer', 'Model paths: ', ['/home/amin/autoencoder_ws/cad_model/nuss_model.ply'])
[0]
Traceback (most recent call last):
  File "/home/amin/6d_pose_estimation/test_m3.py", line 60, in <module>
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 31, in render_poses
    bgr, depth,_ = self.renderer.render_many(obj_ids = [self.classes.index(pose_est.name) for pose_est in pose_ests],
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/ae/utils.py", line 15, in decorator
    setattr(self, attribute, function(self))
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 25, in renderer
    vertex_scale=float(self.vertex_scale[0])) #1000 for models in meters
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer.py", line 37, in __init__
    vert_norms = gu.geo.load_meshes(models_cad_files, vertex_tmp_store_folder, recalculate_normals=True)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/gl_utils/geometry.py", line 54, in load_meshes
    scene = pyassimp.load(model_path, pyassimp.postprocess.aiProcess_Triangulate)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 315, in load
    scene = _init(model.contents)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 211, in _init
    call_init(obj, target)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 76, in call_init
    _init(obj.contents, obj, caller)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/pyassimp/core.py", line 122, in _init
    target.name = str(obj.data.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
MartinSmeyer commented 2 years ago

There seem to be some special characters in your 3D model file. Please try to debug this yourself.
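
For example, a quick check like this (a rough sketch, assuming the offending bytes sit in the ASCII header or comments of the .ply):

# report .ply header lines that are not valid UTF-8
with open("/home/amin/autoencoder_ws/cad_model/nuss_model.ply", "rb") as f:
    header = f.read(4096)  # the ASCII header ends at 'end_header'
for i, line in enumerate(header.split(b"\n")):
    try:
        line.decode("utf-8")
    except UnicodeDecodeError as err:
        print("header line", i, "contains non-UTF-8 bytes:", line, err)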

AminSeffo commented 2 years ago

Hey @MartinSmeyer, I checked that out. I replaced my 3D model with obj_30.ply from the BOP challenge and I am still getting the same error. I will keep trying to fix it. Thank you again.

MartinSmeyer commented 2 years ago

That's strange, I just tried it and it works for me, including tless object 30: Bildschirmfoto_2022-08-03_15-05-23

Are you using the CAD models or the reconstructed ones? Could you try to change the model type to CAD in your training config before running the visualization?

[Dataset]
MODEL: cad
MartinSmeyer commented 2 years ago

Which pyassimp version are you using?

AminSeffo commented 2 years ago

Which pyassimp version are you using?

pyassimp: 3.3

MartinSmeyer commented 2 years ago

Try to update to

https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/aae_py37_tf26.yml#L132

AminSeffo commented 2 years ago

Try to update to

https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/aae_py37_tf26.yml#L132

I updated, but now I get this error :(


('renderer', 'Model paths: ', ['/home/amin/autoencoder_ws/cad_model/nuss_model.ply'])
[0]
100% |################################|
Traceback (most recent call last):
  File "/home/amin/6d_pose_estimation/test_m3.py", line 64, in <module>
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)
  File "/home/amin/6d_pose_estimation/visualization/render_pose.py", line 38, in render_poses
    far = 10000)
  File "/home/amin/anaconda3/envs/sixd_pose_detection/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer.py", line 141, in render_many
    assert W <= Renderer.MAX_FBO_WIDTH and H <= Renderer.MAX_FBO_HEIGHT
AssertionError
AminSeffo commented 2 years ago

I think I have problems with my image dimensions... I will check it out and let you know.

AminSeffo commented 2 years ago

Hey @MartinSmeyer

I had problems with some dimensions of the bbox from the 2D object detection; now the rendering works. Here is the output:

test_image

I think I am facing a scaling issue. Here is my .ply model, my_model_aae.zip, which I scaled in Blender. I believe I did everything correctly before training the autoencoder (please take a look at the .ply model if you have time).

Here is also a snapshot of my_autoencoder.cfg:

[Paths]
MODEL_PATH: /home/amin/autoencoder_ws/cad_model/my_model_aae.ply
BACKGROUND_IMAGES_GLOB: /home/amin/autoencoder_ws/VOCdevkit/VOC2012/JPEGImages/*.jpg

[Dataset]
MODEL: reconst
H: 128
W: 128
C: 3
RADIUS: 700

RENDER_DIMS: (720, 540)
K: [1075.65, 0, 720/2, 0, 1073.90, 540/2, 0, 0, 1]

#Azure Kinect parameters
#RENDER_DIMS: (720, 1280)
#K: [608.1231079101562, 0, 638.6071166992188, 0, 608.0382690429688, 368.2049560546875, 0, 0, 1]

# Scale vertices to mm
VERTEX_SCALE: 1
ANTIALIASING: 1
PAD_FACTOR: 1.
CLIP_NEAR: 10
CLIP_FAR: 10000
NOOF_TRAINING_IMGS: 20000
NOOF_BG_IMGS: 15000

[Augmentation]
REALISTIC_OCCLUSION: False
SQUARE_OCCLUSION: False
MAX_REL_OFFSET: 0.20
CODE: Sequential([
    #Sometimes(0.5, PerspectiveTransform(0.05)),
    #Sometimes(0.5, CropAndPad(percent=(-0.05, 0.1))),
    Sometimes(0.5, Affine(scale=(1.0, 1.2))),
    Sometimes(0.5, CoarseDropout( p=0.2, size_percent=0.05) ),
    Sometimes(0.5, GaussianBlur(1.2*np.random.rand())),
    Sometimes(0.5, Add((-25, 25), per_channel=0.3)),
    Sometimes(0.3, Invert(0.2, per_channel=True)),
    Sometimes(0.5, Multiply((0.6, 1.4), per_channel=0.5)),
    Sometimes(0.5, Multiply((0.6, 1.4))),
    Sometimes(0.5, ContrastNormalization((0.5, 2.2), per_channel=0.3))
    ], random_order=False)

[Embedding]
EMBED_BB: True
MIN_N_VIEWS: 2562
NUM_CYCLO: 36

[Network]
BATCH_NORMALIZATION: False
AUXILIARY_MASK: False
VARIATIONAL: 0
LOSS: L2
BOOTSTRAP_RATIO: 4
NORM_REGULARIZE: 0
LATENT_SPACE_SIZE: 128
NUM_FILTER: [128, 256, 512, 512]
STRIDES: [2, 2, 2, 2]
KERNEL_SIZE_ENCODER: 5
KERNEL_SIZE_DECODER: 5

[Training]
OPTIMIZER: Adam
NUM_ITER: 30000
BATCH_SIZE: 64
LEARNING_RATE: 2e-4
SAVE_INTERVAL: 10000

[Queue]
# OPENGL_RENDER_QUEUE_SIZE: 500
NUM_THREADS: 10
QUEUE_SIZE: 50

And from test_m3.py:

import cv2
import numpy as np
import os
import argparse
import object_detector
from auto_pose.m3_interface.m3_interfaces import BoundingBox
from auto_pose.m3_interface.ae_pose_estimator import AePoseEstimator
from webcam_video_stream import WebcamVideoStream

dir_name = os.path.dirname(os.path.abspath(__file__))
default_cfg = os.path.join(dir_name, '../cfg_m3vision/m3_template_pose.cfg')

parser = argparse.ArgumentParser()
parser.add_argument("--m3_config_path", type=str, default=default_cfg)
parser.add_argument("-vis", action='store_true', default=False)

args = parser.parse_args()

if os.environ.get('AE_WORKSPACE_PATH') == None:
    print('Please define a workspace path:\n')
    print('export AE_WORKSPACE_PATH=/path/to/workspace\n')
    exit(-1)

img = cv2.imread("/home/amin/6d_pose_estimation/image_test.png")
H,W,_ = img.shape

# Azure Kinect camera parameters
f_x=608.1231079101562
f_y=608.0382690429688
c_x=638.6071166992188
c_y=368.2049560546875
camK = np.array([f_x, 0., c_x, 0., f_y, c_y, 0., 0., 1.]).reshape(3, 3)

# gt boxes and classes (replace with your favorite detector)
classes =  [1]

bboxes=[[723, 366, 89, 80]]
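# bboxes are [x, y, width, height] in pixels; they are normalized to [0, 1] relative coordinates in the loop below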
my_detector=object_detector.Detector()
nuss_detection=my_detector.prediction(img)
bbs = []
h,w = float(H), float(W)
for b,c in zip(bboxes, classes):
    bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w , ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={(c):1.0}))
    # MultiPath Encoder Initialization
    aae_pose_estimator = AePoseEstimator("/home/amin/6d_pose_estimation/cfg_m3vision/m3_template_pose.cfg")
    # Predict 6-DoF poses
    pose_ests = aae_pose_estimator.process(bbs,img,camK)
    print(np.array([{p.name:p.trafo} for p in pose_ests]))
# Visualize
if args.vis:
    from visualization.render_pose import PoseVisualizer
    pose_visualizer = PoseVisualizer(aae_pose_estimator)
    pose_visualizer.render_poses(img, camK, pose_ests, bbs)

And finally the m3_template.cfg used:

[methods]
object_detector = mask_rcnn
object_pose_estimator = auto_pose

[auto_pose]
gpu_memory_fraction = 0.5
color_format = bgr
color_data_type = np.float32
depth_data_type = np.float32
class_2_encoder = {1:"exp_group/my_autoencoder"}
camPose = False
upright = False
topk = 1
pose_visualization = False

[mask_rcnn]
path_to_masks =
inference_time = 0.15
AminSeffo commented 2 years ago

Hey @MartinSmeyer, do you have any suggested solutions?

MartinSmeyer commented 2 years ago

Hey,

#Azure Kinect parameters
#RENDER_DIMS: (720, 1280)
#K: [608.1231079101562, 0, 638.6071166992188, 0, 608.0382690429688, 368.2049560546875, 0, 0, 1]

It's best to use these for training, but RENDER_DIMS is the wrong way around; it should be

RENDER_DIMS: (1280, 720)

The pad factor should stay at the default of 1.2 (your config sets PAD_FACTOR: 1.).

The 3D model geometry seems okay at first glance. Try training again with the above parameters, and use the ae_train ... -d option to visualize the reconstruction targets before training.
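
That is, roughly (with the experiment group/name from your config):

ae_train exp_group/my_autoencoder -d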

AminSeffo commented 2 years ago

Thank you again @MartinSmeyer

I corrected the render dims and the K matrix as you suggested, but I am still getting the same visualization. Moreover, I centered the model using MeshLab; it now looks like this: Screenshot from 2022-08-16 10-36-08

MartinSmeyer commented 2 years ago

Can you please post the image you get with ae_train ... -d?

AminSeffo commented 2 years ago

@MartinSmeyer Of course. Here are the images generated with ae_train ... -d and the centered model in MeshLab:

Screenshot from 2022-08-16 10-52-33 Screenshot from 2022-08-16 10-52-19

MartinSmeyer commented 2 years ago

Okay, although the 3D model is hollow and without texture, the size looks alright.

What is the pose that you print out?

MartinSmeyer commented 2 years ago

And did you retrain with the Azure Kinect camK and recreate the embedding?

MartinSmeyer commented 2 years ago

Shouldn't this classes={(c):1.0}) be this classes={str(c):1.0})?

AminSeffo commented 2 years ago

Sorry, I closed the issue by mistake.

AminSeffo commented 2 years ago

And did you retrain with the Azure Kinect camK and recreate the embedding?

Yes I did

AminSeffo commented 2 years ago

Shouldn't this classes={(c):1.0}) be this classes={str(c):1.0})?

With str() I got some errors; we discussed that before: https://github.com/DLR-RM/AugmentedAutoencoder/issues/113#issuecomment-1200977600

MartinSmeyer commented 2 years ago

With str() I got some errors; we discussed that before: https://github.com/DLR-RM/AugmentedAutoencoder/issues/113#issuecomment-1200977600

Ah yes. I would just also remove the ().

AminSeffo commented 2 years ago

With str() I got some errors; we discussed that before: #113 (comment)

Ah yes. I would just also remove the ().

Oh okay, I removed it: bbs.append(BoundingBox(xmin=b[0]/w, xmax=(b[0]+b[2])/w, ymin=b[1]/h, ymax=(b[1]+b[3])/h, classes={c:1.0}))

AminSeffo commented 2 years ago

Okay, although the 3D model is hollow and without texture, the size looks alright.

What is the pose that you print out?

Screenshot from 2022-08-16 11-20-58

MartinSmeyer commented 2 years ago

Oh, it's in meters, although your 3D model is in mm. Try adding mm=True as an argument here:

https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/auto_pose/m3_interface/test_m3.py#L45

AminSeffo commented 2 years ago

Oh, it's in meters, although your 3D model is in mm. Try adding mm=True as an argument here:

https://github.com/DLR-RM/AugmentedAutoencoder/blob/fec781cf4f7b0cc782da6d66218d09ad72cad6da/auto_pose/m3_interface/test_m3.py#L45

Okay, thanks, it looks better: Screenshot from 2022-08-16 11-35-04

but where is the translation vector?

MartinSmeyer commented 2 years ago

It's a 4x4 homogeneous matrix. ;) t = [149.27, 45.84, 687.40] in mm
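
To read it out (a small sketch using the pose_ests from your script):

import numpy as np

trafo = np.asarray(pose_ests[0].trafo)  # 4x4 homogeneous transformation
R = trafo[:3, :3]                       # 3x3 rotation matrix
t = trafo[:3, 3]                        # translation vector, here in mm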

Albertdalmen commented 1 year ago

Hi @AminSeffo, I'm glad that someone else is interested in using this as a real-time pose estimator. I'm currently trying to implement my own.

How were your results? I'm also curious why you chose Detectron2 instead of, for instance, Keras RetinaNet.

Is there, by any chance, a possibility that you could share your work/pipeline, or some pointers on how you managed to make it work?

Thanks in advance.