Closed jinjuehui closed 5 years ago
Thank you! I should add some documentation to it.
Both scripts should work, "generate_syn_det_train.py" creates synthetic object views from the 3D models and "detection_utils/generate_sixd_train.py" uses a SIXD training set (e.g. from T-LESS) with single object views on black background. As stated in the paper, for T-LESS we use the primesense training set. Just change the paths inside detection_utils/generate_sixd_train.py to point to the T-LESS and VOC datasets and you should be fine.
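To make the "single object views on black background pasted over VOC images" idea concrete, here is a minimal sketch of that compositing step. The function name and the black-pixel masking rule are illustrative assumptions, not the actual implementation in detection_utils/generate_sixd_train.py:

```python
# Sketch (assumption): paste the non-black pixels of a rendered object view
# onto a random VOC background image of the same size.
# Images are nested lists of (R, G, B) tuples for simplicity.

def composite(obj_img, bg_img, threshold=0):
    """Overlay obj_img onto bg_img, treating (near-)black pixels as mask."""
    out = []
    for obj_row, bg_row in zip(obj_img, bg_img):
        row = []
        for obj_px, bg_px in zip(obj_row, bg_row):
            # Pixels darker than the threshold are considered background
            if max(obj_px) <= threshold:
                row.append(bg_px)
            else:
                row.append(obj_px)
        out.append(row)
    return out

# 2x2 toy example: the object occupies only the top-left pixel
obj = [[(200, 10, 10), (0, 0, 0)],
       [(0, 0, 0),     (0, 0, 0)]]
bg  = [[(1, 2, 3), (4, 5, 6)],
       [(7, 8, 9), (10, 11, 12)]]
result = composite(obj, bg)
```

In practice this would be done with NumPy masks on real image arrays; the loop form here is just to show the logic.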
@MartinSmeyer What should I do to train a detector on LineMOD? I tried detection_utils/generate_sixd_train.py using the "train" images with single object views on black background from the SIXD challenge, and then trained a Yolov3 detector; however, it failed to detect objects in real images from the test set.
Do I need to make additional changes to prepare the training images for LINEMOD?
Yes, for LineMOD you additionally have to render views from the provided models using generate_syn_det_train.py with random light sources, reflectance properties (Phong model), and color augmentations. We also froze the first layers of the detector so that it does not overfit to the synthetic data. You can try that with Yolov3 too.
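As a toy illustration of the color augmentations mentioned above, here is a brightness/contrast jitter on RGB pixels. The exact augmentations and parameter ranges used in the repo are assumptions here:

```python
import random

# Sketch (assumption): random brightness/contrast jitter applied to rendered
# views. Ranges (-30..30 brightness, 0.8..1.2 contrast) are invented examples.

def jitter_pixel(px, brightness, contrast):
    """Apply contrast (around gray 128) then brightness, clamped to [0, 255]."""
    return tuple(
        max(0, min(255, int((c - 128) * contrast + 128 + brightness)))
        for c in px
    )

def augment(img, rng=random):
    """Jitter a whole image (nested lists of RGB tuples) with one random draw."""
    brightness = rng.uniform(-30, 30)
    contrast = rng.uniform(0.8, 1.2)
    return [[jitter_pixel(px, brightness, contrast) for px in row] for row in img]
```

Real pipelines would also randomize hue/saturation and do this on NumPy arrays, but the clamped affine transform is the core idea.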
@MartinSmeyer Could you share the script arguments for generating the syn_det data for LineMOD? BTW: how many layers did you freeze when training the detectors?
I assume that it is possible to successfully train a detector without using real data provided by LineMOD, right?
Yes, it is! Of course, the results are weaker than when training with real images. For more insights and a quantitative evaluation of object detection with synthetic data, I recommend e.g. this paper: https://arxiv.org/pdf/1902.03334v1.pdf
I checked, and for RetinaNet we actually froze the whole backbone; there is an option for it in the open-source code. For the dataset generation, to be honest, I don't have all the parameters we used for the final training, since we changed the code quite frequently and it was not the focus of the paper. But please pull the latest version and try the defaults with this command:
python generate_syn_det_train.py \
--output_path=/path/to/det_train_data \
--model=/path/to/LineMOD/models/ \
--num=60000 \
--scale=1 \
--vocpath=/path/to/VOCdevkit/VOC2012/JPEGImages \
--model_type=reconst
Exclude the bowl and the cup from the training models.
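Regarding freezing the whole backbone for RetinaNet: in a real framework you would set `layer.trainable = False` (Keras) or `param.requires_grad = False` (PyTorch) on the backbone layers. A framework-free toy version of that bookkeeping, with made-up layer names, looks like this:

```python
# Toy illustration (assumption): a "model" as a list of named layers with
# trainable flags. Freezing the backbone means marking every backbone layer
# as non-trainable so only the detection heads are updated during training.

layers = [
    {"name": "backbone/conv1", "trainable": True},
    {"name": "backbone/conv2", "trainable": True},
    {"name": "head/classification", "trainable": True},
    {"name": "head/regression", "trainable": True},
]

def freeze_backbone(layers, prefix="backbone/"):
    """Mark all layers whose name starts with prefix as non-trainable."""
    for layer in layers:
        if layer["name"].startswith(prefix):
            layer["trainable"] = False
    return layers

freeze_backbone(layers)
frozen = [l["name"] for l in layers if not l["trainable"]]
```

The point is that the pretrained feature extractor stays fixed on real-image statistics while only the heads adapt to the synthetic data.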
@MartinSmeyer Another question, for training the detector of LineMOD, do you use the data generated by both generate_syn_det_train.py
and generate_sixd_train.py
or only generate_syn_det_train.py
?
Additionally, i.e. the data generated by both scripts combined. Also, for training you might need early stopping to avoid overfitting to the synthetic data.
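The early-stopping criterion mentioned above can be sketched as: stop once the validation loss has not improved for `patience` epochs. The loss values and patience below are invented for illustration:

```python
# Sketch of patience-based early stopping on a validation-loss curve.

def early_stop_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None

# Loss improves for three epochs, then plateaus -> stop 3 epochs after the best
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
stop = early_stop_epoch(losses, patience=3)
```

In Keras this corresponds to the built-in `EarlyStopping` callback with `monitor='val_loss'` and a `patience` argument.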
Thanks a lot for the nice work, but some information is missing on how to generate training data for SSD. I hope the following saves time for others who are also interested in this work:
To generate training data for SSD, use "generate_syn_det_train.py" instead of "detection_utils/generate_sixd_train.py". The command-line configuration looks as follows:
After running it, XML annotations and training images are generated nicely.
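Assuming the generated XML annotations follow the Pascal VOC layout (object name plus bndbox coordinates, which is what SSD training pipelines typically expect), they can be read back with the standard library. The sample XML below is hand-written for illustration, not actual script output:

```python
import xml.etree.ElementTree as ET

# Sketch (assumption): parse one Pascal-VOC-style annotation file into
# (class name, (xmin, ymin, xmax, ymax)) tuples.

sample = """
<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>obj_05</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>120</xmax><ymax>96</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")),
        ))
    return boxes

boxes = parse_voc(sample)
```

This is handy for sanity-checking the generated data (e.g. verifying that every image has at least one box) before starting a long training run.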