chensong1995 / HybridPose

HybridPose: 6D Object Pose Estimation under Hybrid Representation (CVPR 2020)
MIT License

Does HybridPose train on OCCLUSION LINEMOD dataset? #32

Open sh8 opened 4 years ago

sh8 commented 4 years ago

Hi, I have a question about the training strategy of HybridPose.

After reading your code, I found that the network is also trained on the OCCLUSION LineMOD dataset. However, prior methods such as PVNet and PoseCNN use the LineMOD dataset for training and the OCCLUSION LineMOD dataset only for testing.

I believe that training on the OCCLUSION LineMOD dataset leads to an unfair comparison. What do you think about this point?

chensong1995 commented 4 years ago

Hello Shun Iwase,

Thanks for your question!

HybridPose is indeed trained on the Occlusion Linemod dataset, which is inconsistent with the convention in this community. When we wrote the paper, we were unaware of the convention of using Occlusion Linemod for testing only. This makes some of our comparisons controversial.

When we first downloaded the Linemod and Occlusion Linemod datasets, we found that their 3D models are not aligned with each other. The origin of the Linemod object models is at the bottom of the object, whereas the origin of the Occlusion Linemod models is roughly at the object centroid. The orientations of the x, y, and z axes also differ. This means the pose labels of the two datasets are not aligned with each other either. We therefore had the false impression that it was not the norm to test a model on Occlusion Linemod when it is trained on Linemod. When we conducted the ablation study on Occlusion Linemod, we also observed that the pose quality using keypoints alone roughly matches what is reported in the PVNet paper, which further reinforced our impression that these two datasets are not supposed to be mixed.
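
For illustration, if the rigid transform between the two model frames were known (it would have to be recovered by aligning the two meshes), the pose labels could be re-expressed in a common frame. A minimal sketch, where the transform and the function name are hypothetical rather than anything from our code:

```python
def convert_pose_to_linemod_frame(R_occ, t_occ, R_align, t_align):
    # R_occ, t_occ: pose label w.r.t. the Occlusion Linemod model frame
    #               (NumPy arrays of shape (3, 3) and (3,)).
    # R_align, t_align: hypothetical rigid transform mapping Linemod model
    #                   coordinates into Occlusion Linemod model coordinates.
    # A Linemod-frame point p maps into the camera frame as
    #   p_cam = R_occ @ (R_align @ p + t_align) + t_occ,
    # so the equivalent pose w.r.t. the Linemod model frame is:
    R_lm = R_occ @ R_align
    t_lm = R_occ @ t_align + t_occ
    return R_lm, t_lm
```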

The effectiveness of hybrid representations is also demonstrated by our results on Linemod, as well as by the ablation studies. On Linemod, HybridPose outperforms PVNet, which relies on keypoints alone. The ablation studies further reveal how adding edge vectors and symmetry correspondences helps pose regression.

We realize that the convention in this community is to mix the Linemod training set with synthetic training examples and test the resulting model on Occlusion Linemod. We plan to re-do the experiments on Occlusion Linemod, using the rendering script developed by the authors of PVNet to generate synthetic training data. We will update the results on arXiv as well as on GitHub after the new experiments.

Once again, thank you for pointing out the problem. Please stay tuned.

Sincerely, Chen

sh8 commented 4 years ago

Thank you for the quick and detailed answer.

A similar confusion between the Linemod and Occlusion Linemod datasets happened to me before. I hope the new experiments on the Occlusion Linemod dataset still work well.

BTW, tuning hyperparameters on the validation dataset and then evaluating on that same dataset is rather bad practice. If you have time, could you please consider re-doing the experiments without hyperparameters optimized on the validation dataset?

Best, Shun

chensong1995 commented 4 years ago

We do not evaluate the performance of HybridPose on the validation set, even though the hyper-parameters in the pose regressor are tuned on the validation set.

We divide the entire dataset into three splits: the training set, the validation set, and the testing set. We train the prediction networks on the training set, tune the hyper-parameters of the pose regressor on the validation set, and evaluate the performance of HybridPose on the testing set. These three splits are disjoint subsets of the whole dataset and do not overlap. Please refer to Section 4.1 in our paper or the code around this line for details.

The reason for doing so is explained in Section 3.4 of the paper. This strategy is a common practice in ML.
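
For clarity, here is a minimal sketch of such a disjoint three-way split (illustrative only; the numbers are placeholders and this is not the repository's actual data-loading code):

```python
import numpy as np

num_examples = 1000                            # placeholder: number of images for one object
rng = np.random.RandomState(0)
indices = rng.permutation(num_examples)
n_train = int(0.8 * num_examples)              # 80% for training the prediction networks
n_val = 20                                     # small held-out set for tuning the pose regressor
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]     # used only for hyper-parameter tuning
test_idx = indices[n_train + n_val:]           # used only for the reported accuracy

# The three splits partition a single permutation, so they are disjoint by construction.
assert set(train_idx).isdisjoint(val_idx)
assert set(val_idx).isdisjoint(test_idx)
assert set(train_idx).isdisjoint(test_idx)
```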

hiyyg commented 4 years ago

@chensong1995 Could you please provide results on the whole LINEMOD OCCLUSION dataset by training on LINEMOD (maybe + synthetic data) like other works, instead of training on 80% of LINEMOD OCCLUSION and testing on 20% of it, in order to make a fair comparison?

In my experience, if I train on 80% of LINEMOD OCCLUSION, even a simple regression network can achieve fairly good accuracy (better than existing papers), but that is not meaningful since it overfits severely to the target scene :)

chensong1995 commented 4 years ago

I'll update the results once we finish the new experiments. Please stay tuned.

sh8 commented 4 years ago

@chensong1995 To confirm: according to the code around this line, the held-out 20% of the dataset is further split into validation and test sets. Is this understanding correct?

Additional Comment

Sorry that I could not find this line in your paper, but I still think it's better to evaluate the results without splitting the dataset, for a fair comparison... (I agree with @hiyyg on this point)

> On each dataset, we randomly select 80% of the examples for training, 20 instances for validation, and the rest for test.

chensong1995 commented 4 years ago

Please stay tuned for our updates.

sh8 commented 4 years ago

Hi, @chensong1995! I did experiments on some categories with a fixed dataset; could I check my results against yours?

chensong1995 commented 4 years ago

Hi Shun,

In the past couple of days, I trained the model using both the standard Linemod training split and synthetic training examples generated by this renderer. Testing performance is reasonable on Linemod, but not yet on Occlusion Linemod. In almost all failure cases, the model is unable to detect the existence of the object (the segmentation mask is empty).

I just realized that some baseline models use highly aggressive online data augmentation (random rotation, cropping, etc.). Some of these augmentation methods are designed specifically to introduce occlusion, and are applied around the object of interest rather than on the entire image. I did not use any of them during training, which is probably why my model performs unsatisfactorily now. Adding online data augmentation will be my next step in debugging this issue.
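
For reference, one common flavour of this kind of augmentation is pasting a random patch inside the object's bounding box. A hedged sketch (function and parameter names are illustrative, not taken from any baseline's code):

```python
import numpy as np

def random_occlusion(image, bbox, max_frac=0.3, rng=np.random):
    # image: HxWx3 uint8 array; bbox: object bounding box (x0, y0, x1, y1) in pixels.
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    ow = max(1, int(rng.uniform(0.1, max_frac) * w))   # occluder width
    oh = max(1, int(rng.uniform(0.1, max_frac) * h))   # occluder height
    ox = rng.randint(x0, max(x0 + 1, x1 - ow))         # occluder top-left corner
    oy = rng.randint(y0, max(y0 + 1, y1 - oh))
    # Fill the rectangle with random noise so part of the object is hidden.
    image[oy:oy + oh, ox:ox + ow] = rng.randint(0, 256, size=(oh, ow, 3), dtype=np.uint8)
    return image
```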

And yes, I will be very glad to hear about your experiments.

Best, Chen

jakobamb commented 4 years ago

Any updates so far? Are you still working on this? I would really appreciate seeing how the model performs on OccLinemod.

Best, Jakob

chensong1995 commented 4 years ago

Hi Jakob,

Thanks for your interest in our work. I am still actively working on re-evaluating this issue. Meanwhile, I just prepended a notice to the README of this repository, which reads:

Important notice: We were informed by some readers that the training/testing split used in our experiments is inconsistent with that of the baseline models. The comparison to baseline methods is therefore controversial. I am actively working on re-evaluating our approach. The updates will be posted to both this GitHub repository and the arXiv paper. At this point, I kindly request readers to focus on the general architecture of our method and on the relative strength of hybrid representations as demonstrated by the ablation study. I apologize for the inconvenience this may have caused.

Best, Chen

chensong1995 commented 4 years ago

Hi all,

We have updated our experiments using the conventional data split on Linemod/Occlusion Linemod. Following baseline works, we use around 15% of the Linemod examples for training. The rest of the Linemod examples, as well as the entire Occlusion Linemod dataset, are used for testing. Both this GitHub repository and the arXiv paper have been updated. HybridPose achieves an ADD(-S) score of 0.9125577238 on Linemod and 0.4754330537 on Occlusion Linemod. We sincerely appreciate the readers who pointed out this issue to us, including but not limited to Shun Iwase and hiyyg.
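
For readers unfamiliar with the metric: ADD(-S) compares the model points transformed by the ground-truth and predicted poses, using nearest-neighbour matching for symmetric objects; a pose is typically counted as correct when the mean distance is below 10% of the object diameter, and the reported score is the fraction of correct test images. A minimal sketch of the per-image distance (illustrative; the repository's evaluation code is the authoritative version):

```python
import numpy as np

def add_distance(pts, R_gt, t_gt, R_pred, t_pred, symmetric=False):
    # pts: (N, 3) object model points; a pose maps model points into the camera frame.
    gt = pts @ R_gt.T + t_gt
    pred = pts @ R_pred.T + t_pred
    if symmetric:
        # ADD-S: match each ground-truth point to its nearest predicted point.
        d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1).min(axis=1)
    else:
        # ADD: point-to-point distance under the original model point ordering.
        d = np.linalg.norm(gt - pred, axis=-1)
    return d.mean()
```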

Kewenjing1020 commented 3 years ago

Hi. I'm curious about the training on Occlusion Linemod. When the target dataset is Occlusion Linemod, are you still training a separate model for each object class (since you need to specify the object name in the flags)? When doing the evaluation, do you evaluate the accuracy of each object class separately and then calculate a weighted average over all the classes?

chensong1995 commented 3 years ago

Thanks for your questions, Kewenjing1020. Yes, we train a different set of weights for each object category (following our baseline approach PVNet). The overall metric is reported as a weighted average over all classes. I hope this clarifies things!
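
As a minimal illustration of such a weighted average (assuming each class is weighted by its number of test instances; the helper below is hypothetical, not from the repository):

```python
def overall_accuracy(per_class_acc, per_class_count):
    # Weight each class by its number of test instances and normalize.
    total = sum(per_class_count.values())
    return sum(per_class_acc[c] * per_class_count[c] for c in per_class_acc) / total

# Example with made-up numbers:
# overall_accuracy({'ape': 0.5, 'can': 0.7}, {'ape': 100, 'can': 200})  # -> 0.6333...
```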