Gorilla-Lab-SCUT / GPNet


reproducing results with corrected dataset #5

Open mrudorfer opened 3 years ago

mrudorfer commented 3 years ago

Hello,

Following up on my previous question regarding the offset in the dataset, we unfortunately still have problems reproducing the results in the paper, and we were wondering whether you have any further advice. We compared the available pre-trained model with models we trained using the code in this repository and the provided dataset. For the training dataset we used two conditions: 1) "offset", which is the dataset as we downloaded it, and 2) "fixed", where we removed the offsets from grasp centers and contact points.

Similarly, for testing we used the same two conditions: 1) "offset", where we compared against the test annotations as downloaded and ran simulations without any adjustment of the predicted grasps, and 2) "fixed", where we used the corrected dataset for the rule-based evaluation and introduced the offset to the predicted grasps for simulation.
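For concreteness, here is a minimal sketch of what our "fixed" preprocessing amounts to (array shapes, variable names, and the axis convention are our assumptions for illustration, not the repository's actual loader code):

```python
import numpy as np

OFFSET = 0.015  # length of the offset we removed (along the gripper z axis)

def remove_offset(centers, rotations):
    """Shift grasp centers back along each gripper's local z (approach) axis.

    centers:   (N, 3) grasp centers as stored in the downloaded dataset
    rotations: (N, 3, 3) gripper orientations; column 2 is the local z axis
    """
    z_axes = rotations[:, :, 2]        # local z axis expressed in the world frame
    return centers - OFFSET * z_axes   # undo the stored offset

# the contact points were corrected analogously
```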

Our expectation was that using the code from the GPNet repository with the fixed training dataset and fixed evaluation is the correct way to reproduce the results. However, we also checked the other combinations; here are all results:

*(image: gpnet_results_reproduce — rule-based and simulation success rates for all combinations of training and evaluation conditions)*

As you can see, the version of GPNet that we trained with the fixed dataset performs very poorly under both evaluation conditions, which is surprising to us. We have checked very thoroughly that we fixed the dataset correctly, and we are certain that the grasp centers and contact points are now correct.

The pre-trained model and the one we trained in the "offset" condition both work okay and produce very similar results. However, although the rule-based success rates more or less match, neither of those models reproduces the high simulation success rates reported in the paper. We consider the simulation-based score the more important one, as it produces a label for the actual predicted grasps instead of simply looking up nearby annotated grasps.

So to summarise, our questions are:

1. Can you confirm that the fixed-fixed condition is the correct one? Is it also the one you used to produce the paper results and the pre-trained model?
2. Is there any possible explanation for why the model we trained with the fixed dataset performs so poorly? We only used the repository code with the corrected dataset and are a bit clueless on this one...
3. Should we expect to reliably reproduce the high simulation success rates reported in the paper? Are we doing something wrong, given that we cannot reproduce these scores even with the pre-trained model? Or is that score perhaps rather an upper bound on the achievable simulation success rate (as we observe that the variance is relatively high)?

We would very much appreciate your thoughts on this. Thank you in advance!

Best wishes, Martin

CZ-Wu commented 2 years ago

Sorry for my late reply.

I have verified that the "fixed" version produces better results than the "offset" version on a much larger dataset, which is not yet released. I think you can fix the problem as follows:

  1. In the "fixed" dataset, we removed the offset along the z axis of the gripper, so the movement has to be added back to the predicted centers, e.g. at [test.py L188](https://github.com/Gorilla-Lab-SCUT/GPNet/blob/6fc36caab53d6e5093e9cdf2dc3ed2b7a7485582/test.py#L188), by applying the `zMove` function with `zMoveLength` set to `0.015` (see the sketch after this list).

  2. Do you only test epoch 500? You could also try the trained models from other epochs.
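Roughly like the following sketch (the actual `zMove` in the repository may have a different signature; the quaternion order `(x, y, z, w)` and the variable names are assumptions here):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

zMoveLength = 0.015  # the offset that was removed from the "fixed" dataset

def zMove(centers, quaternions, length=zMoveLength):
    """Shift predicted grasp centers along each gripper's local z axis."""
    # rotate the unit z axis by each predicted orientation
    z_axes = R.from_quat(quaternions).apply(np.array([0.0, 0.0, 1.0]))
    return centers + length * z_axes

# around test.py L188, the predicted centers would then be adjusted:
# centers = zMove(centers, quaternions)
```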

Hope this helps! I will check the code of this repository, but it will take some days as I'm quite busy at the moment. Sorry for that.

mrudorfer commented 2 years ago

Thank you for your response.

> 1. In the "fixed" dataset, we removed the offset along the z axis of the gripper, so the movement has to be added back to the predicted centers, e.g. at [test.py L188](https://github.com/Gorilla-Lab-SCUT/GPNet/blob/6fc36caab53d6e5093e9cdf2dc3ed2b7a7485582/test.py#L188), by applying the `zMove` function with `zMoveLength` set to `0.015`.

I don't think adding the offset here is a good idea: the simulation results would be correct, but the rule-based evaluation would then be wrong. In my opinion, the offset should only be introduced within the simulator, because that is the only place where it is actually needed. I have "outsourced" your simulation into a separate package here, which takes an option to apply the zMove to the predictions; it can be used via both CLI and API.
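To illustrate the separation I mean, here is a sketch with made-up names (this is not the actual API of the package):

```python
def rule_based_evaluation(grasps):
    """Compare raw predictions against the (corrected) annotations."""
    ...  # lookup of nearby annotated grasps; no offset applied here

def simulate(grasps, apply_z_move=True, z_move_length=0.015):
    """Hand grasps to the simulator; the offset is introduced only here."""
    if apply_z_move:
        z_axes = grasps["rotations"][:, :, 2]  # local z axis per grasp
        grasps["centers"] = grasps["centers"] + z_move_length * z_axes
    ...  # run the physics simulation on the shifted grasps
```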

> 2. Do you only test epoch 500? You could also try the trained models from other epochs.

We also tested other epochs. There is some variance, but the general picture remains the same.

> Hope this helps! I will check the code of this repository, but it will take some days as I'm quite busy at the moment. Sorry for that.

Thank you, this is very helpful, even if it takes a bit of time. To me it is still unclear why we cannot reproduce the results with the "fixed" dataset, so if there is any chance you could take a look at this, or simply try to reproduce it on your side (so that we can narrow down the root cause), it would be greatly appreciated!

CZ-Wu commented 2 years ago

@mrudorfer Hi! We have updated this repository. For stable training, please switch to the Adam optimizer. We get a 96.7% success rate@10% with the correct grasp centers and contacts. The corrected dataset and pre-trained model have also been uploaded to Google Drive. For more details, please check the README.md. Good luck!
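For reference, the switch itself is just the standard PyTorch change (a sketch only; the model below is a stand-in and the learning rate is illustrative, please use the settings from the updated README.md):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # stand-in for the GPNet model

# Adam in place of the previous optimizer; lr here is illustrative
optimizer = optim.Adam(model.parameters(), lr=1e-3)
```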