This seems OK. Since the depth data is saved as a single-channel image, such visualization results are likely just a display issue. The converted point cloud looks good in MeshLab.
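For reference, a minimal sketch (the file path and plotting libraries are my own assumptions, not something from the repo) of how a raw single-channel depth image can be rescaled purely for display, since standard viewers often render the raw values as an almost-black image:

```python
# Display-only normalization of a single-channel depth image (sketch, hypothetical path).
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

depth = np.asarray(Image.open("path/to/depth.png")).astype(np.float32)  # hypothetical path
depth_vis = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # scale to [0, 1] for viewing only
plt.imshow(depth_vis, cmap="viridis")
plt.colorbar(label="normalized depth")
plt.show()
```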
Btw, have you ever reproduced the results reported in the paper? It seems hard to match the reported performance.
Thanks for the reply!
I haven't tried the evaluation since there was a problem running the evaluation code inside the docker, but the reconstructed images shown in wandb didn't look good.
Do you mean the success rate?
Yes, the success rate. Here are some results I got (the last 4 rows are reproduced on my local machine; all experiments were run on 2 RTX 3090 GPUs).

| Method | lr_scheduler | geo | dyna | sem | step | comment | AVG | extra |
|---|---|---|---|---|---|---|---|---|
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | paper report | 31.7 | |
| ManiGaussian | TRUE | 1 | 0 | 0 | 100000 | paper report | 39.2 | |
| ManiGaussian | TRUE | 1 | 1 | 0 | 100000 | paper report | 41.6 | |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | paper report | 44.8 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released ckpt | 38.4 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released csv | 36.13 | |
| ManiGaussian | Unknown | 1 | 0 | 0 | 100000 | released csv | 41.07 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | Local Reproduction | 38.0 | |
| ManiGaussian | FALSE | 1 | 0 | 0 | 100000 | Local Reproduction | 32.8 | |
| ManiGaussian | FALSE | 1 | 1 | 0 | 100000 | Local Reproduction | 34.0 | |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | Local Reproduction | 29.6 | |
The GNFactor performance matches the paper, but ManiGaussian does not. Given the table and the reconstructed images, I suspect there are still some hidden bugs in the released code. @GuanxingLu Any suggestions on reproducing the desired performance?
Sorry for the late reply. The reconstruction results seem normal: since the action loss plays the main role in the optimization, the reconstruction is expected to look relatively poor. The reconstruction quality does not affect action prediction, because we decode the robot action from the volumetric representation rather than from the Gaussians (at test time, the Gaussian regressor and deformation field are not used).
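To make that dataflow concrete, here is a minimal sketch with purely hypothetical module names (not the actual ManiGaussian code): the Gaussian regressor and deformation field only feed auxiliary training losses, while the action is decoded from the volumetric features, so skipping the Gaussian branch at test time does not change the prediction.

```python
# Hypothetical sketch of the described dataflow, not the real implementation.
import torch
import torch.nn as nn

class SketchAgent(nn.Module):
    def __init__(self, feat_dim=64, action_dim=8):
        super().__init__()
        self.voxel_encoder = nn.Linear(feat_dim, feat_dim)   # stands in for the volumetric backbone
        self.action_head = nn.Linear(feat_dim, action_dim)   # decodes actions from volumetric features
        self.gaussian_regressor = nn.Linear(feat_dim, 14)    # only used for the reconstruction loss
        self.deformation_field = nn.Linear(feat_dim, 3)      # only used for the dynamics loss

    def forward(self, voxel_feat, training=True):
        h = self.voxel_encoder(voxel_feat)
        action = self.action_head(h)                          # this is all that matters at test time
        aux = None
        if training:                                          # Gaussian branch is only touched in training
            aux = (self.gaussian_regressor(h), self.deformation_field(h))
        return action, aux

agent = SketchAgent()
action, _ = agent(torch.randn(1, 64), training=False)        # inference never calls the Gaussian branch
```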
However, although the training and evaluation processes fluctuate even with the seed fixed, the provided scripts should reproduce the results without problems. Thanks for your detailed experimental logs; there are a couple of things to try:
1. Evaluate the 'best' checkpoint rather than the 'last' one (maybe the 90000-step checkpoint); the performance of the 'last' checkpoint sometimes drops slightly.
2. Simply evaluate the same checkpoint again.
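For the first point, a small sketch of how the 'best' checkpoint could be picked from an evaluation csv instead of just taking the last one (the path and column names below are placeholders; adjust them to whatever your eval script actually logs):

```python
# Pick the best-performing checkpoint from a hypothetical evaluation log.
import pandas as pd

df = pd.read_csv("logs/eval_results.csv")           # hypothetical path
best = df.loc[df["avg_success_rate"].idxmax()]      # hypothetical column names
print(f"best checkpoint: step {int(best['step'])}, success rate {best['avg_success_rate']:.2f}")
```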
> This seems OK. Since the depth data is saved as a single-channel image, such visualization results are likely just a display issue. The converted point cloud looks good in MeshLab.
> Btw, have you ever reproduced the results reported in the paper? It seems hard to match the reported performance.
Thanks for your answer. Yes, the depth image is quantized by RLBench, so the visualization may look weird. See https://github.com/GuanxingLu/ManiGaussian/blob/main/third_party/RLBench/rlbench/backend/utils.py for more details.
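If I read that utils.py correctly, the saved depth is an RGB-coded image that can be recovered with `image_to_float_array`; a minimal decoding sketch is below (the file name, scale factor and near/far values are my assumptions, so please double-check them against utils.py and the camera info in the observation's misc dict):

```python
# Decode an RLBench RGB-coded depth image back to metric depth (sketch, placeholder values).
from PIL import Image
from rlbench.backend.utils import image_to_float_array

DEPTH_SCALE = 2 ** 24 - 1                                      # assumed scale used when saving the demos
rgb_coded_depth = Image.open("front_depth/0.png")              # hypothetical path
depth_01 = image_to_float_array(rgb_coded_depth, DEPTH_SCALE)  # normalized depth in [0, 1]

near, far = 0.01, 4.5                                          # placeholders; read them from obs.misc
depth_m = near + (far - near) * depth_01                       # depth in metres
print(depth_m.min(), depth_m.max())
```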
Actually, I encountered the same problem. Even with the best-checkpoint evaluation strategy, the average success rate of ManiGaussian (trained with scripts/train_and_eval_w_geo_sem_dyna.sh) is only 35.6, well below the reported 44.8. @GuanxingLu We would appreciate it if you could provide some suggestions. Thanks!
I think my test data is correct, since evaluating the officially released ckpt on this test data gives a 44.8 success rate.
Hello, thanks for your interest. Could you evaluate the 90k and 100k checkpoints again to see whether the performance changes? As you can see in the csv attached to the released checkpoint, the performance fluctuates between 39.20 and 44.80 even with the same checkpoint and the same random seed. Besides, I'm planning to retrain with the provided script on a new server, stay tuned.
Thanks for your reply. After re-evaluating the 90k and 100k ckpts, the success rates are 34.0 and 33.6, as shown in the last two lines, which is far from the 39.2 or 44.8 reported in the paper... :(
I found that the depth data generated by gen_demonstration is quite different from other depth data. Do you think this is intended?
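One quick way to compare the two sources (all paths below are placeholders) is to load both depth images and print their basic statistics; if they are stored with different encodings, the raw value ranges will differ even when the underlying depth agrees:

```python
# Compare the raw value ranges of two depth images (sketch, hypothetical paths).
import numpy as np
from PIL import Image

def stats(path):
    d = np.asarray(Image.open(path)).astype(np.float32)
    print(f"{path}: shape={d.shape} min={d.min():.3f} max={d.max():.3f} mean={d.mean():.3f}")

stats("gen_demonstration/front_depth/0.png")   # hypothetical path to a generated demo depth
stats("reference_data/front_depth/0.png")      # hypothetical path to the other depth data
```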