GuanxingLu / ManiGaussian

[ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Question about rlbench depth data #19

Closed kjeiun closed 2 months ago

kjeiun commented 3 months ago
[image: depth visualization of the generated demonstration data]

I found that the depth data generated by gen_demonstration looks quite different from other depth data. Do you think this is an intended result?

cheng052 commented 3 months ago

This seems okay. Since the depth data is saved as a single-channel image, the odd visualization is most likely a display issue. The converted point cloud looks fine in MeshLab.

[image: converted point cloud viewed in MeshLab]
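
For anyone who wants to run the same sanity check, here is a minimal back-projection sketch. It assumes a standard pinhole camera model and a metric depth map; the intrinsics `fx, fy, cx, cy` are placeholders for your camera's actual values, not constants from the ManiGaussian code:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a metric depth map of shape (H, W) into an (N, 3)
    point cloud in the camera frame (standard pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop pixels with no depth

# MeshLab opens plain .xyz point lists directly, e.g.:
# np.savetxt("cloud.xyz", depth_to_point_cloud(depth, fx, fy, cx, cy))
```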

Btw, have you reproduced the results reported in the paper? I find it hard to match the reported performance.

kjeiun commented 3 months ago

Thanks for the reply !

I haven't tried the evaluation yet, since I hit a problem running the evaluation code inside the Docker container, but the reconstructed image shown in wandb didn't look good.

Do you mean the success rate?

cheng052 commented 3 months ago
Yes, the success rate. Here are some results I got (the last 4 rows were reproduced on my local machine; all experiments used 2 RTX 3090 GPUs).

| Method | lr_scheduler | geo | dyna | sem | step | comment | AVG | extra |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | paper report | 31.7 | |
| ManiGaussian | TRUE | 1 | 0 | 0 | 100000 | paper report | 39.2 | |
| ManiGaussian | TRUE | 1 | 1 | 0 | 100000 | paper report | 41.6 | |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | paper report | 44.8 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released ckpt | 38.4 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released csv | 36.13 | |
| ManiGaussian | Unknown | 1 | 0 | 0 | 100000 | released csv | 41.07 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | Local Reproduction | 38 | |
| ManiGaussian | FALSE | 1 | 0 | 0 | 100000 | Local Reproduction | 32.8 | media_images_eval_recon_img_100000_3633c27ad4ea2549c5c1 |
| ManiGaussian | FALSE | 1 | 1 | 0 | 100000 | Local Reproduction | 34 | mg_geo_dyna |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | Local Reproduction | 29.6 | mg_geo_dyna_sem |

GNFactor's performance matches the paper, but ManiGaussian's does not. Judging from the table and the reconstructed image, I suspect there are still some hidden bugs in the released code. @GuanxingLu Any suggestions on how to reproduce the reported performance?

GuanxingLu commented 3 months ago

Sorry for the late reply. The reconstruction results look normal: since the action loss plays the main role in the optimization, the reconstructions are expected to be relatively poor. The reconstruction quality does not affect the action prediction, because we decode the robot action from the volumetric representation rather than from the Gaussians (in the test phase, the Gaussian regressor and deformation field are not used).

However, although the training and evaluation processes fluctuate even with the seed fixed, the provided scripts should reproduce the results without problems. Thanks for your detailed experimental logs; there are a couple of things to try:

1. Evaluate the 'best' checkpoint rather than the 'last' one (maybe 90000 steps); the performance of the 'last' checkpoint sometimes drops slightly.
2. Simply evaluate the checkpoint again, as in the sketch below.
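
A toy sketch of that loop: the body of `evaluate_checkpoint` is a simulated stand-in (replace it with a call to your actual eval entry point); the ±3-point spread only mimics the run-to-run fluctuation mentioned above, it is not real data:

```python
import random
import statistics

def evaluate_checkpoint(step: int, run: int) -> float:
    """Stand-in for a real evaluation run. Replace this body with a call
    to your eval script; here we only simulate the run-to-run spread
    reported in this thread."""
    random.seed(hash((step, run)))
    return 41.0 + random.uniform(-3.0, 3.0)

# Evaluate the last few checkpoints several times each and compare
# mean +/- std rather than a single noisy number.
for step in (90_000, 100_000):
    rates = [evaluate_checkpoint(step, run) for run in range(3)]
    print(f"step {step}: {statistics.mean(rates):.2f} "
          f"+/- {statistics.stdev(rates):.2f}  runs={[round(r, 1) for r in rates]}")
```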

GuanxingLu commented 3 months ago

> This seems okay. Since the depth data is saved as a single-channel image, the odd visualization is most likely a display issue. The converted point cloud looks fine in MeshLab.
>
> Btw, have you reproduced the results reported in the paper? I find it hard to match the reported performance.

Thanks for your answer. Yes, the depth image is quantized by RLBench, so the visualization may look odd. See https://github.com/GuanxingLu/ManiGaussian/blob/main/third_party/RLBench/rlbench/backend/utils.py for more details.
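
For reference, the linked `utils.py` packs the normalized depth into the R, G, B channels of a PNG (`float_array_to_rgb_image`) and recovers it with `image_to_float_array`. Assuming the demos were saved with that default encoding, a self-contained decoder looks roughly like this (the `near`/`far` clipping planes are assumed to come from the observation's `misc` dict, e.g. `obs.misc['front_camera_near']`):

```python
import numpy as np
from PIL import Image

DEPTH_SCALE = 2 ** 24 - 1  # depth packed into 24 bits across R, G, B

def decode_rlbench_depth(png_path, near, far):
    """Recover metric depth from an RLBench-style depth PNG. The file
    looks strange in an image viewer because its three channels are
    base-256 digits of one 24-bit integer, not colours."""
    rgb = np.asarray(Image.open(png_path), dtype=np.float64)
    packed = rgb[..., 0] * 256 ** 2 + rgb[..., 1] * 256 + rgb[..., 2]
    normalized = packed / DEPTH_SCALE        # in [0, 1]
    return near + normalized * (far - near)  # metric depth
```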

pyun-ram commented 2 weeks ago

Actually, I encountered the same problem. Even with the best-checkpoint evaluation strategy, the average success rate of ManiGaussian (trained with scripts/train_and_eval_w_geo_sem_dyna.sh) is only 35.6, well below the reported 44.8. @GuanxingLu We would appreciate it if you could provide some suggestions. Thanks!

I think my test data is correct, since evaluating the officially released ckpt on it gives a 44.8 success rate.

[screenshot: evaluation results]

GuanxingLu commented 2 weeks ago

Hello, thanks for your interest. Could you evaluate the 90k and 100k checkpoints again to see whether the performance changes? As you can see in the csv attached to the released checkpoint, the performance fluctuates between 39.20 and 44.80 even with the same checkpoint and the same random seed. Besides, I'm planning to retrain with the provided script on a new server, stay tuned.

pyun-ram commented 2 weeks ago

Thanks for your reply. After re-evaluating the 90k and 100k ckpts, the success rates come out to 34.0 and 33.6, as shown in the last two rows, which is far from the 39.2 or 44.8 reported in the paper... :(

[screenshot: re-evaluation results]