This seems OK. Since the depth data is saved as a single-channel image, such visualization results are likely just a display issue. The converted point cloud looks good in MeshLab.
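For reference, a minimal sketch (the file path and plotting libraries are my own assumptions, not something from the repo) of how a raw single-channel depth image can be rescaled purely for display, since standard viewers often render the raw values as an almost-black image:

```python
# Display-only normalization of a single-channel depth image (sketch, hypothetical path).
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

depth = np.asarray(Image.open("path/to/depth.png")).astype(np.float32)  # hypothetical path
depth_vis = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # scale to [0, 1] for viewing only
plt.imshow(depth_vis, cmap="viridis")
plt.colorbar(label="normalized depth")
plt.show()
```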
Btw, have you ever reproduced the results reported in the paper? It seems hard to match the reported performance.
Thanks for the reply!
I haven't tried the evaluation since there was a problem running the evaluation code inside the docker, but the reconstructed images shown in wandb didn't look good.
Do you mean the success rate?
Yes, the success rate. Here are some results I got (the last 4 rows are reproduced on my local machine; all experiments were run on 2 RTX 3090 GPUs).

| Method | lr_scheduler | geo | dyna | sem | step | comment | AVG | extra |
|---|---|---|---|---|---|---|---|---|
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | paper report | 31.7 | |
| ManiGaussian | TRUE | 1 | 0 | 0 | 100000 | paper report | 39.2 | |
| ManiGaussian | TRUE | 1 | 1 | 0 | 100000 | paper report | 41.6 | |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | paper report | 44.8 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released ckpt | 38.4 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | released csv | 36.13 | |
| ManiGaussian | Unknown | 1 | 0 | 0 | 100000 | released csv | 41.07 | |
| GNFactor | FALSE | 1 | 0 | 1 | 100000 | Local Reproduction | 38.0 | |
| ManiGaussian | FALSE | 1 | 0 | 0 | 100000 | Local Reproduction | 32.8 | |
| ManiGaussian | FALSE | 1 | 1 | 0 | 100000 | Local Reproduction | 34.0 | |
| ManiGaussian | TRUE | 1 | 1 | 1 | 100000 | Local Reproduction | 29.6 | |
The GNFactor performance matches the paper, but ManiGaussian does not. Given the table and the reconstructed images, I suspect there are still some hidden bugs in the released code. @GuanxingLu Any suggestions on reproducing the desired performance?
Sorry for the late reply. The reconstruction results seem normal: since the action loss plays the main role in the optimization, the reconstruction is expected to look relatively poor. The reconstruction quality does not affect action prediction, because we decode the robot action from the volumetric representation rather than from the Gaussians (at test time, the Gaussian regressor and deformation field are not used).
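To make that dataflow concrete, here is a minimal sketch with purely hypothetical module names (not the actual ManiGaussian code): the Gaussian regressor and deformation field only feed auxiliary training losses, while the action is decoded from the volumetric features, so skipping the Gaussian branch at test time does not change the prediction.

```python
# Hypothetical sketch of the described dataflow, not the real implementation.
import torch
import torch.nn as nn

class SketchAgent(nn.Module):
    def __init__(self, feat_dim=64, action_dim=8):
        super().__init__()
        self.voxel_encoder = nn.Linear(feat_dim, feat_dim)   # stands in for the volumetric backbone
        self.action_head = nn.Linear(feat_dim, action_dim)   # decodes actions from volumetric features
        self.gaussian_regressor = nn.Linear(feat_dim, 14)    # only used for the reconstruction loss
        self.deformation_field = nn.Linear(feat_dim, 3)      # only used for the dynamics loss

    def forward(self, voxel_feat, training=True):
        h = self.voxel_encoder(voxel_feat)
        action = self.action_head(h)                          # this is all that matters at test time
        aux = None
        if training:                                          # Gaussian branch is only touched in training
            aux = (self.gaussian_regressor(h), self.deformation_field(h))
        return action, aux

agent = SketchAgent()
action, _ = agent(torch.randn(1, 64), training=False)        # inference never calls the Gaussian branch
```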
However, although the training and evaluation processes fluctuate even with the seed fixed, the provided scripts should reproduce the results without problems. Thanks for your detailed experimental logs; there are a couple of things to try:
1. Evaluate the 'best' checkpoint rather than the 'last' one (maybe the 90000-step checkpoint); the performance of the 'last' checkpoint sometimes drops slightly.
2. Simply evaluate the same checkpoint again.
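For the first point, a small sketch of how the 'best' checkpoint could be picked from an evaluation csv instead of just taking the last one (the path and column names below are placeholders; adjust them to whatever your eval script actually logs):

```python
# Pick the best-performing checkpoint from a hypothetical evaluation log.
import pandas as pd

df = pd.read_csv("logs/eval_results.csv")           # hypothetical path
best = df.loc[df["avg_success_rate"].idxmax()]      # hypothetical column names
print(f"best checkpoint: step {int(best['step'])}, success rate {best['avg_success_rate']:.2f}")
```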
> This seems OK. Since the depth data is saved as a single-channel image, such visualization results are likely just a display issue. The converted point cloud looks good in MeshLab.
> Btw, have you ever reproduced the results reported in the paper? It seems hard to match the reported performance.
Thanks for your answer. Yes, the depth image is quantized by RLBench, so the visualization may look weird. See https://github.com/GuanxingLu/ManiGaussian/blob/main/third_party/RLBench/rlbench/backend/utils.py for more details.
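If I read that utils.py correctly, the saved depth is an RGB-coded image that can be recovered with `image_to_float_array`; a minimal decoding sketch is below (the file name, scale factor and near/far values are my assumptions, so please double-check them against utils.py and the camera info in the observation's misc dict):

```python
# Decode an RLBench RGB-coded depth image back to metric depth (sketch, placeholder values).
from PIL import Image
from rlbench.backend.utils import image_to_float_array

DEPTH_SCALE = 2 ** 24 - 1                                      # assumed scale used when saving the demos
rgb_coded_depth = Image.open("front_depth/0.png")              # hypothetical path
depth_01 = image_to_float_array(rgb_coded_depth, DEPTH_SCALE)  # normalized depth in [0, 1]

near, far = 0.01, 4.5                                          # placeholders; read them from obs.misc
depth_m = near + (far - near) * depth_01                       # depth in metres
print(depth_m.min(), depth_m.max())
```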
Actually, I encountered the same problem. Even with the best-checkpoint evaluation strategy, the average success rate of ManiGaussian (trained with scripts/train_and_eval_w_geo_sem_dyna.sh) is only 35.6, well below the reported 44.8. @GuanxingLu We would appreciate it if you could provide some suggestions. Thanks!
I think my test data is correct, since evaluating the officially released ckpt on this test data gives a 44.8 success rate.
Hello, thanks for your interest. Could you evaluate the 90k and 100k checkpoints again to see whether the performance changes? As you can see in the csv attached to the released checkpoint, the performance fluctuates between 39.20 and 44.80 even with the same checkpoint and the same random seed. Besides, I'm planning to retrain with the provided script on a new server, stay tuned.
Thanks for your reply. After re-evaluating the 90k and 100k ckpts, the success rates are 34.0 and 33.6, as shown in the last two lines, which is far from the 39.2 or 44.8 reported in the paper... :(
I found that the depth data generated by gen_demonstration is quite different from other depth data. Do you think this is intended?
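One quick way to compare the two sources (all paths below are placeholders) is to load both depth images and print their basic statistics; if they are stored with different encodings, the raw value ranges will differ even when the underlying depth agrees:

```python
# Compare the raw value ranges of two depth images (sketch, hypothetical paths).
import numpy as np
from PIL import Image

def stats(path):
    d = np.asarray(Image.open(path)).astype(np.float32)
    print(f"{path}: shape={d.shape} min={d.min():.3f} max={d.max():.3f} mean={d.mean():.3f}")

stats("gen_demonstration/front_depth/0.png")   # hypothetical path to a generated demo depth
stats("reference_data/front_depth/0.png")      # hypothetical path to the other depth data
```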