NVlabs / handover-sim

A simulation environment and benchmark for human-to-robot object handovers
https://handover-sim.github.io
BSD 3-Clause "New" or "Revised" License

How to improve the GA-DDPG model performance on the "handover_sim" testing environment #13

Open Tingsunxx opened 1 year ago

Tingsunxx commented 1 year ago

I would like to know why the GA-DDPG model trained with the code from GitHub (https://github.com/liruiw/GA-DDPG) achieves an 87% success rate on the YCB objects, yet when I test the same trained model in the handover-sim environment, the success rate is only 6.25%, significantly worse than the results reported in the paper. Could there be a misalignment between the environment or settings in the GA-DDPG source code and the handover-sim testing environment? Are additional adjustments to the environment settings needed to improve the model's performance in handover-sim?

The first result below is from my own retrained GA-DDPG model, and the second result is from loading the trained model and testing it in the "handover_sim" environment.

Training code following the GitHub (https://github.com/liruiw/GA-DDPG) setting:

```
python -m core.train_online --save_model --config_file td3_critic_aux_policy_aux.yaml --policy DDPG --log --fix_output_time ddpg_model_233_1000000_GADDPG --seed 233
```

Testing code following the GitHub setting (testing on YCB objects):

```
bash ./experiments/scripts/test_ycb.sh demo_model
```


```
Test Time: 08_08_2023_11:17:17
Data Root: data/scenes/data_5w.npz
Model: demo_model
Script: td3_critic_aux_policy_aux.yaml
Index: ycb_large.json
Num of Objs: 9
Num of Runs: 3
Policy: DDPG
Model Path: output/demo_model
Step: 300000
Test Episodes: 270.0
Avg. Length: 25.815
Index: scene_0-scene_164
Avg. Performance: (Return: 0.870 +- 0.02778) (Success: 0.870 +- 0.02778)
+---------------------+---------+-----------+
| object name         |   count |   success |
|---------------------+---------+-----------|
| 003_cracker_box     |      30 |        26 |
| 004_sugar_box       |      30 |        23 |
| 005_tomato_soup_can |      30 |        28 |
| 006_mustard_bottle  |      30 |        30 |
| 010_potted_meat_can |      30 |        23 |
| 021_bleach_cleanser |      30 |        26 |
| 024_bowl            |      30 |        30 |
| 025_mug             |      30 |        23 |
| 061_foam_brick      |      30 |        26 |
+---------------------+---------+-----------+
```
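As a sanity check on the first result, the per-object counts in the table do aggregate to the reported 0.870 average success rate (a minimal sketch; the counts are copied from the table above):

```python
# Per-object success counts from the YCB test table above (30 runs each).
successes = {
    "003_cracker_box": 26,
    "004_sugar_box": 23,
    "005_tomato_soup_can": 28,
    "006_mustard_bottle": 30,
    "010_potted_meat_can": 23,
    "021_bleach_cleanser": 26,
    "024_bowl": 30,
    "025_mug": 23,
    "061_foam_brick": 26,
}
runs_per_object = 30

total_success = sum(successes.values())            # 235
total_episodes = runs_per_object * len(successes)  # 270
rate = total_success / total_episodes
print(f"{total_success}/{total_episodes} = {rate:.3f}")  # 235/270 = 0.870
```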

Benchmark run for "GA-DDPG hold" on the test split of s0 with:

```
GADDPG_DIR=GA-DDPG CUDA_VISIBLE_DEVICES=0 python examples/run_benchmark_gaddpg_hold.py \
    SIM.RENDER True \
    ENV.ID HandoverHandCameraPointStateEnv-v1 \
    BENCHMARK.SETUP s0
```

```
pybullet build time: May 20 2022 19:44:17
2023-08-07 15:30:16: Running evaluation for results/2023-08-07_13-51-23_ga-ddpg-hold_s0_test
2023-08-07 15:30:16: Evaluation results:
| success rate  | mean accum time (s)  |                  failure (%)                    |
|     (%)       |  exec   plan   total |  hand contact |  object drop   |    timeout     |
| 6.25 ( 9/144) | 7.390  0.261  7.651  | 0.69 ( 1/144) | 13.19 ( 19/144)| 79.86 (115/144)|
```
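The handover-sim breakdown is internally consistent: each of the 144 test scenes is counted exactly once as a success or one of the three failure modes. A minimal sketch recomputing the percentages from the counts in the summary:

```python
# Scene counts from the evaluation summary above (s0 test split, 144 scenes).
counts = {
    "success": 9,
    "hand contact": 1,
    "object drop": 19,
    "timeout": 115,
}
total = sum(counts.values())
assert total == 144  # every scene falls into exactly one outcome

for outcome, n in counts.items():
    print(f"{outcome}: {100 * n / total:.2f}% ({n}/{total})")
# success: 6.25% (9/144), hand contact: 0.69% (1/144),
# object drop: 13.19% (19/144), timeout: 79.86% (115/144)
```

The dominant failure mode here is timeout (79.86%), which is typically the first thing to investigate when transferring a policy between environments.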

2023-08-07 15:30:16: Printing scene ids
2023-08-07 15:30:16: Success (9 scenes):


5 8 16 20 25 30 37 55 109


2023-08-07 15:30:16: Failure - hand contact (1 scenes):

11

2023-08-07 15:30:16: Failure - object drop (19 scenes):


9 12 21 23 24 27 33 36 45 54 60 66 67 68 108 117 121 135 136


2023-08-07 15:30:17: Failure - timeout (115 scenes):


0 1 2 3 4 6 7 10 13 14 15 17 18 19 22 26 28 29 31 32 34 35 38 39 40 41 42 43 44 46 47 48 49 50 51 52 53 56 57 58 59 61 62 63 64 65 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 110 111 112 113 114 115 116 118 119 120 122 123 124 125 126 127 128 129 130 131 132 133 134 137 138 139 140 141 142 143


2023-08-07 15:30:17: Evaluation complete.

ychao-nvidia commented 9 months ago

We have follow-up work that adapts GA-DDPG for handover-sim. The code has been released here.