Open · JasonLiu324 opened this issue 1 month ago
And the weird thing is that the reward reflection during the running process is almost the same every time. Iterations 0 through 3 all produced exactly the same feedback:

Iteration 0 (Iterations 1, 2, and 3 are byte-for-byte identical):
User Content: We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every 300 epochs and the maximum, mean, minimum values encountered:
distance_reward: ['0.79', '0.95', '0.91', '0.90', '0.90', '0.80', '0.86', '0.92', '0.92', '0.88'], Max: 0.98, Mean: 0.89, Min: 0.76
door_open_reward: ['0.00', '0.08', '0.18', '0.28', '0.29', '0.18', '0.15', '0.00', '0.16', '0.00'], Max: 0.32, Mean: 0.13, Min: 0.00
task_score: ['0.00', '0.00', '0.00', '0.02', '0.01', '0.03', '0.01', '0.00', '0.02', '0.00'], Max: 0.11, Mean: 0.01, Min: 0.00
episode_lengths: ['499.00', '359.18', '500.00', '495.78', '496.36', '493.24', '492.34', '499.69', '500.00', '500.00'], Max: 500.00, Mean: 490.73, Min: 230.97
The values are totally the same. I think there must be something wrong with the training process.
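A quick check (not part of Eureka itself; the glob pattern below is a hypothetical placeholder for wherever the run actually writes its per-iteration output) would be to confirm whether each iteration really produced distinct training logs, or whether the feedback is being rebuilt from the same file every time:

```python
import glob
import hashlib

def file_digest(path):
    """SHA-256 digest of a file's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical layout: one summary/log file per Eureka iteration.
# Adjust the pattern to match the actual output directory of your run.
paths = sorted(glob.glob("outputs/**/iteration_*/summary*.txt", recursive=True))

digests = {p: file_digest(p) for p in paths}
for p, d in digests.items():
    print(d[:12], p)

if len(digests) > 1 and len(set(digests.values())) == 1:
    print("All per-iteration logs are byte-identical -> the feedback is likely "
          "being built from the same (possibly stale) training output.")
```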
Hi, I have successfully run the whole project and tested it on several gym tasks, such as FrankaCabinet and Humanoid, but the experimental results are not as good as I expected. What could be the reason?
My workstation environment is: Ubuntu 22.04, an RTX 4080 GPU with 12 GB of VRAM, and 16 GB of system RAM.
And the commands I used are:
python eureka.py env=FrankaCabinet sample=5 iteration=5 model_name=gpt-4
python eureka.py env=Anymal sample=5 iteration=5 model_name=gpt-4
The final success rate is only approximately 0.1. Could this be related to the number of samples? My workstation can only run 5 samples in parallel due to limited GPU memory.
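For context on why I suspect the sample count matters: if each sampled reward function independently had some probability p of being usable, then the chance that at least one of the N samples in an iteration is usable would be 1 - (1 - p)^N. A rough, purely illustrative calculation (p is a made-up number, not something I measured):

```python
# Purely illustrative: p is a hypothetical per-sample "usable reward" probability.
def p_at_least_one_good(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

for n in (5, 16):  # 5 = what my GPU fits in parallel; 16 = a larger hypothetical batch
    print(f"N={n}: {p_at_least_one_good(0.2, n):.3f}")
# With p = 0.2 this gives ~0.67 for N=5 vs ~0.97 for N=16, so a small sample
# budget leaves the outer loop far fewer chances to find a usable reward function.
```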