In the environment source codes, the function compute_success provides the human-engineered reward function.
And they are added to the system message. So the reward generation is not zero-shot anymore, because ChatGPT4 can read the human reward function and optimize on top of it.
In the environment source codes, the function
compute_success
provides the human-engineered reward function. And they are added to the system message. So the reward generation is not zero-shot anymore, because ChatGPT4 can read the human reward function and optimize on top of it.