Unfair comparison! Human designed reward in the source code

eureka-research / Eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)

https://eureka-research.github.io/

MIT License

2.73k stars, 244 forks

Unfair comparison! Human designed reward in the source code #23

Closed: CeHao1 closed this issue 8 months ago

CeHao1 commented 8 months ago

In the environment source code, the function compute_success contains the human-engineered reward function, and that source code is added to the system message. So the reward generation is no longer zero-shot, because GPT-4 can read the human reward function and optimize on top of it.
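To make the concern concrete, here is a minimal sketch of how a function such as compute_success could be removed from environment source before that source is placed in a prompt. It is an illustration only and does not describe how the Eureka repository builds its prompts; `strip_functions`, `build_prompt`, and `ToyEnv` are hypothetical names, and the sketch assumes Python 3.9+ for `ast.unparse`.

```python
import ast


class _Stripper(ast.NodeTransformer):
    """Drop function definitions whose name is in `names` (hypothetical helper)."""

    def __init__(self, names):
        self.names = names

    def _visit_def(self, node):
        if node.name in self.names:
            return None  # returning None removes the node from its parent
        self.generic_visit(node)
        if not node.body:
            node.body = [ast.Pass()]  # keep the definition syntactically valid
        return node

    visit_FunctionDef = _visit_def
    visit_AsyncFunctionDef = _visit_def

    def visit_ClassDef(self, node):
        self.generic_visit(node)
        if not node.body:
            node.body = [ast.Pass()]  # class lost all members, keep it valid
        return node


def strip_functions(source: str, names: set[str]) -> str:
    """Return `source` with the named functions and methods removed."""
    tree = _Stripper(names).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Toy stand-in for an environment file; not taken from the repository.
ENV_SOURCE = '''
class ToyEnv:
    def compute_observations(self):
        return self.state

    def compute_success(self):
        return self.state > 0.5
'''


def build_prompt(env_source: str, hidden=("compute_success",)) -> str:
    """Assemble a prompt fragment that omits the hidden functions."""
    cleaned = strip_functions(env_source, set(hidden))
    return "Environment code:\n" + cleaned


if __name__ == "__main__":
    print(build_prompt(ENV_SOURCE))
```

Note that `ast.unparse` discards comments, so any reward hints written as comments inside the kept functions would also disappear, while hints in docstrings would remain and need separate handling.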