eureka-research / Eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
https://eureka-research.github.io/
MIT License

Getting error from the example given in the Getting Started page #4

Open ramkumarkoppu opened 8 months ago

ramkumarkoppu commented 8 months ago

When I ran python eureka.py env=shadow_hand sample=4 iteration=2 model=gpt-4-0314, I got the following output:

[2023-10-21 20:06:04,835][root][INFO] - Workspace: /home/ram/Eureka/eureka/outputs/eureka/2023-10-21_20-06-04
[2023-10-21 20:06:04,835][root][INFO] - Project Root: /home/ram/Eureka/eureka
[2023-10-21 20:06:04,835][root][INFO] - Using LLM: gpt-4-0314
[2023-10-21 20:06:04,835][root][INFO] - Task: ShadowHand
[2023-10-21 20:06:04,836][root][INFO] - Task description: to make the shadow hand spin the object to a target orientation
[2023-10-21 20:06:04,868][root][INFO] - Iteration 0: Generating 4 samples with gpt-4-0314
[2023-10-21 20:07:02,853][root][INFO] - Iteration 0: Prompt Tokens: 1735, Completion Tokens: 1992, Total Tokens: 3727
[2023-10-21 20:07:02,854][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-21 20:07:22,595][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-21 20:07:22,595][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-21 20:07:42,063][root][INFO] - Iteration 0: Code Run 1 execution error!
[2023-10-21 20:07:42,063][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-21 20:08:01,523][root][INFO] - Iteration 0: Code Run 2 execution error!
[2023-10-21 20:08:01,524][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-21 20:08:20,803][root][INFO] - Iteration 0: Code Run 3 execution error!
[2023-10-21 20:08:22,234][root][INFO] - All code generation failed! Repeat this iteration from the current message checkpoint!
[2023-10-21 20:08:22,235][root][INFO] - Iteration 1: Generating 4 samples with gpt-4-0314
[2023-10-21 20:08:48,630][root][INFO] - Iteration 1: Prompt Tokens: 1735, Completion Tokens: 1432, Total Tokens: 3167
[2023-10-21 20:08:48,631][root][INFO] - Iteration 1: Processing Code Run 0
[2023-10-21 20:09:05,350][root][INFO] - Iteration 1: Code Run 0 execution error!
[2023-10-21 20:09:05,350][root][INFO] - Iteration 1: Processing Code Run 1
[2023-10-21 20:09:25,084][root][INFO] - Iteration 1: Code Run 1 execution error!
[2023-10-21 20:09:25,084][root][INFO] - Iteration 1: Processing Code Run 2
[2023-10-21 20:09:44,618][root][INFO] - Iteration 1: Code Run 2 execution error!
[2023-10-21 20:09:44,618][root][INFO] - Iteration 1: Processing Code Run 3
[2023-10-21 20:10:03,932][root][INFO] - Iteration 1: Code Run 3 execution error!
[2023-10-21 20:10:05,409][root][INFO] - All code generation failed! Repeat this iteration from the current message checkpoint!
[2023-10-21 20:10:05,409][root][INFO] - All iterations of code generation failed, aborting...
[2023-10-21 20:10:05,409][root][INFO] - Please double check the output env_iter_response.txt files for repeating errors!

JasonMa2016 commented 8 months ago

Thank you for your interest in our work! What you are observing is the case where none of the Eureka generations in the first iteration is executable. In our paper, we use 16 samples per iteration, and we find that this produces at least one executable reward program in all of our tasks.

We are happy to accept a PR that implements a feature where the sampling continues until at least one executable reward program is generated.
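To illustrate why the sample count matters, here is a rough back-of-the-envelope sketch. It assumes each sample is executable independently with the same probability, which is a simplification and not something the paper or this thread states. The 9/16 success rate is taken from the gpt-4-0314 log posted later in this thread; the function name is hypothetical.

```python
def prob_all_fail(p_success: float, n_samples: int) -> float:
    """Probability that every one of n independent samples fails to execute."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return (1.0 - p_success) ** n_samples


if __name__ == "__main__":
    # Assumed per-sample success rate: 9 of 16 runs, as in the log later in the thread.
    p = 9 / 16
    for n in (4, 16):
        print(f"sample={n}: P(no executable reward) = {prob_all_fail(p, n):.2e}")
```

Under these assumptions, an iteration with sample=4 comes up empty roughly 3.7% of the time, while with sample=16 it is on the order of one in a million. Two empty iterations in a row at sample=4 would be unlikely under this model, so the env_iter_response.txt files are worth checking for a repeating error.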

jgforbes commented 8 months ago

I am getting similar messages from the gpt-3.5-turbo-16k-0613 model: "Code Run 12 execution error!" and "Code Run 13 cannot parse function signature!" I increased the runs to 24 and the iterations to 10, but none of the returned code executed successfully.

When installing the Eureka code, this command gave some errors: cd ../rl_games; pip install -e . Is rl_games required for the demonstrations?

It would be nice to reproduce your work.

JasonMa2016 commented 8 months ago

Here is the output from my own run of the code. With gpt-4-0314, more than half of the samples in the first iteration (9 of 16 below) are executable rewards, which is enough to seed the evolution.

[2023-10-24 00:33:13,735][root][INFO] - Workspace: /home/exx/Projects/eureka-private/eureka/outputs/eureka/2023-10-24_00-33-13
[2023-10-24 00:33:13,735][root][INFO] - Project Root: /home/exx/Projects/eureka-private/eureka
[2023-10-24 00:33:13,735][root][INFO] - Using LLM: gpt-4-0314
[2023-10-24 00:33:13,735][root][INFO] - Task: ShadowHand
[2023-10-24 00:33:13,735][root][INFO] - Task description: to make the shadow hand spin the object to a target orientation
[2023-10-24 00:33:13,758][root][INFO] - Iteration 0: Generating 16 samples with gpt-4-0314
[2023-10-24 00:35:31,723][root][INFO] - Iteration 0: Prompt Tokens: 1735, Completion Tokens: 5685, Total Tokens: 12625
[2023-10-24 00:35:31,724][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-24 00:35:51,353][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-24 00:35:51,354][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-24 00:36:01,260][root][INFO] - Iteration 0: Code Run 1 successfully training!
[2023-10-24 00:36:01,260][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-24 00:36:11,567][root][INFO] - Iteration 0: Code Run 2 execution error!
[2023-10-24 00:36:11,568][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-24 00:36:21,733][root][INFO] - Iteration 0: Code Run 3 execution error!
[2023-10-24 00:36:21,733][root][INFO] - Iteration 0: Processing Code Run 4
[2023-10-24 00:36:33,710][root][INFO] - Iteration 0: Code Run 4 successfully training!
[2023-10-24 00:36:33,710][root][INFO] - Iteration 0: Processing Code Run 5
[2023-10-24 00:36:45,957][root][INFO] - Iteration 0: Code Run 5 successfully training!
[2023-10-24 00:36:45,957][root][INFO] - Iteration 0: Processing Code Run 6
[2023-10-24 00:36:57,649][root][INFO] - Iteration 0: Code Run 6 execution error!
[2023-10-24 00:36:57,649][root][INFO] - Iteration 0: Processing Code Run 7
[2023-10-24 00:37:10,399][root][INFO] - Iteration 0: Code Run 7 successfully training!
[2023-10-24 00:37:10,399][root][INFO] - Iteration 0: Processing Code Run 8
[2023-10-24 00:37:24,650][root][INFO] - Iteration 0: Code Run 8 successfully training!
[2023-10-24 00:37:24,650][root][INFO] - Iteration 0: Processing Code Run 9
[2023-10-24 00:37:37,578][root][INFO] - Iteration 0: Code Run 9 execution error!
[2023-10-24 00:37:37,578][root][INFO] - Iteration 0: Processing Code Run 10
[2023-10-24 00:37:49,788][root][INFO] - Iteration 0: Code Run 10 execution error!
[2023-10-24 00:37:49,788][root][INFO] - Iteration 0: Processing Code Run 11
[2023-10-24 00:38:03,219][root][INFO] - Iteration 0: Code Run 11 execution error!
[2023-10-24 00:38:03,219][root][INFO] - Iteration 0: Processing Code Run 12
[2023-10-24 00:38:18,007][root][INFO] - Iteration 0: Code Run 12 successfully training!
[2023-10-24 00:38:18,007][root][INFO] - Iteration 0: Processing Code Run 13
[2023-10-24 00:38:34,541][root][INFO] - Iteration 0: Code Run 13 successfully training!
[2023-10-24 00:38:34,541][root][INFO] - Iteration 0: Processing Code Run 14
[2023-10-24 00:38:52,689][root][INFO] - Iteration 0: Code Run 14 successfully training!
[2023-10-24 00:38:52,689][root][INFO] - Iteration 0: Processing Code Run 15
[2023-10-24 00:39:08,733][root][INFO] - Iteration 0: Code Run 15 successfully training!

Now, this is Eureka using gpt-3.5-turbo-16k-0613:

[2023-10-24 00:40:14,778][root][INFO] - Workspace: /home/exx/Projects/eureka-private/eureka/outputs/eureka/2023-10-24_00-40-14
[2023-10-24 00:40:14,778][root][INFO] - Project Root: /home/exx/Projects/eureka-private/eureka
[2023-10-24 00:40:14,778][root][INFO] - Using LLM: gpt-3.5-turbo-16k-0613
[2023-10-24 00:40:14,778][root][INFO] - Task: ShadowHand
[2023-10-24 00:40:14,778][root][INFO] - Task description: to make the shadow hand spin the object to a target orientation
[2023-10-24 00:40:14,800][root][INFO] - Iteration 0: Generating 16 samples with gpt-3.5-turbo-16k-0613
[2023-10-24 00:40:50,280][root][INFO] - Iteration 0: Prompt Tokens: 1735, Completion Tokens: 4594, Total Tokens: 6329
[2023-10-24 00:40:50,280][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-24 00:40:52,922][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-24 00:40:52,922][root][INFO] - Iteration 0: Processing Code Run 1
[2023-10-24 00:41:03,572][root][INFO] - Iteration 0: Code Run 1 successfully training!
[2023-10-24 00:41:03,572][root][INFO] - Iteration 0: Processing Code Run 2
[2023-10-24 00:41:15,421][root][INFO] - Iteration 0: Code Run 2 successfully training!
[2023-10-24 00:41:15,421][root][INFO] - Iteration 0: Processing Code Run 3
[2023-10-24 00:41:15,421][root][INFO] - Iteration 0: Code Run 3 cannot parse function signature!
[2023-10-24 00:41:15,421][root][INFO] - Iteration 0: Processing Code Run 4
[2023-10-24 00:41:26,643][root][INFO] - Iteration 0: Code Run 4 execution error!
[2023-10-24 00:41:26,643][root][INFO] - Iteration 0: Processing Code Run 5
[2023-10-24 00:41:37,264][root][INFO] - Iteration 0: Code Run 5 execution error!
[2023-10-24 00:41:37,264][root][INFO] - Iteration 0: Processing Code Run 6
[2023-10-24 00:41:50,919][root][INFO] - Iteration 0: Code Run 6 successfully training!
[2023-10-24 00:41:50,919][root][INFO] - Iteration 0: Processing Code Run 7
[2023-10-24 00:42:01,638][root][INFO] - Iteration 0: Code Run 7 execution error!
[2023-10-24 00:42:01,638][root][INFO] - Iteration 0: Processing Code Run 8
[2023-10-24 00:42:17,147][root][INFO] - Iteration 0: Code Run 8 successfully training!
[2023-10-24 00:42:17,147][root][INFO] - Iteration 0: Processing Code Run 9
[2023-10-24 00:42:29,767][root][INFO] - Iteration 0: Code Run 9 successfully training!
[2023-10-24 00:42:29,767][root][INFO] - Iteration 0: Processing Code Run 10
[2023-10-24 00:42:32,455][root][INFO] - Iteration 0: Code Run 10 execution error!
[2023-10-24 00:42:32,455][root][INFO] - Iteration 0: Processing Code Run 11
[2023-10-24 00:42:46,817][root][INFO] - Iteration 0: Code Run 11 successfully training!
[2023-10-24 00:42:46,817][root][INFO] - Iteration 0: Processing Code Run 12
[2023-10-24 00:43:04,635][root][INFO] - Iteration 0: Code Run 12 successfully training!
[2023-10-24 00:43:04,635][root][INFO] - Iteration 0: Processing Code Run 13
[2023-10-24 00:43:21,069][root][INFO] - Iteration 0: Code Run 13 successfully training!
[2023-10-24 00:43:21,069][root][INFO] - Iteration 0: Processing Code Run 14
[2023-10-24 00:43:41,452][root][INFO] - Iteration 0: Code Run 14 successfully training!
[2023-10-24 00:43:41,453][root][INFO] - Iteration 0: Processing Code Run 15
[2023-10-24 00:43:51,951][root][INFO] - Iteration 0: Code Run 15 execution error!
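To compare runs like the two above without counting by hand, the per-run outcomes can be tallied from the log text. The following is a minimal sketch that relies only on the three message formats visible in these logs; the function name is hypothetical and not part of the Eureka codebase.

```python
import re
from collections import Counter

# Matches result lines such as "Iteration 0: Code Run 1 successfully training!".
# "Processing Code Run N" lines do not match and are ignored.
RUN_RESULT = re.compile(
    r"Iteration \d+: Code Run \d+ "
    r"(execution error|successfully training|cannot parse function signature)!"
)


def tally_outcomes(log_text: str) -> Counter:
    """Count how many code runs ended in each outcome."""
    return Counter(match.group(1) for match in RUN_RESULT.finditer(log_text))


if __name__ == "__main__":
    excerpt = """\
[2023-10-24 00:40:50,280][root][INFO] - Iteration 0: Processing Code Run 0
[2023-10-24 00:40:52,922][root][INFO] - Iteration 0: Code Run 0 execution error!
[2023-10-24 00:41:03,572][root][INFO] - Iteration 0: Code Run 1 successfully training!
[2023-10-24 00:41:15,421][root][INFO] - Iteration 0: Code Run 3 cannot parse function signature!
"""
    for outcome, count in tally_outcomes(excerpt).items():
        print(f"{outcome}: {count}")
```

Applied to the full logs above, this gives 9 successful runs out of 16 for each model, with one gpt-3.5-turbo-16k-0613 sample failing at the function-signature parsing step.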

Given that API calls are stochastic, it is possible for an entire iteration to produce no executable reward. In this case, I suggest simply re-running the code. A longer-term solution would be to keep sampling until an executable reward is returned; we welcome a PR that implements this feature!
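The "keep sampling until an executable reward is returned" idea could be sketched roughly as follows. This is not Eureka's implementation: `generate` and `is_executable` are hypothetical callables standing in for the LLM sampling step and the execution check, and the cap on retries is an added assumption to avoid unbounded API spending.

```python
from typing import Callable, List


def sample_until_executable(
    generate: Callable[[int], List[str]],
    is_executable: Callable[[str], bool],
    batch_size: int = 16,
    max_batches: int = 5,
) -> List[str]:
    """Request batches of candidates until one batch has an executable candidate.

    generate(n) is assumed to return n candidate reward programs as strings;
    is_executable(code) is assumed to report whether a candidate runs.
    """
    for _ in range(max_batches):
        candidates = generate(batch_size)
        runnable = [code for code in candidates if is_executable(code)]
        if runnable:
            return runnable
    raise RuntimeError(
        f"no executable reward after {max_batches} batches of {batch_size} samples"
    )
```

Bounding the number of batches keeps a systematic failure, such as a broken environment install, from turning into an endless loop of API calls.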

jgforbes commented 8 months ago

OK, I am missing the rl_games.common module.

jgforbes commented 8 months ago

I get this error when running pip install -e . in rl_games:

Requirement already satisfied: distlib<1,>=0.3.7 in /home/gymuser/.local/lib/python3.8/site-packages (from virtualenv->ray<2.0.0,>=1.11.0->rl_games==1.6.1) (0.3.7)
Installing collected packages: rl-games
  Running setup.py develop for rl-games
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/gymuser/Eureka/rl_games/setup.py'"'"'; __file__='"'"'/home/gymuser/Eureka/rl_games/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix=
         cwd: /home/gymuser/Eureka/rl_games/
    Complete output (3 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'

ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/gymuser/Eureka/rl_games/setup.py'"'"'; __file__='"'"'/home/gymuser/Eureka/rl_games/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps --user --prefix= Check the logs for full command output.

jgforbes commented 8 months ago

rl_games needs setuptools listed in its pyproject.toml; see https://github.com/Denys88/rl_games/pull/190/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711
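The ModuleNotFoundError in the pip output suggests that the interpreter running the build step could not import setuptools. One quick way to check this is the short script below, run with the same interpreter that pip used (/opt/conda/bin/python3.8 in the log). It is only a diagnostic sketch; the helper name is hypothetical.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the one in the pip log.
    print("interpreter:", sys.executable)
    print("setuptools importable:", has_module("setuptools"))
```

If this reports that setuptools is not importable, the install can be expected to fail in the same way until setuptools is available to that interpreter.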