abdulhaim / LMRL-Gym

MIT License
64 stars 9 forks

BC running issues #5

Open PioneerAlexander opened 6 months ago

PioneerAlexander commented 6 months ago

BC running issues

Hello! I have tried to evaluate BC on all of the benchmarks and ran into the bugs and errors described below. Because of them, I either cannot run the code successfully at all, or I cannot reproduce the results reported in the paper.

Maze:

For the evaluation I use the command `python -m llm_rl_scripts.maze.bc.eval_bc PARAMS my_path`.

During the evaluation of the “fully observed” version it seems that GPT2PPOPolicy is used (`use_reranker_for_reward_eval: bool = False`). However, when I tried to evaluate the model with this policy, it acts only with `Text("\n", is_action=True)` and I get a reward of -4.0 for every move. Moreover, fully_observed_bc.py uses a different policy (ReRankerSamplePolicy).

Chess:

To train on full chess games I use the command:

`python -m llm_rl_scripts.chess.bc.train_full_games_bc HF gpt2 dataset_path`

After running this command I get the following warning:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...

To disable this warning, you can either:

- Avoid using `tokenizers` before the fork if possible

- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

After this warning the code crashes with

jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1258291200 bytes.

This happened after a checkpoint had already been created, so I tried to evaluate BC with the command:

`python -m llm_rl_scripts.chess.bc.eval_full_games_bc PARAMS params_path`

It crashes when evaluating the environment: the call `interactions, results = text_env_eval(env=env, policy=policy, n_rollouts=policy_n_rollouts, verbose=True, env_options={"init_position": position}, bsize=policy_bsize)` fails with `NameError: name 'position' is not defined`.
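For reference, a minimal workaround sketch: defining `position` before that call avoids the NameError. Whether `init_position` expects a FEN string is an assumption here; the value shown is the standard chess starting position.

```python
# Fragment, not standalone: define `position` in eval_full_games_bc before the
# text_env_eval(...) call quoted above.
# Assumption: the chess env's "init_position" option accepts a FEN string.
position = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"  # standard starting position
```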

Guess My City

For training, I could not find where the 'vocab_file' is located or where it can be obtained. Is it one of the files in llm_rl_scripts/wordle/vocab/? Please clarify which command I should use to train BC on this task.

Wordle

To train BC on Wordle, I tried running the command below with each vocab file:

`python -m llm_rl_scripts.wordle.bc.train_bc_gpt2 HF gpt2 datasets/wordle/train_data.jsonl datasets/wordle/eval_data.jsonl llm_rl_scripts/wordle/vocab/tweet_words.txt`

For every run I received the following error:

ValueError: Incompatible shapes for broadcasting: (16, 1, 1, 2048) and requested shape (16, 1, 1024, 1024)
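For context, the 1024 in the requested shape matches GPT-2's maximum context length (`n_positions = 1024`), while 2048 looks like the sequence length being built, so the mismatch is consistent with a max-length setting larger than the gpt2 checkpoint supports. A quick check, as a sketch:

```python
# Sketch: confirm the context window of the checkpoint being loaded.
# The stock gpt2 checkpoint reports n_positions = 1024, so any sequence length
# above that (e.g. 2048) would produce a mask-shape mismatch like the one above.
from transformers import AutoConfig

print(AutoConfig.from_pretrained("gpt2").n_positions)  # 1024
```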

I also could not work out the correct run command for llm_rl_scripts.wordle.bc.train_bc.

Trying `python -m llm_rl_scripts.wordle.bc.train_bc HF gpt2 datasets/wordle/train_data.jsonl datasets/wordle/eval_data.jsonl llm_rl_scripts/wordle/vocab/tweet_words.txt` leads to the following warning and error:

```
You are using a model of type gpt2 to instantiate a model of type gptj. This is not supported for all configurations of models and can yield errors.
The checkpoint gpt2 is missing required keys: {really long dict of required keys}
...
KeyError: 'lm_head'
```

Car dealer

I could not run the script llm_rl_scripts/car_dealer/bc/train_bc.py due to the following error:

ModuleNotFoundError: No module named 'jax_models'

It seems like other users have the same issue (Issue #2)

Text_Nav

I have tried to train BC with the following command: `python -m llm_rl_scripts.text_nav.bc.train_bc HF gpt2 datasets/text_nav/train_full_info.json datasets/text_nav/eval_full_info.json`

It fails with the following error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File ".../llm_rl_scripts/text_nav/bc/train_bc.py", line 291, in <module>
    tyro.cli(main)
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 114, in cli
    _cli_impl(
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 293, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File ".../venv1/lib/python3.9/site-packages/tyro/_calling.py", line 192, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File ".../llm_rl_scripts/text_nav/bc/train_bc.py", line 259, in main
    trainer, inference = train_loop(
  File ".../JAXSeq/JaxSeq/train.py", line 218, in train_loop
    for batch in tqdm(d, total=steps_per_epoch):
  File ".../venv1/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/kariakinaleksandr/PycharmProjects/lmlr-gym/JAXSeq/JaxSeq/utils.py", line 437, in _iterable_data_to_batch_iterator
    for item in dataset:
  File ".../lmlr-gym/JAXSeq/JaxSeq/data.py", line 210, in __next__
    in_tokens, in_training_mask = next(self.in_mask_tokens)
  File ".../JAXSeq/JaxSeq/data.py", line 248, in _tokens_generator
    in_training_mask = block_sequences(
  File ".../JAXSeq/JaxSeq/utils.py", line 240, in block_sequences
    return np.asarray(full_sequences, dtype=dtype)
ValueError: could not convert string to float: '|'
```
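For context, the failing line is `np.asarray(full_sequences, dtype=dtype)` with a numeric dtype, so a raw string such as '|' reaching `block_sequences` is enough to trigger this. A minimal repro of the same ValueError, independent of the repo:

```python
# Sketch: reproduces the same ValueError outside the repo. If untokenized strings
# (e.g. a '|' separator) reach block_sequences, np.asarray with a numeric dtype fails.
import numpy as np

np.asarray([["|", "1"]], dtype=np.float32)  # ValueError: could not convert string to float: '|'
```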

20 Questions

The script llm_rl_scripts/twenty_questions/bc/train_bc.py has an unresolved reference to 'train_text_histories'. Should it be train_text_trajectories?

I have not found which model is suitable to be the oracle. I tried gpt2, and it fails with the following error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File ".../llm_rl_scripts/twenty_questions/bc/train_bc.py", line 316, in <module>
    tyro.cli(main)
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 114, in cli
    _cli_impl(
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 293, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File ".../venv1/lib/python3.9/site-packages/tyro/_calling.py", line 192, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File ".../llm_rl_scripts/twenty_questions/bc/train_bc.py", line 152, in main
    oracle=T5Oracle.load_oracle(
  File ".../llm_rl_scripts/twenty_questions/env/oracle.py", line 107, in load_oracle
    params, model = t5_load_params(
  File ".../JAXSeq/JaxSeq/models/T5/load.py", line 229, in load_params
    with open(os.path.join(model_load_path, 'config.json'), 'r') as f:
  File ".../JAXSeq/JaxSeq/bucket_manager.py", line 24, in open_with_bucket
    f = open(path, mode=mode, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'gpt2/config.json'
```

Is this a model from a dataset link?

Please resolve this whole issue as soon as possible.

icwhite commented 6 months ago

Thank you for reaching out. We will address your concerns in full shortly. Here are some answers to your questions.

Maze

I ran `python -m llm_rl_scripts.maze.bc.eval_bc PARAMS my_path --data-mesh-shape 4 --model-mesh-shape 2 --policy-n-rollouts 4 --no-do-accuracy-eval` with the bc checkpoint linked here: https://rail.eecs.berkeley.edu/datasets/rl-llm-bench-dataset/maze/checkpoints/fully_observed/bc/

and I got an average reward of -75.36, so this seems to work. You are right that the policy used in the training script and the default in the evaluation script are different. In our evaluation in the paper, we used GPT2PPOPolicy, so we will update this.
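For reference, the two mesh flags in that command imply a 4 × 2 device mesh. Assuming, as in typical JAX mesh setups, that the product of the two dimensions must match the number of visible devices, a quick sanity check is:

```python
# Sketch: --data-mesh-shape 4 --model-mesh-shape 2 implies a 4 * 2 = 8 device mesh.
# Assumption: the two mesh dimensions must multiply to the visible device count,
# so adjust the flags to match what JAX reports on your machine.
import jax

print(jax.device_count())
```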

In the meantime, if you are evaluating a checkpoint other than the one linked in the repository, I recommend that you run for more epochs. For the maze task, we trained our methods for 50-100 epochs for each algorithm since the dataset is relatively small. We will add this hyperparameter to an updated version of the paper.

Chess

  1. Yes, when running with chess, we set 'TOKENIZERS_PARALLELISM=false'.

  2. It looks like you ran out of memory. I would recommend using a smaller batch_size and setting grad_accum_steps higher (see the sketch after this list).

  3. Fixed. I added the starting position here.
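A small sketch of points 1 and 2, assuming the training script exposes batch-size and gradient-accumulation settings (the exact argument names are not shown here):

```python
# Sketch only: silence the tokenizers fork warning and trade per-step batch size
# for gradient accumulation to reduce memory.
import os

# Set before any tokenizer work (or export it in the shell) to avoid the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Memory trade-off: effective batch = per-step batch * grad_accum_steps, so halving
# the per-step batch and doubling grad_accum_steps keeps updates equivalent while
# roughly halving activation memory. Values below are hypothetical.
per_step_batch = 8
grad_accum_steps = 4
effective_batch = per_step_batch * grad_accum_steps  # 32
```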

20 Questions

Apologies for the confusion. Yes, that link is for the simulator. Please download it and then point 'oracle_model_path' at the path to the model.
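For reference, the traceback above shows the oracle loader opening `config.json` under `model_load_path`, so 'oracle_model_path' should be the path to the downloaded simulator checkpoint rather than a hub name like gpt2. A quick pre-flight check, as a sketch (the path is a placeholder):

```python
# Sketch: verify the downloaded simulator checkpoint before launching training.
# t5_load_params opens os.path.join(model_load_path, 'config.json'), so the oracle
# path must point at a checkpoint directory, not a Hugging Face model name.
import os

oracle_model_path = "/path/to/twenty_questions/oracle_checkpoint"  # placeholder
assert os.path.isfile(os.path.join(oracle_model_path, "config.json")), \
    "oracle_model_path should point at a directory containing config.json"
```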

We will get to the remaining issues shortly. Thank you for your patience!

PioneerAlexander commented 5 months ago

@icwhite Hello, are there any updates on the remaining issues?

icwhite commented 5 months ago

Hi, I have resolved the issues for 20 Questions and fully observed Maze. We are still working on the Text-Nav, Guess My City, and Car Dealer issues.

icwhite commented 5 months ago

We have resolved the car dealer issue. Please see the recent merge.