arpaiva opened 5 months ago
I'm trying to reproduce the car-dealer results using the commands in `llm_rl_scripts/car_dealer/misc/test_car_dealer.sh`, but I'm unable to train BC with the code. There were syntax issues with the code in the main branch, which I believe I corrected in PR #18. Even after applying those changes, I had to remove `--model-p-shape=4` from the BC command, as that is an unrecognized argument; I'm not sure how important that is. Still, the code quickly ends with:

[error output omitted]

and it doesn't generate any output or checkpoint. Where is the BC-finetuned model being trained, then?
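For what it's worth, a quick way to confirm that nothing was written is to list the outputs path. This is a minimal sketch; the assumption that checkpoints would land under an `exp_name` subdirectory is mine, not something I've confirmed in the script:

```bash
# List anything the run might have written under the outputs path.
# The expected layout (an exp_name subdirectory holding checkpoints) is an assumption.
find "${WORKDIR}/car-dealer/outputs" -mindepth 1 -maxdepth 3 -print
```

If training had actually produced checkpoints, I'd expect this to show at least one non-empty directory.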
For additional context, the command and arguments used were:

```bash
python -m llm_rl_scripts.car_dealer.bc.train_bc \
    HF ${LM_MODELS_DIR}/gpt2-large \
    ${DATADIR}/car-dealer/simulator/model \
    --outputs-path=${WORKDIR}/car-dealer/outputs/ \
    --data-path=${DATADIR}/car-dealer/ \
    --exp_name car-dealer-bc \
    --epochs=18 \
    --train-bsize=16 \
    --grad-accum-steps=8 \
    --inference-bsize=32 \
    --num-logs-per-epoch=4 \
    --num-evals-per-epoch=4 \
    --save-best \
    --save-last
```

There are exactly three changes compared to what is given in `llm_rl_scripts/car_dealer/misc/test_car_dealer.sh`:

1. `gpt2-large` is used instead of `gpt2-xl`, due to memory limitations;
2. `--exp_name` is added, because the script requires it;
3. `--model-p-shape` is removed, as it is an unrecognized argument.

BTW, `${DATADIR}` refers to the local copy of the datasets.
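Also, for anyone hitting the same unrecognized-argument error: assuming the entry point uses a standard CLI parser that honors `--help` (which I haven't verified for this repo), the accepted flags can be listed directly:

```bash
# Assumption: the training entry point responds to --help with its recognized flags.
python -m llm_rl_scripts.car_dealer.bc.train_bc --help
```

That would show whether `--model-p-shape` (presumably the model-parallel partition shape) is expected under a different name or was dropped entirely.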