keeganstoner closed this issue 1 year ago.
BART is tested. I think you just have to turn off model parallelism by setting apply_model_parallel: False, as done in https://github.com/allenai/RL4LMs/blob/main/scripts/training/task_configs/synthetic_generate_increasing_numbers/bart_ppo.yml
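A minimal Python sketch of the suggested change, assuming the YAML has already been parsed into a nested dict (for example with PyYAML's yaml.safe_load) and that the policy options live under alg -> policy -> args, as in the config posted later in this thread. The helper name disable_model_parallel is hypothetical and not part of RL4LMs.

```python
import copy
from typing import Any, Dict


def disable_model_parallel(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed config with apply_model_parallel set to False.

    Assumes the layout used in this thread: alg -> policy -> args.
    Missing levels are created so the key always ends up present.
    """
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    policy_args = (
        patched.setdefault("alg", {})
        .setdefault("policy", {})
        .setdefault("args", {})
    )
    policy_args["apply_model_parallel"] = False
    return patched


if __name__ == "__main__":
    # Trimmed-down policy section from the config in this thread.
    config = {
        "alg": {
            "id": "ppo",
            "policy": {
                "id": "seq2seq_lm_actor_critic_policy",
                "args": {
                    "model_name": "facebook/bart-large-cnn",
                    "apply_model_parallel": True,
                },
            },
        }
    }
    patched = disable_model_parallel(config)
    print(patched["alg"]["policy"]["args"]["apply_model_parallel"])  # False
    print(config["alg"]["policy"]["args"]["apply_model_parallel"])   # True
```

Editing the YAML file by hand (apply_model_parallel: False under the policy args) has the same effect; a helper like this is only useful when configs are patched programmatically.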
I'm still getting the same error after turning that off, even with the exact bart_ppo.yml file.
It does work with causal_lm_actor_critic_policy.
bart_ppo.yml works with the latest version of RL4LMs. Make sure you are using v0.2.1.
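To confirm which release is installed before retrying, a small stdlib-only check along these lines could be used. It assumes the package is registered under the distribution name rl4lms (an assumption; adjust it if your install differs) and only compares plain numeric versions such as 0.2.1. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MIN_VERSION = "0.2.1"  # version recommended in this thread


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints.

    Pre-release tags such as '0.2.1rc1' are not handled by this sketch.
    """
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed: str, minimum: str = MIN_VERSION) -> bool:
    """True if `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so "0.2" compares equal to "0.2.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_rl4lms_version(dist_name: str = "rl4lms") -> Optional[str]:
    """Return the installed version string, or None if it is not installed.

    The default distribution name "rl4lms" is an assumption.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_rl4lms_version()
    if found is None:
        print("RL4LMs does not appear to be installed")
    elif is_at_least(found):
        print(f"RL4LMs {found} is new enough")
    else:
        print(f"RL4LMs {found} is older than {MIN_VERSION}; upgrade first")
```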
Ah, my install wasn't up to date. Thanks.
I'm getting this error when trying to fine-tune BART using PPO. Is this because BART isn't fully implemented yet, or because I'm using the wrong model?
My YAML config looks like this (it's the default PPO config with only the model changed):
tokenizer:
  model_name: facebook/bart-large
  padding_side: left
  truncation_side: left
  pad_token_as_eos_token: False

reward_fn:
  id: rouge
  args:
    rouge_type: "rouge1"

datapool:
  id: cnn_daily_mail
  args:
    prompt_prefix: "Summarize: "

env:
  n_envs: 10
  args:
    max_prompt_length: 512
    max_episode_length: 100
    terminate_on_eos: True
    prompt_truncation_side: "right"
    context_start_token: 0

alg:
  id: ppo
  args:
    n_steps: 512
    batch_size: 64
    verbose: 1
    learning_rate: 0.000002
    n_epochs: 5
    ent_coef: 0.0
  kl_div:
    coeff: 0.001
    target_kl: 0.2
  policy:
    id: seq2seq_lm_actor_critic_policy
    args:
      model_name: facebook/bart-large-cnn
      apply_model_parallel: True
      prompt_truncation_side: "right"
      generation_kwargs:
        do_sample: True
        top_k: 50
        min_length: 50
        max_new_tokens: 100

train_evaluation:
  eval_batch_size: 100
  n_iters: 100
  eval_every: 10
  save_every: 1
  metrics:
    - id: bleurt
      args:
        config_name: bleurt-large-512
    - id: summaCZS
      args:
        granularity: sentence
        use_ent: True
        use_con: False
    - id: summaCConv
      args:
        granularity: sentence
  generation_kwargs:
    do_sample: True
    top_k: 0
    temperature: 0.7
    min_length: 50
    max_new_tokens: 100