YifeiZhou02 / ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
https://yifeizhou02.github.io/archer.io/

Model Training Unstable (webshop, gpt2) #14

Open RobertXWL opened 2 months ago

RobertXWL commented 2 months ago

I encountered an issue while trying to reproduce the results by loading the gpt2_bc_webshop_history.pt model and running the run.py script on 8 GPUs. Training was initiated with the following parameters:

```yaml
epochs: 50
actor_epochs: 3
batch_size: 8
grad_accum_steps: 4
capacity: 10000
critic_lr: 6e-5
lm_lr: 3e-5
rollout_size: 512
gamma: 0.9
tau: 0.1
agent_type: "archer"
webshop_lower: 2000
webshop_upper: 2100
```

However, I noticed that during training the eval_rollout.mean value barely increases, and in many runs training either collapses (rewards drop to zero) or performance deteriorates. To mitigate this, I tried lowering the learning rates and reducing the number of actor updates, which seemed to prevent the collapse.
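To make the direction of that change concrete, the overrides were roughly along these lines (the numbers here are hypothetical placeholders, not the exact settings I used):

```yaml
# Hypothetical stability-oriented overrides; values are illustrative only.
critic_lr: 3e-5    # lowered from 6e-5
lm_lr: 1e-5        # lowered from 3e-5
actor_epochs: 1    # fewer actor update epochs per iteration (was 3)
```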

I would like to understand the potential reasons for this behavior and whether my parameter settings are appropriate. Could you clarify whether I am missing something, or suggest adjustments to make the training more stable?

YifeiZhou02 commented 2 months ago

Hi, thanks for your interest in our work. Have you tried using the provided hyperparameters (https://github.com/YifeiZhou02/ArCHer/blob/master/scripts/config/archer_webshop.yaml)? In general, a smaller learning rate, more gradient accumulation steps, and a larger rollout size will make training more stable. Please allow one or two days of running before expecting to see improvements.
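As a directional sketch only (these are not tuned recommendations; start from the values in the linked YAML), stability-oriented overrides would move like this:

```yaml
# Directional sketch, not tuned values; see scripts/config/archer_webshop.yaml.
lm_lr: 1e-5            # smaller actor learning rate than the 3e-5 above
grad_accum_steps: 16   # more accumulation: effective batch = batch_size * grad_accum_steps
rollout_size: 1024     # more on-policy data per iteration than the 512 above
```

The intuition is that a larger effective batch and more fresh rollout data per update reduce the variance of both the critic and actor gradients, which is usually what prevents the kind of reward collapse you are seeing.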