Open RobertXWL opened 2 months ago
I encountered an issue while trying to reproduce the results by loading the `gpt2_bc_webshop_history.pt` checkpoint and running the `run.py` script. Training was launched on 8 GPUs with the following parameters:
```yaml
epochs: 50
actor_epochs: 3
batch_size: 8
grad_accum_steps: 4
capacity: 10000
critic_lr: 6e-5
lm_lr: 3e-5
rollout_size: 512
gamma: 0.9
tau: 0.1
agent_type: "archer"
webshop_lower: 2000
webshop_upper: 2100
```
However, during training the `eval_rollout.mean` value barely increases, and in many cases training either collapses (with rewards dropping to zero) or performance deteriorates. To mitigate this, I tried lowering the learning rate and reducing the number of actor updates, which seemed to prevent the collapse.
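For concreteness, the kind of override I experimented with looked roughly like this (the exact values here are illustrative; I tried several):

```yaml
# Hypothetical overrides on top of the settings above;
# the exact values varied between runs.
lm_lr: 1e-5        # lowered from 3e-5
critic_lr: 2e-5    # lowered from 6e-5
actor_epochs: 1    # fewer actor updates per iteration (was 3)
```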
I would like to understand the likely cause of this behavior and whether my parameter settings are appropriate. Could you clarify whether I am missing something, or suggest adjustments to make training more stable?

---

Hi, thanks for your interest in our work. Have you tried using the provided hyperparameters (https://github.com/YifeiZhou02/ArCHer/blob/master/scripts/config/archer_webshop.yaml)? In general, a smaller learning rate, larger gradient accumulation, and a larger rollout size will make training more stable. Please allow one or two days of running to see improvements.
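Relative to the settings you posted, the direction of those adjustments would look something like the sketch below (values are illustrative only; please take the actual ones from `archer_webshop.yaml`):

```yaml
# Illustrative direction of the suggested changes, not the official config:
critic_lr: 1e-5        # smaller learning rates
lm_lr: 1e-5
grad_accum_steps: 16   # larger gradient accumulation
rollout_size: 1024     # larger rollout size
```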