YifeiZhou02 / ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
https://yifeizhou02.github.io/archer.io/

llama2-7B performance keeps getting worse when training on webshop #5

Closed: xiaxiaxiatengxi closed this issue 7 months ago

xiaxiaxiatengxi commented 7 months ago

Hello, sorry to bother you. I used the code to train on webshop, but performance keeps getting worse. We first trained a LoRA on 2,000 webshop examples, then trained llama2-7B on top of that LoRA. Our evaluation setup is to test on 200 webshop dialogues, using an n-gram metric (n=2); the initial LoRA scores 129.177, but after 2,000 training iterations the score drops to 68.546. The training parameters are `# Adversarial Attack Config defaults:

YifeiZhou02 commented 7 months ago

Thanks for your interest. We sometimes ran into such issues with poorly chosen hyperparameters. To stabilize training, it might be a good idea to decrease the learning rate for the actor (we used 1e-5 for the gpt2 actor, so I imagine that might be too large for llama2-7B) and to increase the rollout_size (in general, the larger the rollout_size, the more stable the algorithm is). The sign of a well-chosen set of hyperparameters is that the metrics q1_mean and q2_mean improve stably.
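A quick way to sanity-check that diagnostic, as a minimal sketch rather than anything from the ArCHer codebase: it assumes you have exported the logged metrics to a hypothetical `metrics.jsonl` file with one record per iteration containing `q1_mean` and `q2_mean` (adapt the loading to however you actually log, e.g. a wandb export or CSV), and it simply compares an early-window average against a late-window average.

```python
# Sketch only (not part of the ArCHer repo): check whether q1_mean / q2_mean
# trend upward over training. Assumes a metrics.jsonl file where each line is
# a JSON record like {"iteration": 0, "q1_mean": ..., "q2_mean": ...}.
import json
from statistics import mean

def is_improving(values, window=50, tol=0.0):
    """Compare the mean of the last `window` points to the first `window`.

    A healthy run should have the recent average clearly above the early
    average; a flat or decreasing trend suggests the hyperparameters
    (e.g. actor learning rate too high, rollout_size too small) need adjusting.
    """
    if len(values) < 2 * window:
        return None  # not enough data yet to judge
    return mean(values[-window:]) - mean(values[:window]) > tol

with open("metrics.jsonl") as f:
    records = [json.loads(line) for line in f]

q1 = [r["q1_mean"] for r in records]
q2 = [r["q2_mean"] for r in records]
print("q1_mean improving:", is_improving(q1))
print("q2_mean improving:", is_improving(q2))
```

If both curves are flat or falling under a given setting, it is probably worth lowering the actor learning rate or raising rollout_size before training further.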