YifeiZhou02 / ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
https://yifeizhou02.github.io/archer.io/

Some questions about tokenizer length #8

Closed · xiaxiaxiatengxi closed 3 months ago

xiaxiaxiatengxi commented 3 months ago

Thank you for your response. I still have some questions regarding the updates to the Q/V networks. I noticed that in DoubleCritic, the tokenizer has a truncation length of 512. How is this 512-token limit enforced in the WebShop environment? When I trained on WebShop, the context length exceeded 2000 tokens after three rounds of conversation.

YifeiZhou02 commented 3 months ago

Yeah, I think the RoBERTa tokenizer cannot support over 2000 tokens. In that case, the context is truncated from the start, so that the most recent round is not truncated.
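
For reference, here is a minimal sketch of the left-truncation behavior being described, using the HuggingFace `transformers` tokenizer API. This is not the repo's actual DoubleCritic code; the model name and the long dummy observation are illustrative assumptions.

```python
# Sketch: keep the most recent conversation turns when the input
# exceeds the tokenizer's 512-token limit, by truncating from the left.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
tokenizer.truncation_side = "left"  # drop tokens from the start, not the end

# Stand-in for a WebShop observation that has grown past 2000 tokens
# after several rounds of interaction (illustrative, not real env output).
long_observation = "round of conversation ... " * 600

encoded = tokenizer(
    long_observation,
    truncation=True,
    max_length=512,  # RoBERTa's positional-embedding limit
)
print(len(encoded["input_ids"]))  # 512 — only the most recent tokens remain
```

With `truncation_side = "left"`, the oldest tokens are discarded first, so the latest round of the conversation always survives truncation.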