YifeiZhou02 / ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
https://yifeizhou02.github.io/archer.io/
Maybe the QV network is not very stable? #7

Closed xiaxiaxiatengxi closed 3 months ago

xiaxiaxiatengxi commented 4 months ago

Hello, do you provide the weights for the QV network? I deployed WebShop and ArCHer on our lab server and validated with the provided checkpoint, but during the first 20 rounds the agent still fails to produce reasonable outputs. When computing actor_loss, the program cannot produce a valid log_prob. Line 106 of archer_agent.py, outputs = self.model(input_ids=input_ids, attention_mask = attention_mask), raises: RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding) ……
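The RuntimeError quoted above is what PyTorch's embedding lookup reports when it receives floating-point indices instead of integer ones. The sketch below reproduces that error on CPU with a toy nn.Embedding and shows a cast that avoids it. It is only an illustration: the helper ensure_long_ids and the toy embedding are hypothetical and not part of ArCHer, and the thread does not establish why input_ids ended up as a float tensor. One possible cause, stated here as an assumption, is an empty token sequence, since torch.tensor([]) defaults to float32.

```python
import torch
import torch.nn as nn


def ensure_long_ids(input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return token ids as an integer tensor for nn.Embedding."""
    if input_ids.dtype in (torch.int32, torch.int64):
        return input_ids
    # Float ids (e.g. from an empty tensor that defaulted to float32) are cast to int64.
    return input_ids.long()


# Toy embedding table standing in for a language model's input embeddings.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Float-typed "token ids" trigger the same kind of error as in the report.
float_ids = torch.tensor([[1.0, 2.0, 3.0]])
try:
    embedding(float_ids)
    raised = False
except RuntimeError as err:
    raised = True
    print("embedding rejected float indices:", err)

# After the cast, the lookup succeeds.
out = embedding(ensure_long_ids(float_ids))
print(out.shape)  # torch.Size([1, 3, 4])

# An empty tensor built from an empty list is float32 by default.
print(torch.tensor([]).dtype)  # torch.float32
```

A cast like this right before the self.model(...) call would silence the error, but it is usually more informative to print input_ids.dtype and input_ids.shape first, because a float or empty input_ids often means the observation or action text being tokenized is empty.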

YifeiZhou02 commented 4 months ago

Hi,

Unfortunately, we do not have saved QV networks. Do you mean that you were not able to load the SFT policy successfully, and that the SFT policy does not output reasonable actions when it is loaded?