MIV-XJTU / ARTrack


Could you provide some suggestions for parameter-efficient fine-tuning on ARTrack? #66

Closed: faicaiwawa closed this issue 5 months ago

faicaiwawa commented 5 months ago

Thanks for open-sourcing this project; its performance is excellent!

We want to use ARTrack as a baseline for PEFT exploration, but it is not going well. We have experimented with OSTrack and ARTrack as baselines. On OSTrack, we freeze the baseline's parameters and fine-tune the newly added parameters at their original learning rate, which works quite well. But ARTrack's training is more complex: we tried freezing the network's original weights and keeping the two-stage training strategy unchanged while fine-tuning the newly added parameters, yet the performance degradation is obvious (see the sketch after the questions below).

  1. Should we skip the first stage of training and do only the second-stage fine-tuning?
  2. Should we unfreeze some key original parameters, such as the word_embedding and the output bias?
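
For context, our OSTrack-style freezing setup looks roughly like the minimal PyTorch sketch below; it assumes (our naming convention, not the repo's) that the newly added modules carry "adapter" or "lora" in their parameter names:

```python
import torch
import torch.nn as nn

def freeze_for_peft(model: nn.Module) -> list[nn.Parameter]:
    """Freeze all baseline weights; keep only the newly added PEFT params trainable."""
    trainable = []
    for name, param in model.named_parameters():
        # Hypothetical convention: our added modules contain "adapter"
        # or "lora" in their parameter names.
        param.requires_grad = ("adapter" in name) or ("lora" in name)
        if param.requires_grad:
            trainable.append(param)
    return trainable

# Usage: only the added parameters go to the optimizer, at their
# original learning rate.
# trainable = freeze_for_peft(model)
# optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```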
AlexDotHam commented 5 months ago

I think you should skip the first stage of training, because the weights we provide were trained through the second stage and are therefore already adapted to the trajectory formulation we propose. As for the PEFT method itself, I am not sure; I would guess an adapter or LoRA. If the added structure greatly influences the output features, I think you should set the CE loss's weight from zero to 2, and then increase the lr from 4e-6 to 8e-5. If you have any other questions, you can email me or add my WeChat from my GitHub homepage.
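
In code terms, those two adjustments might look roughly like the sketch below; the variable names and loss composition are illustrative, not ARTrack's actual config keys or training script:

```python
import torch

def apply_suggested_tweaks(model: torch.nn.Module):
    """Sketch of the suggested adjustments; all names are illustrative."""
    # Raise the cross-entropy loss weight from 0 to 2.
    ce_loss_weight = 2.0

    # Train only the parameters left unfrozen (the added PEFT modules),
    # at the higher learning rate: 8e-5 instead of 4e-6.
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad),
        lr=8e-5,
    )
    return optimizer, ce_loss_weight

# Inside the training loop, the weighted CE term would then be combined
# with the other loss terms, e.g.:
#   loss = ce_loss_weight * ce_loss + other_loss_terms
```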

faicaiwawa commented 5 months ago


Thank you for the very timely reply; I will try it based on your suggestions!