Closed: SVT-Yang closed this issue 6 days ago
Thanks for your response. I have two other questions regarding the paper's details.
Thank you very much for your time and assistance.
L_cap and L_ret are trained using separate forward passes.
The stacking mechanism is implemented by modifying attention masks and the position indices of [RET] tokens, with no other changes.
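To make the mask/position-index idea concrete, here is a minimal sketch of one plausible reading of that answer. The function name, the mask convention (`True` = "may attend"), the choice that stacked [RET] tokens are masked from one another, and the choice that they all share the position index of the first [RET] are my assumptions for illustration, not confirmed details of the paper:

```python
def build_ret_attention(seq_len, ret_positions):
    """Sketch of a causal attention mask plus position ids where stacked
    [RET] tokens do not attend to one another (only to themselves and the
    preceding context) and all share the first [RET] token's position index.
    These layout choices are assumptions, not the paper's confirmed design."""
    ret = set(ret_positions)
    # mask[i][j] is True when token i may attend to token j.
    # Causal base (j <= i), minus [RET]-to-other-[RET] attention.
    mask = [
        [j <= i and not (i in ret and j in ret and i != j)
         for j in range(seq_len)]
        for i in range(seq_len)
    ]
    first_ret = min(ret) if ret else -1
    # Non-[RET] tokens keep their sequence index; all [RET] tokens
    # collapse onto the first [RET] position.
    pos_ids = [first_ret if i in ret else i for i in range(seq_len)]
    return mask, pos_ids
```

For example, with a 6-token sequence whose last three tokens are stacked [RET] tokens, every [RET] can see tokens 0-2 and itself, none of the [RET]s can see each other, and all three receive position index 3.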
Got it. Thanks~
Come on, it's almost September!!
Apologies for the delay. I'm currently busy with another project. I'll organize and release the code soon; I expect to have everything ready in about two weeks.
Thank you for your understanding!