Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

RoFormer model instead of BERT? #51

Closed: mainpyp closed this issue 1 year ago

mainpyp commented 1 year ago

Hi there, have you tried using the RoFormer model for text generation? I want to use it, since its rotary embeddings capture the relative positions of words. If you have tried it, do I have to change anything beyond loading the different model? Right now my generation is much worse than with BERT, which is counterintuitive! :)
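For context on the relative-position claim above, here is a minimal pure-Python sketch of the rotary position embedding (RoPE) idea behind RoFormer: each consecutive pair of vector components is rotated by an angle proportional to the token position, so the query-key dot product depends only on the positional offset. The function names and toy vectors are illustrative, not from DiffuSeq.

```python
import math

def rope(vec, pos, base=10000.0):
    # Rotate each consecutive pair (x0, x1) by an angle that grows with
    # the token position `pos`; the frequency falls off per pair index.
    out = []
    for i in range(0, len(vec), 2):
        theta = pos * base ** (-i / len(vec))
        x0, x1 = vec[i], vec[i + 1]
        out.append(x0 * math.cos(theta) - x1 * math.sin(theta))
        out.append(x0 * math.sin(theta) + x1 * math.cos(theta))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# The attention score depends only on the relative offset m - n:
s1 = dot(rope(q, 5), rope(k, 2))  # positions 5 and 2, offset 3
s2 = dot(rope(q, 9), rope(k, 6))  # positions 9 and 6, offset 3
print(abs(s1 - s2) < 1e-9)  # True: same score for the same offset
```

Shifting both positions by the same amount leaves every score unchanged, which is exactly the "relative positions" property the question refers to.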

mainpyp commented 1 year ago

Just replaced the BERT config with the RoFormer config and it worked. An additional position embedding was interfering with the training at first!
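A rough sketch of what that swap might look like with Hugging Face `transformers` (the config values here are placeholders, not DiffuSeq's actual settings): RoFormer applies rotary position information inside self-attention, so the BERT-style learned position embedding that the comment mentions should not be added on top of the token embeddings.

```python
import torch
from transformers import RoFormerConfig, RoFormerModel

# Placeholder sizes for illustration; DiffuSeq's real config values differ.
config = RoFormerConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    max_position_embeddings=64,
)
model = RoFormerModel(config)

# RoFormer injects rotary position information inside self-attention,
# so do NOT add a separate learned absolute position embedding to the
# token embeddings, as a BERT-style pipeline would.
ids = torch.randint(0, 1000, (1, 16))
out = model(input_ids=ids).last_hidden_state
print(out.shape)  # torch.Size([1, 16, 64])
```

The key point is the removed step, not the model call: feeding RoFormer inputs that already carry an absolute position embedding mixes two position schemes, which matches the training problem described above.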

xxiutong commented 7 months ago

Have you tried it against the baselines? How are the results?