Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

Size of the hidden dimension #52

Closed · mainpyp closed 1 year ago

mainpyp commented 1 year ago

Hey, I was wondering if you have tested the effect of the hidden dimension on training, and if so, what your findings were?

summmeer commented 1 year ago

Hi, yes, we've tested $h=128$ and $h=768$, and we found that a larger $h$ does not guarantee better performance. Many factors are at play, but so far this hyper-parameter has had little effect in these settings. With more training samples, more complex tasks, or more Transformer layers, it could be a different story.
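
To give a sense of the scale of this choice, here is a minimal sketch (illustrative layer and head counts, not DiffuSeq's exact configuration) comparing the parameter counts of a denoising Transformer at the two widths tested:

```python
# Hypothetical sketch: compare parameter counts of a Transformer encoder
# at h=128 vs h=768, holding depth fixed. Layer and head counts here are
# illustrative, not DiffuSeq's actual settings.
import torch.nn as nn

def count_params(hidden_dim: int, num_layers: int = 12, nhead: int = 8) -> int:
    layer = nn.TransformerEncoderLayer(
        d_model=hidden_dim,
        nhead=nhead,
        dim_feedforward=4 * hidden_dim,  # standard 4x FFN expansion
        batch_first=True,
    )
    encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
    return sum(p.numel() for p in encoder.parameters())

for h in (128, 768):
    print(f"h={h}: {count_params(h):,} parameters")
```

Since the per-layer parameter count scales roughly with $h^2$, going from $h=128$ to $h=768$ grows the model by about $36\times$, which is consistent with the observation that the wider model only pays off given more data or harder tasks.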

mainpyp commented 1 year ago

Thank you for your reply! :)