Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

Machine Translation Task with DiffuSeq #74

Open chiral-carbon opened 8 months ago

chiral-carbon commented 8 months ago

Hi @summmeer,

I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq for the paraphrase task, but I want to be able to use it for translation tasks. Would supplying a translation dataset to the existing codebase (since it is designed for seq2seq tasks) suffice, or would further changes be required?
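For concreteness, this is roughly how I'd prepare the data; a minimal sketch assuming translation pairs should follow the same jsonl layout with `src`/`trg` fields that the paraphrase datasets use (the field names and file paths here are my assumption from the QQP data, not something I've verified end to end):

```python
# Minimal sketch: convert a parallel corpus (one sentence per line, two files)
# into the jsonl layout the DiffuSeq paraphrase data appears to use.
# The {"src": ..., "trg": ...} field names are assumed from the QQP files --
# double-check against the files under datasets/ before training.
import json

def to_jsonl(src_path, trg_path, out_path):
    with open(src_path, encoding="utf-8") as f_src, \
         open(trg_path, encoding="utf-8") as f_trg, \
         open(out_path, "w", encoding="utf-8") as f_out:
        for src, trg in zip(f_src, f_trg):
            pair = {"src": src.strip(), "trg": trg.strip()}
            f_out.write(json.dumps(pair, ensure_ascii=False) + "\n")

# e.g. IWSLT14 De-En (file names are placeholders for my local copies)
to_jsonl("iwslt14/train.de", "iwslt14/train.en", "datasets/iwslt14/train.jsonl")
```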

Would appreciate any advice, thanks!

summmeer commented 8 months ago

Hi, you can give it a try, but different hyper-parameters may lead to different results, including bsz, steps, dim, seq_len, and the tokenizer. Many follow-up works now achieve better MT performance, and you can refer to their codebases, too.
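For example, the tokenizer is easy to overlook for MT: an English-only vocab will over-fragment the German side. A quick generic check with HuggingFace tokenizers (a sketch for illustration, not code from this repo) looks like:

```python
# Compare how an English-only vocab and a multilingual vocab split German text.
from transformers import AutoTokenizer

en_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
multi_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

sent = "Das Parlament erhebt sich zu einer Schweigeminute."
print(en_tok.tokenize(sent))     # likely many fragmented ##-pieces
print(multi_tok.tokenize(sent))  # fewer, more natural subwords

# seq_len then needs to cover the longest tokenized src+trg pair, so it is
# worth measuring the length distribution of your MT data before training.
```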

chiral-carbon commented 8 months ago

Yeah, makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds on DiffuSeq directly?

summmeer commented 8 months ago

It depends on what your goal is in using a diffusion model for MT tasks. The follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. This work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA model.

chiral-carbon commented 8 months ago

@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to surpass DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to run a similar comparison between DiffuSeq and DiNoiSer on the IWSLT14 task, but DiffuSeq takes a long time to train. Even for the QQP task reported in the paper, when I trained the model to replicate the results it took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed training code is required to train DiffuSeq more efficiently?
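For context, what I have in mind by "distributed training code" is the standard PyTorch DDP pattern; a generic sketch (placeholder model and loop, not DiffuSeq's actual training code) would be:

```python
# Generic PyTorch DDP sketch (placeholder model and loop, not DiffuSeq code).
# Launch with: torchrun --nproc_per_node=4 ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # reads the env vars set by torchrun
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(128, 128).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # placeholder training loop
        x = torch.randn(32, 128, device=rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across the 4 GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```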

Sorry for the trivial question, your replies are really helpful, thanks!

summmeer commented 8 months ago

Hi, maybe you can try our updated version 2, which is 4x faster in training and 800x faster in sampling on the QQP dataset. [We have updated the v2 information in README.md.]

chiral-carbon commented 8 months ago

I will, thanks a lot!