HKUNLP / reparam-discrete-diffusion

Reparameterized Discrete Diffusion Models for Text Generation
Apache License 2.0

About argmax decoding #2

Open HyezNee opened 8 months ago

HyezNee commented 8 months ago

Hi, first of all, I really appreciate your nice work. I have a question about the sampling code.

In the RDMs paper, line 9 of the pseudo-code for 'Sampling from RDMs' says: Draw $\tilde{x}_{0,n} \sim \mathrm{Categorical}(f(x_{t,n};\theta)/\tau)$. In the code, however, it seems the model's reported performance is reproduced only when --argmax-decoding is added, which differs from that description. Is it correct that you turn on argmax-decoding mode when sampling?

LZhengisme commented 8 months ago

Hi, thanks for being interested in the work!

We provide scripts to reproduce the experimental results of RDMs in fairseq/experiments, where argmax-decoding = True is used for machine translation (here) and temperature = 0.3 is used for the question generation and paraphrasing tasks (here). We also found that a low temperature such as 0.1 or 0.2 can achieve results similar to argmax-decoding on translation tasks, although there may be some fluctuations.

We adopt the sampling formulation in the pseudo-code because it includes the argmax case as a limit: as the temperature approaches 0, the distribution becomes a point mass on the token with the highest probability, so sampling is equivalent to taking the argmax.
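
To make the limiting argument concrete, here is a minimal sketch in plain Python. It is not the repository's sampling code: it assumes that f outputs per-token logits and that the Categorical in the pseudo-code is a softmax over logits divided by τ. The names `temperature_probs` and `sample_token`, and the example logits, are hypothetical and chosen only for illustration.

```python
import math
import random


def temperature_probs(logits, tau):
    """Softmax over logits / tau (assumed reading of Categorical(f/tau))."""
    scaled = [value / tau for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, tau, rng, argmax_decoding=False):
    """Pick one token index, either greedily or by temperature sampling."""
    if argmax_decoding:
        # Greedy choice: index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, tau)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

# The probability mass on the top token grows as tau shrinks.
for tau in (1.0, 0.3, 0.1, 0.01):
    top = temperature_probs(logits, tau)[0]
    print(f"tau={tau}: p(top token)={top:.6f}")
```

Under these assumptions, the probability assigned to the highest-scoring token rises toward 1 as τ decreases, which is why sampling at a low temperature and argmax decoding tend to pick the same tokens, with occasional differences at temperatures such as 0.1 or 0.2.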