HKUNLP / reparam-discrete-diffusion

Reparameterized Discrete Diffusion Models for Text Generation
Apache License 2.0
90 stars · 3 forks

about decoding topk_masking #3

Open violet-sto opened 4 months ago

violet-sto commented 4 months ago

Hi

Thanks for your excellent work. I have a question about the rate schedule for topk_masking.

As described in the appendix, "To ensure that the degree of noise decreases as the generation process proceeds, we schedule k to increase from 1 to N monotonically as the diffusion step t goes from T to 1." However, in the code (https://github.com/HKUNLP/reparam-discrete-diffusion/blob/26ee286b281edc6284d74f809465b3e6d42507a6/discrete_diffusion/discrete_diffusions/discrete_diffusion_base.py#L177), the k tokens with the lowest confidence are masked instead of the highest. Is there an inconsistency here?

Best regards

LZhengisme commented 4 months ago

Hi there,

Thanks for reaching out! In https://github.com/HKUNLP/reparam-discrete-diffusion/blob/26ee286b281edc6284d74f809465b3e6d42507a6/discrete_diffusion/discrete_diffusions/utils.py#L20-L24, the topk_masking function actually returns a mask indicating the *unselected* elements. This is effectively the inverse of selecting the highest-scoring elements; we implement it this way to simplify the subsequent calculations for the denoising tokens.
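To illustrate the point, here is a minimal NumPy sketch of that logic (the repository implements it in PyTorch; function and variable names here are illustrative, not the exact code): selecting the top-(N − k) highest-confidence positions and marking the remaining k lowest-confidence positions with `True` are the same operation, viewed from opposite sides.

```python
import numpy as np

def topk_masking(scores, cutoff_len):
    """Return a boolean mask that is True at the `cutoff_len` LOWEST-scoring
    positions in each row, i.e., the complement of keeping the highest scores.

    scores:     (batch, seq_len) confidence scores
    cutoff_len: (batch, 1) number of positions to mask per row
    """
    # Sort each row ascending so the cutoff_len-th entry is the threshold
    sorted_scores = np.sort(scores, axis=-1)
    cutoff = np.take_along_axis(sorted_scores, cutoff_len, axis=-1)
    # True = low-confidence position to be (re-)masked; False = kept token
    return scores < cutoff

scores = np.array([[0.9, 0.1, 0.5, 0.3]])
mask = topk_masking(scores, np.array([[2]]))
# The two lowest-confidence positions (0.1 and 0.3) are marked True
print(mask)  # [[False  True False  True]]
```

So although the mask picks out the lowest-confidence tokens, it encodes exactly the "keep the top-k most confident predictions" schedule described in the appendix.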

Hope this clears things up!