gmftbyGMFTBY opened this issue 1 year ago
This observation is correct. The objective is using many more negative tokens than it should. In my experience, fixing the issue so that negative candidates come only from the i-th sample gives fairly modest improvements. I tried the (token-level) unlikelihood objective with sequence length 256 on WikiText-103, batch size 256, for 40k steps. To study text quality, I sample a prefix of length 32 from the test set and then generate a continuation of up to 128 tokens.
Model | ppl ($\downarrow$) | seq-rep-4 ($\downarrow$) | uniq ($\uparrow$) | mauve ($\uparrow$) |
---|---|---|---|---|
MLE | 18.87 | 0.554 | 11.5k | 0.956 |
UL (repo) | 19.76 | 0.216 | 15.4k | 0.988 |
UL (corrected) | 19.35 | 0.406 | 13.7k | 0.961 |
Human (from paper) | - | 0.005 | 18.9k | 1.000 |
Interestingly, using fewer negatives (the corrected version) gave better perplexity than the repo version. The paper doesn't go into depth on perplexity either, so I'm not sure which method (repo or corrected) is preferable. I still think the objective is worth using, but the gains in generation quality may not be as large as you might expect. A sketch of the per-sample candidate construction I used for the corrected runs is below.
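For reference, here is a minimal sketch of what I mean by "corrected" (my own simplified re-implementation, not the repo's exact code; the function name, the shapes, and `padding_idx=1` are assumptions):

```python
import torch

def unlikelihood_candidate_mask(target, vocab_size, padding_idx=1):
    """Sketch: per-sample negative candidates for token-level unlikelihood.

    Position t of sample i only penalizes tokens that appeared earlier in
    sample i itself, excluding the gold token at t and the padding index.

    target: (B, T) LongTensor of gold token ids
    returns: (B, T, V) float mask with 1.0 at negative-candidate ids
    """
    bsz, seq_len = target.size()
    with torch.no_grad():
        # (B, T, T): entry [i, t, s] = target[i, s]
        ctx = target.unsqueeze(1).expand(bsz, seq_len, seq_len)
        # keep only strictly-previous positions s < t of the *same* sample
        prev = torch.ones(seq_len, seq_len, dtype=torch.bool, device=target.device).tril(-1)
        ctx = ctx.masked_fill(~prev, padding_idx)
        # never penalize the gold token for the current timestep
        ctx = ctx.masked_fill(ctx == target.unsqueeze(-1), padding_idx)
        # scatter candidate ids into a vocab-sized mask
        mask = torch.zeros(bsz, seq_len, vocab_size, device=target.device)
        mask.scatter_(2, ctx, 1.0)
        mask[..., padding_idx] = 0.0  # the padding slot absorbed all the fills
    return mask

# usage (combined with the MLE loss as in the paper), given lprobs of shape (B, T, V):
# one_minus_probs = torch.clamp(1.0 - lprobs.exp(), min=1e-5)
# ul_loss = -(torch.log(one_minus_probs)
#             * unlikelihood_candidate_mask(target, lprobs.size(-1))).sum()
```

With batch size 1 this reduces to the flattened construction; with larger batches it keeps the candidates within each sample.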
Hello, thank you for your wonderful work!
After carefully analyzing the token-level unlikelihood training loss, I think the batched version implemented here differs from the one defined in the paper.
In your paper, the negative candidates for the current token should come only from its previous context:
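As I understand it, the token-level unlikelihood term in the paper is

$$
\mathcal{L}^{t}_{\mathrm{UL}}\big(p_\theta(\cdot \mid x_{<t}),\, \mathcal{C}^{t}\big) = -\sum_{c \in \mathcal{C}^{t}} \log\big(1 - p_\theta(c \mid x_{<t})\big),
\qquad
\mathcal{C}^{t} = \{x_1, \dots, x_{t-1}\} \setminus \{x_t\},
$$

so the candidate set $\mathcal{C}^{t}$ contains only previous tokens of the same sequence.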
But in your code, I notice that you simply flatten all the tokens in a batch (which may consist of $N$ samples): https://github.com/facebookresearch/unlikelihood_training/blob/main/custom/candidate_penalty_ce_loss.py#L55

If the batch size is 1, the code is consistent with the definition in the paper. But if the batch size is larger than 1, then for a sample $i > 0$, the negative candidates contain not only the previous tokens in sample $i$ but also all the tokens in the previous samples $j < i$. Thus, in this case, the set of negative candidates is much larger. Am I right? Looking forward to your response.
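To illustrate, here is a toy example of the flattening behaviour (a sketch with assumed toy shapes and token ids, not your exact code; zeros simply stand in for "not a candidate" instead of the real padding handling):

```python
import torch

# Two samples of length 4, flattened to a single sequence of length 8
# before the lower-triangular "context candidates" matrix is built.
target = torch.tensor([[5, 6, 7, 8],      # sample 0
                       [9, 10, 11, 12]])  # sample 1
flat = target.view(-1)                                  # (B*T,) = (8,)
ctx_cands = flat.unsqueeze(0).expand(flat.size(0), -1)  # (8, 8)
ctx_cands = ctx_cands.tril(-1)   # "previous" tokens in the *flattened* batch

# Position t=1 of sample 1 is flattened index 5; its candidate row is:
print(ctx_cands[5])   # tensor([5, 6, 7, 8, 9, 0, 0, 0])
# i.e. token 9 (its own previous token) *plus* 5, 6, 7, 8 from sample 0,
# which the paper's per-sample definition would not include.
```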
Sincerely,
Tian Lan