Question Regarding the Maximum Length for the MAUVE Evaluation

bloomberg / MixCE-acl2023

Implementation of the MixCE method described in the ACL 2023 paper by Zhang et al.

Apache License 2.0


Hello,

I have a few questions from replicating the reported numbers with the provided checkpoints.

1. What max_length was used for the MAUVE evaluation in Tables 2 and 3?

   So far, I have only tested the WikiText checkpoint trained with MLE. The MAUVE scores I get differ considerably from those in the tables and depend heavily on the max_length used for evaluation. In addition, the generated samples are much shorter than the human references. My settings: the maximum generation length is 512 + prompt_len, and top-p is 0.9.

2. How can we resolve the "ERROR: Can't get enough samples!" error when evaluating c-MAUVE?
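A minimal sketch of how the length mismatch in the first question can be checked: truncate generations and references to the same token budget and compare their mean lengths at several budgets. The helper names are hypothetical (not part of the MixCE repository), and whitespace splitting is an assumed stand-in for the tokenizer of the featurizing model (GPT-2 large by default) that MAUVE actually truncates with.

```python
from statistics import mean


def truncate_tokens(text, max_length):
    """Keep at most `max_length` whitespace-separated tokens of `text`.

    Assumption: whitespace tokens stand in for the featurizing model's
    tokenizer, which is what MAUVE really uses for truncation.
    """
    return " ".join(text.split()[:max_length])


def truncate_and_report(generations, references, max_length):
    """Truncate both sides to the same budget and summarize their lengths.

    Hypothetical helper for illustration only.
    """
    if not generations or not references:
        raise ValueError("need at least one generation and one reference")
    gens = [truncate_tokens(t, max_length) for t in generations]
    refs = [truncate_tokens(t, max_length) for t in references]
    gen_lens = [len(t.split()) for t in gens]
    ref_lens = [len(t.split()) for t in refs]
    stats = {
        "max_length": max_length,
        "mean_gen_len": mean(gen_lens),
        "mean_ref_len": mean(ref_lens),
        # Fraction of generations that ended before reaching the budget.
        "short_gen_frac": sum(n < max_length for n in gen_lens) / len(gen_lens),
    }
    return gens, refs, stats


if __name__ == "__main__":
    gens = ["the model stops early .", "a short continuation"]
    refs = ["a much longer human reference paragraph " * 3,
            "another long reference " * 4]
    # In this toy data, raising the budget lengthens the truncated references
    # but not the short generations, so the length gap between the two sets widens.
    for budget in (4, 8, 16):
        print(truncate_and_report(gens, refs, budget)[2])
```

With the `mauve-text` package, the corresponding budget is the `max_text_length` argument of `mauve.compute_mauve(p_text=..., q_text=..., max_text_length=...)`; which value reproduces Tables 2 and 3 is exactly the open question above, so any specific choice here is an assumption.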

Thanks for making the code public.
