Thanks for this amazing work! One thing in the paper confused me: how exactly is the ensemble trick applied? My understanding is that an ensemble of multiple models means training several independent M2 Transformers with different random seeds and then averaging their predictions at inference time. Is that correct?
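To make sure I'm describing it correctly, here is a minimal sketch of what I mean. I'm assuming that each trained model can be called on a token prefix and returns a next-token probability distribution. The `Model` interface, `model_a`/`model_b`, and the helper functions are hypothetical stand-ins, not the repository's API. At every decoding step the distributions from all models are averaged with equal weights, and the next token is chosen from that average (greedy decoding here for brevity):

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" maps a token prefix to a probability
# distribution over the vocabulary for the next token.
Model = Callable[[Sequence[int]], List[float]]


def ensemble_step(models: List[Model], prefix: Sequence[int]) -> List[float]:
    """Average the next-token distributions of all models (uniform weights)."""
    dists = [m(prefix) for m in models]
    vocab_size = len(dists[0])
    return [sum(d[i] for d in dists) / len(dists) for i in range(vocab_size)]


def greedy_decode(models: List[Model], bos: int, eos: int, max_len: int) -> List[int]:
    """Greedy decoding where every step uses the ensemble-averaged distribution."""
    seq = [bos]
    for _ in range(max_len):
        probs = ensemble_step(models, seq)
        next_tok = max(range(len(probs)), key=probs.__getitem__)
        seq.append(next_tok)
        if next_tok == eos:
            break
    return seq


# Toy stand-ins for two independently trained models (e.g. different seeds).
# They return constant distributions over a 3-token vocabulary.
def model_a(prefix: Sequence[int]) -> List[float]:
    return [0.1, 0.6, 0.3]


def model_b(prefix: Sequence[int]) -> List[float]:
    return [0.1, 0.2, 0.7]


if __name__ == "__main__":
    # Averaged distribution is roughly [0.1, 0.4, 0.5], so token 2 wins,
    # even though model_a alone would pick token 1.
    print(ensemble_step([model_a, model_b], [0]))
    print(greedy_decode([model_a, model_b], bos=0, eos=2, max_len=5))
```

With beam search the same averaged distribution would be used to score every beam. Averaging log-probabilities instead of probabilities is another common variant, and I'm not sure which one the paper uses.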