Closed · vince62s closed this issue 3 weeks ago
We already implemented encoder-only models, but it would be great to add a recipe that fits well-known models like xlm-roberta-xl (and xxl).
xlm-roberta-large is post-norm, so it would require an additional change to the transformer architecture.
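For context, here is a minimal sketch (illustrative only, not the repo's actual modules) contrasting the two residual layouts: xlm-roberta-large normalizes *after* the residual addition (post-norm, like the original Transformer and BERT), while the XL/XXL variants normalize the sublayer input (pre-norm):

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Post-norm: sublayer -> residual add -> LayerNorm (BERT / xlm-roberta-large)."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """Pre-norm: LayerNorm -> sublayer -> residual add (xlm-roberta-xl/xxl)."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sublayer(self.norm(x))
```

Supporting xlm-roberta-large would mean being able to select the post-norm wiring per model.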
For the two XL/XXL models there is still some work to be done, namely the learned position embeddings. (see also #17)
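A minimal sketch of what "learned position embeddings" means here, assuming the RoBERTa-family convention (class and parameter names below are illustrative): positions are looked up in a trained `nn.Embedding` rather than computed from fixed sinusoids, and RoBERTa-style models offset position ids past the padding index:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    def __init__(self, max_positions: int, d_model: int, padding_idx: int = 1):
        super().__init__()
        self.padding_idx = padding_idx
        # Extra slots so real positions start after the padding id,
        # matching the RoBERTa checkpoint layout.
        self.embed = nn.Embedding(max_positions + padding_idx + 1, d_model,
                                  padding_idx=padding_idx)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # RoBERTa-style position ids: cumulative count of non-pad tokens,
        # shifted past the padding index; pad positions stay at padding_idx.
        mask = token_ids.ne(self.padding_idx).long()
        positions = torch.cumsum(mask, dim=1) * mask + self.padding_idx
        return self.embed(positions)
```

Loading the XL/XXL checkpoints would then amount to mapping their position-embedding weights into such a table instead of a sinusoidal encoder.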
Supporting those two models would also make things compatible with COMET models...