Closed · vince62s closed this issue 3 weeks ago
We already implemented encoder-only models, but it would be great to add a recipe that fits well-known models like xlm-roberta-xl (and xxl).
xlm-roberta-large is post-norm, so it would require an additional change to the transformer architecture.
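For context, here is a minimal sketch (illustrative only, not the repo's actual modules) contrasting the two residual layouts: xlm-roberta-large normalizes *after* the residual addition (post-norm, like the original Transformer and BERT), while the XL/XXL variants normalize the sublayer input (pre-norm):

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Post-norm: sublayer -> residual add -> LayerNorm (BERT / xlm-roberta-large)."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """Pre-norm: LayerNorm -> sublayer -> residual add (xlm-roberta-xl/xxl)."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sublayer(self.norm(x))
```

Supporting xlm-roberta-large would mean being able to select the post-norm wiring per model.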
For the two XL/XXL models there is still some work to be done, namely the learned position embeddings. (see also #17)
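A minimal sketch of what "learned position embeddings" means here, assuming the RoBERTa-family convention (class and parameter names below are illustrative): positions are looked up in a trained `nn.Embedding` rather than computed from fixed sinusoids, and RoBERTa-style models offset position ids past the padding index:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    def __init__(self, max_positions: int, d_model: int, padding_idx: int = 1):
        super().__init__()
        self.padding_idx = padding_idx
        # Extra slots so real positions start after the padding id,
        # matching the RoBERTa checkpoint layout.
        self.embed = nn.Embedding(max_positions + padding_idx + 1, d_model,
                                  padding_idx=padding_idx)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # RoBERTa-style position ids: cumulative count of non-pad tokens,
        # shifted past the padding index; pad positions stay at padding_idx.
        mask = token_ids.ne(self.padding_idx).long()
        positions = torch.cumsum(mask, dim=1) * mask + self.padding_idx
        return self.embed(positions)
```

Loading the XL/XXL checkpoints would then amount to mapping their position-embedding weights into such a table instead of a sinusoidal encoder.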
Supporting those two models would also make things compatible with COMET models...