arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0
4.53k stars 399 forks

Support for xlm-roberta #422

Open umiron opened 2 hours ago

umiron commented 2 hours ago

Is it possible to add support for xlm-roberta? It has the same architecture as roberta, except for a larger vocabulary, since it is multilingual.

metric-space commented 2 hours ago

Hey @umiron, I don't believe anything within mergekit is a barrier to merges between xlm-roberta models, since the architecture format is oblivious to tensor sizes.

If this really matches up with the xlm-roberta weight names and architecture, add the architecture name (XLMRobertaForMaskedLM) here locally and test whether it works.
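One way to sanity-check the "same weight names, different vocabulary size" assumption before trying the change is to compare tensor names and shapes directly. The snippet below is only a rough sketch: `compare_weight_layouts` is a hypothetical helper, not part of mergekit, and the two dictionaries are toy stand-ins with illustrative names and shapes rather than real checkpoint dumps.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_weight_layouts(
    ref: Dict[str, Shape], cand: Dict[str, Shape]
) -> Tuple[List[str], List[str], Dict[str, Tuple[Shape, Shape]]]:
    """Compare two {tensor name: shape} maps (hypothetical helper)."""
    ref_names, cand_names = set(ref), set(cand)
    # Names present in the reference layout but absent from the candidate.
    missing = sorted(ref_names - cand_names)
    # Names present only in the candidate.
    extra = sorted(cand_names - ref_names)
    # Shared names whose shapes differ.
    shape_diffs = {
        name: (ref[name], cand[name])
        for name in sorted(ref_names & cand_names)
        if ref[name] != cand[name]
    }
    return missing, extra, shape_diffs


# Toy layouts (assumed for illustration, not read from real checkpoints).
roberta_like = {
    "roberta.embeddings.word_embeddings.weight": (50265, 768),
    "roberta.encoder.layer.0.attention.self.query.weight": (768, 768),
    "lm_head.decoder.weight": (50265, 768),
}
xlmr_like = {
    "roberta.embeddings.word_embeddings.weight": (250002, 768),
    "roberta.encoder.layer.0.attention.self.query.weight": (768, 768),
    "lm_head.decoder.weight": (250002, 768),
}

missing, extra, shape_diffs = compare_weight_layouts(roberta_like, xlmr_like)
print("missing:", missing)
print("extra:", extra)
for name, (ref_shape, cand_shape) in shape_diffs.items():
    print(f"{name}: {ref_shape} vs {cand_shape}")
```

With real models, the two maps could be built from a loaded checkpoint, for example `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. If `missing` and `extra` are both empty and the only shape differences are in the vocabulary-sized tensors, that supports reusing the roberta definition for xlm-roberta.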