arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0
4.85k stars 442 forks

Support for xlm-roberta #422

Open umiron opened 2 months ago

umiron commented 2 months ago

Is it possible to add support for xlm-roberta? It has the same architecture as roberta, except for a larger vocabulary, since it is multilingual.

metric-space commented 2 months ago

Hey @umiron, I believe nothing within mergekit is a barrier to merges between xlm-roberta models, as the architecture format is oblivious to tensor sizes.

If this really matches up with the xlm-roberta weight names and architecture, add the architecture name (XLMRobertaForMaskedLM) here locally and test to see if it works.
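One way to check the "matches up" condition before editing anything is to compare parameter names and shapes between the two model families. The following is a minimal sketch, not part of mergekit: `compare_layouts` is a hypothetical helper, and the two dictionaries are toy placeholders rather than real checkpoints. With PyTorch models, a mapping like this could be built from `model.state_dict()`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_layouts(
    reference: Dict[str, Shape], candidate: Dict[str, Shape]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (missing, unexpected, resized) parameter names, each sorted.

    missing:    names in `reference` that `candidate` lacks
    unexpected: names in `candidate` that `reference` lacks
    resized:    names present in both, but with different shapes
    """
    missing = sorted(set(reference) - set(candidate))
    unexpected = sorted(set(candidate) - set(reference))
    resized = sorted(
        name
        for name in set(reference) & set(candidate)
        if reference[name] != candidate[name]
    )
    return missing, unexpected, resized


# Toy layouts (assumed for illustration): same names, different vocabulary size.
roberta_like = {
    "embeddings.word_embeddings.weight": (100, 8),
    "encoder.layer.0.attention.self.query.weight": (8, 8),
}
xlm_roberta_like = {
    "embeddings.word_embeddings.weight": (500, 8),
    "encoder.layer.0.attention.self.query.weight": (8, 8),
}

missing, unexpected, resized = compare_layouts(roberta_like, xlm_roberta_like)
print("missing:", missing)
print("unexpected:", unexpected)
print("resized:", resized)
```

If `missing` and `unexpected` are both empty, the weight names line up, and any entries in `resized` are the size-only differences (such as the larger vocabulary) that the reply above describes as harmless to the architecture format.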

umiron commented 2 months ago

Thanks, @metric-space. This works well (except that in my case the change was to this file, since the relevant architecture was XLMRobertaModel rather than XLMRobertaForMaskedLM).
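For anyone hitting the same mismatch, the class name a checkpoint declares can be read from the `architectures` list in its Hugging Face `config.json`. A minimal sketch under stated assumptions: `declared_architectures` and `unregistered` are hypothetical helpers, the config string is a trimmed example, and the `registered` set is made up for illustration rather than read from mergekit.

```python
import json
from typing import Iterable, List


def declared_architectures(config_text: str) -> List[str]:
    """Return the class names listed under "architectures" in a config.json."""
    config = json.loads(config_text)
    return list(config.get("architectures") or [])


def unregistered(config_text: str, registered: Iterable[str]) -> List[str]:
    """Return declared architecture names that are absent from `registered`."""
    known = set(registered)
    return [name for name in declared_architectures(config_text) if name not in known]


# Trimmed example config (assumed); a real config.json has many more keys.
example_config = '{"architectures": ["XLMRobertaModel"], "model_type": "xlm-roberta"}'

# Hypothetical set of names already added locally, for illustration only.
registered = {"RobertaModel", "XLMRobertaForMaskedLM"}

print(unregistered(example_config, registered))
```

A non-empty result names the architecture that still needs to be added, which in the case above would be `XLMRobertaModel` rather than `XLMRobertaForMaskedLM`.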