arcee-ai / mergekit

Tools for merging pretrained large language models.

Can Models with Different vocab_size be Merged? #47

Open · ZeroYuJie opened this issue 8 months ago

ZeroYuJie commented 8 months ago

Great job on this toolkit.

I'm attempting to merge two models with different vocab_size values: augmxnt/shisa-7b-v1 (base), which has an expanded vocabulary, and teknium/OpenHermes-2.5-Mistral-7B. However, after merging them with dare_ties, the merged model's output is completely garbled. Could this be related to setting tokenizer_source to union? If I don't set it, I get this error instead:

RuntimeError: The size of tensor a (32000) must match the size of tensor b (120128) at non-singleton dimension 0

Is there a way to successfully merge these models despite their different vocab_size?
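
For reference, my config looks roughly like this (the density and weight values below are placeholders rather than my exact settings):

models:
  - model: augmxnt/shisa-7b-v1
    # no parameters necessary for base model
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5  # placeholder value
      weight: 0.5   # placeholder value
merge_method: dare_ties
base_model: augmxnt/shisa-7b-v1
tokenizer_source: union
dtype: bfloat16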

Thanks for your help!

cg123 commented 8 months ago

Glad you've found it useful!

In principle, tokenizer_source: union should do what you want here. It's a pretty experimental feature, though, and I wouldn't be surprised if you've hit a bug in it. I'm on vacation at the moment, so I probably won't have time to really dig into this for a while. In the meantime, are you using the embed_slerp parameter? If you are, try with it toggled off; if you aren't, try enabling it.

In either case I'll look into this and get a proper fix in once I'm back - thanks for filing the issue!

NilanEkanayake commented 8 months ago

I was doing the exact same merge and ended up using stabilityai/japanese-stablelm-base-gamma-7b instead.

I wanted shisa for its strong Japanese language ability, and OpenHermes for the natural language of its translations, not dry like most other models.

ZeroYuJie commented 8 months ago

Thank you for your prompt reply, especially while you're on vacation! I appreciate the suggestion regarding the embed_slerp parameter. I'll give it another try with your advice in mind.
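
Concretely, I'll try adding something like this to my config (I'm assuming embed_slerp goes under the top-level parameters block; please correct me if that's wrong):

tokenizer_source: union
parameters:
  embed_slerp: true  # assumed placement of the parameter mentioned above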

I wish you a wonderful and relaxing holiday. Looking forward to your insights when you're back.

ZeroYuJie commented 8 months ago

Good job. How effective was the merge with stabilityai/japanese-stablelm-base-gamma-7b? Could you share your merge script? I'd like to replicate the merge to see the specific results. Since augmxnt/shisa-7b-v1 has an expanded vocab_size, it might yield better results, so I'm also keen to solve the issue of merging models with different vocab_size.

NilanEkanayake commented 8 months ago

It was better than I expected, but also not as good as I think it could be. The language use in the translations became more diverse, but errors that weren't present in OpenHermes' translations cropped up. Overall, a win. With refined prompting and another LLM acting as an editor, I think I could get some amazing results. This merge is pretty picky about prompts, though.

Here's the config I used:

models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
    # no parameters necessary for base model
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.7
      weight: [0.1, 0.3, 0.6, 0.7] # weight gradient
merge_method: ties
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16

I was also testing stabilityai/japanese-stablelm-instruct-beta-70b, and with a few-shot prompt for Japanese-English translation, it blows GPT-4 out of the water.

choprahetarth commented 3 months ago

I am struggling with a similar problem, but I haven't been able to fix it. I have two models with different vocabularies that I'm trying to merge using TIES. The tokenizer merge (I think) goes through, but then I get an error pointing to a mismatch in the embedding layer sizes, and I haven't found a way to exclude the embedding layer.
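
Roughly what I'm running (model names and parameter values here are placeholders, not my actual setup):

models:
  - model: org/base-model            # placeholder
  - model: org/finetuned-model       # placeholder, different vocab size
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: org/base-model           # placeholder
tokenizer_source: union              # assumed; the tokenizer merge itself seems to succeed
dtype: bfloat16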

https://github.com/arcee-ai/mergekit/issues/342

Can anyone take a look at this?