arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

MoE Merge crashed #266

Open NeuroDonu opened 5 months ago

NeuroDonu commented 5 months ago

Hi guys! I'm creating a MoE model with this config and this notebook in Google Colab. Halfway through the merge, I get an error:

```
Warm up loaders:   0% 0/5 [00:00<?, ?it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 24411.29it/s]
Warm up loaders:  20% 1/5 [00:00<00:00, 6.52it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 20726.57it/s]
Warm up loaders:  40% 2/5 [00:00<00:00, 6.02it/s]
Fetching 8 files: 100% 8/8 [00:00<00:00, 66708.61it/s]
Warm up loaders:  60% 3/5 [00:00<00:00, 4.62it/s]
Fetching 12 files: 100% 12/12 [00:00<00:00, 28777.39it/s]
Warm up loaders:  80% 4/5 [00:00<00:00, 5.05it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 8967.94it/s]
Warm up loaders: 100% 5/5 [00:00<00:00, 5.48it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
expert prompts:   0% 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-moe", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 78, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 463, in main
    build(
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 378, in build
    gate_vecs = get_gate_params(
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 175, in get_gate_params
    hidden_states = _do_it(tokenize_prompts(expert.positive_prompts, tokenizer))
TypeError: 'NoneType' object is not callable
```

How can I fix this?

cg123 commented 5 months ago

Hey! This is because the `gate_mode` you set isn't valid: it should be `gate_mode: hidden` instead of `gate_mode: hide`. Also, you don't need to set a `merge_method` for `mergekit-moe`.
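
For reference, the top-level keys of a `mergekit-moe` config with that fix applied would look roughly like this (the base model name below is just an illustrative placeholder):

```yaml
base_model: mistralai/Mistral-7B-v0.1   # placeholder; use your actual base model
gate_mode: hidden                       # valid options are hidden, cheap_embed, random
dtype: bfloat16
# note: no merge_method key - mergekit-moe does not use one
```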

NeuroDonu commented 5 months ago

Wow! Damn, I didn't know that. I spent a couple of hours throwing the configuration together at random and still ended up with an unusable model. Well then, I'll put it together again tomorrow. Thank you!

cg123 commented 5 months ago

Glad to help!

One other thing I just noticed: `aiXcoder/aixcoder-7b-base` isn't actually a Mistral model, so it won't be compatible with the others you have selected. That is probably causing problems as well.
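
Every entry in the `experts` list needs to share the base model's (Mistral) architecture. As a rough sketch, that part of the config would look something like the following, where the model names are purely hypothetical placeholders:

```yaml
experts:
  - source_model: some-org/mistral-7b-code-finetune   # hypothetical Mistral-based stand-in for aixcoder-7b-base
    positive_prompts:
      - "Write a Python function that"
  - source_model: some-org/mistral-7b-chat-finetune    # hypothetical
    positive_prompts:
      - "Explain the following concept"
```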

NeuroDonu commented 5 months ago

Yes, yes, that was the problem with how the layers were recorded, which is how I ended up ruining the model.