NeuroDonu opened this issue 5 months ago
Hey! This is because the `gate_mode` you set isn't valid: it should be `gate_mode: hidden` instead of `gate_mode: hide`. Also, you don't need to set a `merge_method` for `mergekit-moe`.
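For reference, a minimal `mergekit-moe` config with the corrected `gate_mode` might look like the sketch below. The model names are placeholders, not a recommendation; note there is no `merge_method` key, and `gate_mode` takes one of `hidden`, `cheap_embed`, or `random`:

```yaml
# Sketch of a mergekit-moe config (models are illustrative placeholders)
base_model: mistralai/Mistral-7B-v0.1
gate_mode: hidden        # "hidden" computes router weights from hidden states
dtype: bfloat16
experts:
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "You are a helpful general-purpose assistant."
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "Solve the following math problem step by step."
```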
Wow! Damn, I didn't know. I spent a couple of hours throwing the configuration together at random and still ended up with an unusable model. Well then, I'll put it together again tomorrow, thank you!
Glad to help!
One other thing I just noticed: `aiXcoder/aixcoder-7b-base` isn't actually a Mistral model, so it won't be compatible with the others you have selected. That is probably causing problems as well.
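A quick way to catch this kind of mismatch before merging is to compare each expert's `model_type` (from its `config.json` on the Hub) against the base model's. The snippet below is a hypothetical pre-merge sanity check with hard-coded illustrative values, not mergekit's own validation; in practice you would read `model_type` from each repo's `config.json`:

```python
# Hypothetical pre-merge sanity check: every expert must share the base
# model's architecture, or mergekit-moe will produce a broken merge.
# The model_type values here are illustrative stand-ins for the fields
# you would read from each repository's config.json.
base = {"name": "mistralai/Mistral-7B-v0.1", "model_type": "mistral"}
experts = [
    {"name": "teknium/OpenHermes-2.5-Mistral-7B", "model_type": "mistral"},
    {"name": "aiXcoder/aixcoder-7b-base", "model_type": "other"},  # mismatch
]

# Collect every expert whose architecture differs from the base model's.
incompatible = [e["name"] for e in experts if e["model_type"] != base["model_type"]]
print(incompatible)  # prints: ['aiXcoder/aixcoder-7b-base']
```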
Yes, yes, that was the problem with how the layers were written, which is how I ruined the model.
Hi guys! I create MoE models with this config and this notebook in Google Colab. Halfway through the merge, I get an error:

```
Warm up loaders:   0% 0/5 [00:00<?, ?it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 24411.29it/s]
Warm up loaders:  20% 1/5 [00:00<00:00, 6.52it/s]
Fetching 11 files: 100% 11/11 [00:00<00:00, 20726.57it/s]
Warm up loaders:  40% 2/5 [00:00<00:00, 6.02it/s]
Fetching 8 files: 100% 8/8 [00:00<00:00, 66708.61it/s]
Warm up loaders:  60% 3/5 [00:00<00:00, 4.62it/s]
Fetching 12 files: 100% 12/12 [00:00<00:00, 28777.39it/s]
Warm up loaders:  80% 4/5 [00:00<00:00, 5.05it/s]
Fetching 10 files: 100% 10/10 [00:00<00:00, 8967.94it/s]
Warm up loaders: 100% 5/5 [00:00<00:00, 5.48it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
expert prompts:   0% 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-moe", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 78, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 463, in main
    build(
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 378, in build
    gate_vecs = get_gate_params(
  File "/content/mergekit/mergekit/scripts/mixtral_moe.py", line 175, in get_gate_params
    hidden_states = _do_it(tokenize_prompts(expert.positive_prompts, tokenizer))
TypeError: 'NoneType' object is not callable
```

How can it be repaired?
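For what it's worth, that `TypeError: 'NoneType' object is not callable` shape means the name being called (`_do_it` in the traceback) held `None` rather than a function when `get_gate_params` ran. The snippet below is a minimal generic illustration of the failure mode, not mergekit's actual code; `make_embedder` and the `"tpu"` device string are hypothetical:

```python
def make_embedder(device: str):
    """Return a hidden-state extractor for supported devices, else None.

    A caller that forgets to check for None and calls the result anyway
    reproduces the "'NoneType' object is not callable" TypeError.
    """
    if device in ("cuda", "cpu"):
        # Stand-in for a real embedding function.
        return lambda prompts: [len(p) for p in prompts]
    return None  # unsupported device: caller must check for None


_do_it = make_embedder("tpu")  # unsupported device, so _do_it is None

try:
    _do_it(["hello"])  # calling None raises TypeError
except TypeError as e:
    print(e)  # prints: 'NoneType' object is not callable
```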