tomgm777 opened 8 months ago

Hello. I used mergekit to merge various models, but in the case below the merge process completes normally without any errors, yet when I run the result in text-generation-webui it outputs incomprehensible answers.

https://huggingface.co/cyberagent/calm2-7b (uses tokenizer.json)
https://huggingface.co/TheTravellingEngineer/llama2-7b-chat-hf-dpo (uses tokenizer.model)

Below is my yaml.

Command options: --cuda --allow-crimes --trust-remote-code

I tried other models as well, and this problem occurs whenever the base model uses tokenizer.json and the model being merged ships only tokenizer.model. Also, if I specify tokenizer_source: union, mergekit processing seems to freeze.

Do you know of any countermeasures for this?

P.S. I have already made the following modifications: https://github.com/arcee-ai/mergekit/issues/139#issuecomment-1925187686
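The yaml itself isn't captured above. Purely as an illustration of the kind of config being described (a minimal sketch with hypothetical slerp parameters, not the reporter's actual file), a merge that pins the tokenizer to the base model would look something like this:

```yaml
# Hypothetical sketch, not the actual config from this report.
slices:
  - sources:
      - model: cyberagent/calm2-7b
        layer_range: [0, 32]
      - model: TheTravellingEngineer/llama2-7b-chat-hf-dpo
        layer_range: [0, 32]
merge_method: slerp
base_model: cyberagent/calm2-7b
tokenizer_source: base  # keep the base model's tokenizer files
parameters:
  t: 0.5
dtype: float16
```

One quick diagnostic for the mismatch described (an assumed check using huggingface_hub, not something from the thread) is to list which tokenizer artifacts each repo actually ships:

```python
from huggingface_hub import list_repo_files

# The reported failure pattern: the base model ships tokenizer.json,
# while the model being merged ships only tokenizer.model.
for repo in ("cyberagent/calm2-7b",
             "TheTravellingEngineer/llama2-7b-chat-hf-dpo"):
    files = list_repo_files(repo)
    print(repo, sorted(f for f in files if f.startswith("tokenizer")))
```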
Thanks for reporting this, I'll see if I can replicate it and figure out what's going on. It might be because calm2-7b seems to be using a different tokenizer class than base llama2. I hadn't tried that particular combination of things yet.
Hello. The same phenomenon was confirmed with a 70B model. (Since I checked after quantization, it's possible the quantization itself failed...)

base model: https://huggingface.co/karakuri-ai/karakuri-lm-70b-v0.1
merge model: https://huggingface.co/NeverSleep/MiquMaid-v2-70B-DPO

I also tried merging with these other models:

https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.0
https://huggingface.co/sophosympatheia/Midnight-Rose-70B-v2.0.3
https://huggingface.co/152334H/miqu-1-70b-sf

All of the above produce incomprehensible responses or no response at all. However, only the merge with https://huggingface.co/ChuckMcSneed/Gembo-v1-70b succeeds and responds normally. I think they are all merges of the same type, so what is the difference? The yaml is the same as mentioned above.
I think you're actually seeing a different issue with most of those merges. Miqu and its derivatives are not directly compatible with Llama 2 based models - they use a different rope_theta value. So, similar to merging a CodeLlama based model with base Llama, it's expected that merging them will result in unintelligible output.
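To make that concrete, the mismatch is visible in each model's config.json. A minimal sketch (assuming the transformers library and access to both repos):

```python
from transformers import AutoConfig

# Llama 2 derivatives ship rope_theta = 10000.0, while miqu and its
# descendants use 1000000.0; merging across that mismatch is what
# produces the unintelligible output described above.
for repo in ("karakuri-ai/karakuri-lm-70b-v0.1",  # Llama 2 based
             "152334H/miqu-1-70b-sf"):            # miqu
    cfg = AutoConfig.from_pretrained(repo)
    print(repo, getattr(cfg, "rope_theta", None))
```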
Midnight Rose should work though - it's odd if that is having the same problem.
Sorry, there was a mistake in my Midnight Rose merge settings. After fixing it, it responds normally.
What was the mistake that you fixed?
Wow! Are you the author of the Midnight series? The Midnight Rose and Karakuri merge gave great results. Thank you!
It was an embarrassing mistake: I had loaded midnight-miqu instead of midnight-rose. (><) The midnight-rose merge worked correctly, just like the Gembo-v1-70b one.
However, what puzzled me was dreamgen/opus-v1.2-70b: the results were good even though its rope_theta value is different, like miqu's. (I needed to set tokenizer_source: base; union resulted in an error.)
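Concretely, the tokenizer_source setting mentioned here is a single line in the merge yaml. A hypothetical sketch (the model list and slerp parameters are illustrative, not the exact config used):

```yaml
# Illustrative sketch only - the relevant line is tokenizer_source.
slices:
  - sources:
      - model: karakuri-ai/karakuri-lm-70b-v0.1
        layer_range: [0, 80]
      - model: dreamgen/opus-v1.2-70b
        layer_range: [0, 80]
merge_method: slerp
base_model: karakuri-ai/karakuri-lm-70b-v0.1
tokenizer_source: base  # union errored out in this case; base worked
parameters:
  t: 0.5
dtype: float16
```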