arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0
4.88k stars, 446 forks

Fixed the YML/YAML documentation for Qwen MoE creation #435

Open Nottlespike opened 1 month ago

Nottlespike commented 1 month ago

At the very least, when using the new Qwen2.5 models (which still use the Qwen2 architecture), the architecture line in the YAML file for the MoE needs to be verbatim architecture: Qwen MoE; otherwise it won't accurately detect the Qwen models.
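
As a rough illustration of the "verbatim" requirement described above, here is a minimal Python sketch that checks whether a config text contains that exact line. The helper name and the sample text are hypothetical and not part of mergekit; the required value is copied from the comment as written and has not been verified against mergekit's own parser.

```python
# Hypothetical helper, NOT part of mergekit: it only checks that a config
# text contains the exact architecture line quoted in the comment above.
REQUIRED_LINE = "architecture: Qwen MoE"  # assumption: copied verbatim from the comment


def has_required_architecture_line(config_text: str) -> bool:
    """Return True if some line equals REQUIRED_LINE, ignoring surrounding whitespace."""
    return any(line.strip() == REQUIRED_LINE for line in config_text.splitlines())


# Illustrative config text; the first line is a placeholder comment, not a real setting.
example_config = "# other MoE settings would go here\narchitecture: Qwen MoE\n"

if __name__ == "__main__":
    print(has_required_architecture_line(example_config))
```

The comparison is deliberately case-sensitive and exact, since the comment says the line must match verbatim; a differently cased or abbreviated value would be rejected by this check.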

cg123 commented 1 month ago

Thanks for the doc fix! It looks like you're also bringing in the changes from multi-module-architecture though - that isn't quite ready to merge into main. Could you please base this off of main?