In this PR, I provide full support for format conversion for the Mixtral-MoE architecture.
Converting Mixtral-MoE checkpoints (from either the magnet source or the Hugging Face source) to the internal split and split-sparse consolidated formats: `accessory/tools/mixtral_moe_split_from_hf.py`. Usage: `python mixtral_moe_split_from_hf.py in-ckpt-dir output-ckpt-dir [--in_ckpt_source hf_or_magnet (default: hf)] [--convert_sparse (whether to convert to sparse format)]` (see the example invocations below). This functionality is a refactoring of https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted/split.py and https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted_sparse/split_sparse.py; it unifies the two scripts and additionally supports the Hugging Face checkpoint format.
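For concreteness, invocations might look like the following (run from `accessory/tools/`, or prefix the script path accordingly). The input/output paths are placeholders, and the second command assumes `--convert_sparse` is a boolean flag, as the usage string above suggests:

```sh
# Convert a Hugging Face Mixtral checkpoint to the internal split format
# (--in_ckpt_source defaults to hf; paths are placeholders).
python mixtral_moe_split_from_hf.py /path/to/hf_ckpt /path/to/output_split

# Convert the magnet-released checkpoint to the split-sparse format instead.
python mixtral_moe_split_from_hf.py /path/to/magnet_ckpt /path/to/output_sparse \
    --in_ckpt_source magnet --convert_sparse
```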
Converting a Mixtral-MoE consolidated checkpoint to the Hugging Face format: `accessory/tools/convert_weights_to_hf.py`. The usage for Mixtral-MoE is the same as for LLaMA, except that `--mixtral` must be passed to indicate that the checkpoint uses the Mixtral-MoE architecture. Note: for now only consolidated checkpoints in the split format are supported; split_sparse support is future work.
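A sketch of the corresponding call; the existing LLaMA-style arguments are left as a placeholder, since only the new `--mixtral` flag is specific to this PR:

```sh
# Same arguments as the existing llama conversion (kept as a placeholder here),
# plus --mixtral to mark the Mixtral-MoE architecture.
python accessory/tools/convert_weights_to_hf.py <same arguments as for llama> --mixtral
```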