WebCloud opened 7 months ago
@WebCloud Bumblebee needs an implementation of each model type in order to load it. Mixtral is not implemented yet, while Mistral is.
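To illustrate why having the right files is not enough, here is a toy Python model of architecture-based dispatch. The registry, `load_model`, and `UnsupportedArchitectureError` are hypothetical and only mimic the idea. They are not Bumblebee's actual code or API.

```python
# Hypothetical registry: architecture name from config.json -> implementation.
# Only Mistral is "implemented" here, mirroring the situation described above.
SUPPORTED_ARCHITECTURES = {
    "MistralForCausalLM": "mistral implementation",
}


class UnsupportedArchitectureError(Exception):
    """Raised when no implementation is registered for an architecture."""


def load_model(config: dict) -> str:
    """Pick an implementation based on the config's 'architectures' field."""
    for arch in config.get("architectures", []):
        if arch in SUPPORTED_ARCHITECTURES:
            return SUPPORTED_ARCHITECTURES[arch]
    # Valid weights do not help if no code exists for the declared architecture.
    raise UnsupportedArchitectureError(
        f"no implementation for {config.get('architectures')}"
    )


print(load_model({"architectures": ["MistralForCausalLM"]}))
try:
    load_model({"architectures": ["MixtralForCausalLM"]})
except UnsupportedArchitectureError as err:
    print(err)
```

Adding Mixtral support amounts to adding a new entry (and the model code behind it), not changing how files are read.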
I see! Well, I am quite fresh on Elixir, but I'd be happy to help however I can.
For an example of adding a model, you can see Mistral #264. The corresponding hf/transformers code is modeling_mixtral.py.
However, note that Nx/EXLA doesn't support quantization yet, and the bf16 model is around 100GB, so it is not very practical to run on a GPU at this point.
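As a rough check on the ~100GB figure: Mistral AI reports about 46.7B total parameters for Mixtral-8x7B, and bf16 stores each parameter in 2 bytes. This sketch only counts weight storage (no activations or KV cache). The 4-bit line is a hypothetical comparison showing what quantization would save. It does not describe anything Nx/EXLA currently supports.

```python
def weight_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


MIXTRAL_PARAMS = 46.7e9  # total parameters reported by Mistral AI (approximate)

bf16_gb = weight_size_gb(MIXTRAL_PARAMS, 2)    # bf16: 16 bits = 2 bytes per parameter
int4_gb = weight_size_gb(MIXTRAL_PARAMS, 0.5)  # hypothetical 4-bit quantization

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

At roughly 93 GB for the weights alone, bf16 Mixtral exceeds the memory of a single 80GB-class GPU before any inference state is allocated.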
Cool! Thanks for the resources!
It seems that Bumblebee is not capable of loading Mixtral-8x7B models (base or instruct). I've checked the files, and in theory it should be able to load the model, since it can load Mistral-7B files, but I keep getting
Both configuration files (base and instruct) reference the same MixtralForCausalLM architecture, and Mixtral-8x7B has the safetensors files.
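The file checks described above (the architecture named in `config.json` and the presence of safetensors weights) can be reproduced on a local download with a short Python script. This assumes a locally downloaded model repository with `config.json` and `*.safetensors` files at its top level. `inspect_checkpoint` is a hypothetical helper name.

```python
import json
from pathlib import Path


def inspect_checkpoint(model_dir: str) -> dict:
    """Report declared architectures, model type, and safetensors files in a model directory."""
    root = Path(model_dir)
    # config.json is the Hugging Face config that names the model architecture.
    config = json.loads((root / "config.json").read_text())
    return {
        "architectures": config.get("architectures", []),
        "model_type": config.get("model_type"),
        # Weight shards stored in the safetensors format, sorted for stable output.
        "safetensors": sorted(p.name for p in root.glob("*.safetensors")),
    }
```

A well-formed Mixtral checkpoint passing this check only shows that the files are in order. Loading still depends on the library having an implementation registered for `MixtralForCausalLM`.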