-
https://hf-mirror.com/Qwen/Qwen2-57B-A14B-Instruct
-
-
### 🚀 The feature, motivation and pitch
Currently the most popular library might be https://github.com/databricks/megablocks. It would be interesting to implement it in Triton and make it HF comp…
-
### Model Series
Qwen2
### What are the models used?
Qwen2-57B-A14B
### What is the scenario where the problem happened?
Training with transformers
### Is this a known issue?
- [X] I have followed…
-
Add support for this incredible model
-
@felipemello1 shared this PR with me https://github.com/vllm-project/vllm/pull/7415
My sense is we should already be able to support this with
```python
from torchao.quantization.quant_api im…
-
## 🐛 Bug
I am trying to work with the Jiutian 13.9B MoE model, but I get an error in the model compilation step.
## To Reproduce
Steps to reproduce the behavior:
1. pip install --pre -U -f https://…
-
I use this tool for Qwen-MoE DPO, but training stopped at:

    return inner_training_loop(
        args=args,
        resume_from_checkpoint=resume_from_checkpoint,
        trial=trial,
        ignore_keys_for_…
-
Can you make a Phi-3.5-MoE version available to software such as LM Studio (for example, by converting it to GGUF)?
-
### Feature request
Add support for [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct), which uses the `PhiMoEForCausalLM` architecture.
### Motivation
It fails with the foll…