dvmazur / mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops
MIT License · 2.29k stars · 227 forks
Issues
#39 Hard to benchmark the operation in the repo (mynotwo, opened 3 months ago, 1 comment)
#38 Mixtral Instruct tokenizer from Colab notebook doesn't work (jmuntaner-smd, opened 4 months ago, 2 comments)
#37 Change of query weight matrices shapes (avani17101, closed 6 months ago, 0 comments)
#36 Support DeepSeek V2 model (Minami-su, opened 6 months ago, 0 comments)
#35 Having issue loading my HQQ quantized model (BeichenHuang, opened 7 months ago, 0 comments)
#34 How to split the model parameter safetensors file into multiple small files (YLSnowy, opened 7 months ago, 5 comments)
#33 Implementation of benchmarks (C4 perplexity, Wikitext perplexity) (ChengSashankh, opened 7 months ago, 0 comments)
#32 RuntimeError when nbit = 4 and group_size = 64 (Eutenacity, opened 7 months ago, 0 comments)
#31 Triton issues in running the code locally (amangupt01, closed 7 months ago, 1 comment)
#30 Can this be used for Jamba inference? (freQuensy23-coder, opened 7 months ago, 1 comment)
#29 FastAPI integration and performance benchmarking (Jnmz, opened 7 months ago, 0 comments)
#28 (Colab) Clear GPU RAM usage after running the generation code without restarting the instance (ScottLinnn, closed 7 months ago, 1 comment)
#27 Update build_model.py (fire717, opened 8 months ago, 0 comments)
#26 A strange issue with default parameters: "RuntimeError about memory" (a1564740774, opened 8 months ago, 0 comments)
#25 Update requirements.txt (Soumadip-Saha, opened 9 months ago, 0 comments)
#24 Run on a second GPU (torch.device("cuda:1")) (imabot2, opened 9 months ago, 1 comment)
#23 4-bit/3-bit model produces gibberish when plugged into the demo (jjblum, opened 10 months ago, 0 comments)
#22 Run without quantization (freQuensy23-coder, opened 10 months ago, 9 comments)
#21 hqq_aten package not installed (LeMoussel, closed 10 months ago, 2 comments)
#20 Fix typo in README.md (kaushalpowar, opened 10 months ago, 0 comments)
#19 Need Mixtral offloading for NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO (githubpradeep, opened 10 months ago, 0 comments)
#18 CUDA OOM errors in WSL2 (MrNova111, opened 10 months ago, 0 comments)
#17 Is it possible to finetune this on a custom dataset? (asmith26, opened 10 months ago, 8 comments)
#16 Can it run with LlamaIndex? (LeMoussel, opened 10 months ago, 0 comments)
#15 Can it run on multiple GPUs? (drdh, opened 11 months ago, 10 comments)
#14 Doesn't work (SanskarX10, closed 11 months ago, 11 comments)
#13 How to use the offloading in my MoE model? (WangRongsheng, closed 11 months ago, 4 comments)
#12 CLI interface added (NJannasch, opened 11 months ago, 3 comments)
#11 Mixtral Offloading/GGUF/ExLlamaV2: which approach to use? (LeMoussel, opened 11 months ago, 1 comment)
#10 Enhancing the Efficacy of MoE Offloading with Speculative Prefetching Strategies (yihong1120, opened 11 months ago, 1 comment)
#9 Use pop for meta-keys cleanup (vivekmaru36, closed 10 months ago, 1 comment)
#8 Update README.md (eltociear, closed 11 months ago, 1 comment)
#7 Session crashed on Colab (bitsnaps, closed 11 months ago, 4 comments)
#6 Revert "Some refactoring" (lavawolfiee, closed 11 months ago, 0 comments)
#5 Some refactoring (lavawolfiee, closed 11 months ago, 0 comments)
#4 exl2 (eramax, opened 11 months ago, 2 comments)
#3 Refactor (dvmazur, closed 11 months ago, 0 comments)
#2 Adding requirements.txt (h9-tect, opened 11 months ago, 1 comment)
#1 Fix Colab (dvmazur, closed 11 months ago, 0 comments)