deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
MIT License · 982 stars · 48 forks
Issues
#41 · Hangs when training deepseek-v2-coder-lite-instruction · bao-xiaoyi · opened 1 month ago · 0 comments
#40 · Finetune patch · Muhtasham · opened 1 month ago · 1 comment
#39 · Loss becomes 0 when training the MoE · AlenjandroWang · opened 2 months ago · 0 comments
#38 · How is expert parallelism configured? Is there configuration code? · ninglonglong · opened 2 months ago · 0 comments
#37 · Why <|EOT|>? · BING-LLL · opened 3 months ago · 0 comments
#36 · Disable expert parallelism in vLLM · trebladev · closed 3 months ago · 0 comments
#35 · Finetune with DeepSpeed: type mismatch · YeZiyi1998 · opened 4 months ago · 3 comments
#34 · Slow inference on a single A100-80G · Dreaming-world · opened 4 months ago · 0 comments
#33 · No need to add epsilon 1e-20 in topk norm? · MARD1NO · closed 6 months ago · 0 comments
#32 · Could you add a ModelScope link for users who cannot access Hugging Face? · lll143653 · opened 7 months ago · 0 comments
#31 · How is MoE parallelism implemented? · YunxinLi · opened 7 months ago · 1 comment
#30 · Reproducing the model evaluation results · JustQJ · opened 7 months ago · 1 comment
#29 · About expert capacities: Is there token-dropping during training? · Spico197 · closed 6 months ago · 3 comments
#28 · Abnormal model output after finetuning · JustQJ · closed 7 months ago · 4 comments
#27 · During LoRA finetuning of deepseek-moe, the loss suddenly drops to 0 and stays there, causing abnormal inference · hangchen426926 · opened 7 months ago · 3 comments
#26 · Is finetuning on NPU devices currently supported? · Tyx-main · closed 7 months ago · 1 comment
#25 · When will the planned open-source moe-145b be uploaded? · win10ogod · opened 7 months ago · 3 comments
#24 · Load errors · cooper12121 · closed 7 months ago · 2 comments
#23 · Can you provide inference versions of DeepSeek based on vLLM, DeepSpeed, and TensorRT-LLM? · Eutenacity · closed 7 months ago · 1 comment
#22 · Great work! Is there a WeChat discussion group? · dawson-chen · closed 8 months ago · 1 comment
#21 · Could you provide a quantization scheme? · edisonzf2020 · opened 8 months ago · 2 comments
#20 · About flash_attn · GXKIM · closed 8 months ago · 1 comment
#19 · flash_attn · GXKIM · closed 8 months ago · 0 comments
#18 · Selective precision in gate and norm may conflict with DeepSpeed? · drxmy · closed 8 months ago · 1 comment
#17 · Question about AddAuxiliaryLoss? · KaiWU5 · closed 8 months ago · 1 comment
#16 · Will you open-source the DeepSeekMoE 2B model? · win10ogod · opened 8 months ago · 6 comments
#15 · Do you plan to support the llama.cpp project? · hqu-little-boy · opened 8 months ago · 1 comment
#14 · The released DeepSeekMoE 16B Base has 3 different vocab sizes · drxmy · closed 8 months ago · 2 comments
#13 · deepseek-moe-16b inference speed is slower than Baichuan-13b · ifromeast · closed 8 months ago · 3 comments
#12 · How to fully finetune MoE on multiple nodes · ftgreat · closed 9 months ago · 1 comment
#11 · Will it compare performance with llama-moe? · ccccj · closed 8 months ago · 1 comment
#10 · Error during finetuning · ifromeast · closed 9 months ago · 1 comment
#9 · Update README.md · eltociear · opened 9 months ago · 0 comments
#8 · [Feature request] DeepSeek-MoE for code · Xingxiangrui · closed 8 months ago · 1 comment
#7 · Could you open-source the training project for reproducing the model architecture? · win10ogod · closed 8 months ago · 3 comments
#6 · Does the open-source MoE model support Chinese? · uloveqian2021 · closed 8 months ago · 4 comments
#5 · GPU utilization is low compared with a dense model · charliedream1 · closed 8 months ago · 4 comments
#4 · CUDA error: device-side assert triggered when trying to run the model · intervitens · closed 9 months ago · 2 comments
#3 · Help: model fails to load · KMnO4-zx · closed 9 months ago · 4 comments
#2 · Can inference tools like vLLM be supported? · zhang001122 · closed 8 months ago · 3 comments
#1 · Finetune patch · zwd003 · closed 9 months ago · 0 comments