alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
Apache License 2.0 · 674 stars · 94 forks
Issues
#259 · Add Qwen2 Dense Models Forward Loss Evaluation Checking · jerryli1981 · closed 3 months ago · 1 comment
#258 · Add Qwen2 Dense Models Mcore Implementation · jerryli1981 · closed 3 months ago · 1 comment
#257 · Mixtral: AssertionError: (RMSNorm) is not supported in FusedLayerNorm · Treemann · closed 3 months ago · 1 comment
#256 · Two issues when running qwen2.0 with mcore · 154912369 · closed 3 months ago · 2 comments
#255 · The deepseek-v2 implementation has a problem and cannot support tp>1 · 154912369 · closed 3 months ago · 3 comments
#254 · Fix DeepSeek-V2 Tensor Parallel MLA Loss Issue · jerryli1981 · closed 3 months ago · 1 comment
#253 · DeepSeek-V2-MoE transformer config adapt to mcore0.6.0 · one-game · closed 3 months ago · 1 comment
#252 · tokenize with qwen2tokenizer in megatron_patch/tokenizer/__init__.py · lclkent · closed 3 months ago · 2 comments
#251 · [QUESTION] SFT data varies widely in length; how to train efficiently with heavy padding · oymzysmwe224 · closed 3 months ago · 2 comments
#250 · Support for Qwen2-72B? · Crystalxd · closed 3 months ago · 2 comments
#249 · What environment setup is needed to run the pai-megatron framework on one's own cluster? Are there steps to reference? · qibao77 · closed 3 months ago · 2 comments
#248 · Support for the Phi model series · JiwenJ · closed 3 months ago · 1 comment
#247 · Question about stage-two training · jianhai0527 · opened 4 months ago · 4 comments
#246 · Is the total parameter count reported in the logs incorrect? · jianhai0527 · closed 4 months ago · 1 comment
#245 · Update DeepSeek-V2-MoE ReadMe and tensor parallel mla support · jerryli1981 · closed 4 months ago · 1 comment
#244 · Fix issue where example mcore models don't scale query values · billishyahao · closed 4 months ago · 1 comment
#243 · [QUESTION] Does converting huggingface-format weights to megatron format affect model evaluation accuracy? · hijeffwu · closed 3 months ago · 1 comment
#242 · Finetune model · QingqingSun-Bao · closed 3 months ago · 2 comments
#241 · Qwen-1.5-MoE router strategy alignment with huggingface implementation · jerryli1981 · closed 4 months ago · 1 comment
#240 · Add DeepSeek-V2-MoE Mcore implementation · jerryli1981 · closed 4 months ago · 1 comment
#239 · Update Qwen1.5 MoE ReadMe · jerryli1981 · closed 4 months ago · 1 comment
#238 · Decoupled llama3 and qwen1.5 mcore implementation · jerryli1981 · closed 4 months ago · 1 comment
#237 · llama3-8b initial loss is too high · EthanChen1234 · closed 4 months ago · 3 comments
#236 · Is Qwen-32b supported yet? · QingqingSun-Bao · closed 4 months ago · 1 comment
#235 · Support transformer-engine version after 0.9.0 · Hsuxu · closed 4 months ago · 1 comment
#234 · Are the qwen moe training script parameters incorrect? Can a correct training script be provided? · jianhai0527 · closed 4 months ago · 1 comment
#233 · Llama 3 checkpoint conversion using the latest Megatron code · shamanez · closed 3 months ago · 1 comment
#232 · MegaBlocks training · zhanjiqing · closed 3 months ago · 1 comment
#231 · Fix finetune mcore qwen bug · jerryli1981 · closed 4 months ago · 1 comment
#230 · Mixtral ggemm to hf format · vlad-karpuhin · opened 4 months ago · 2 comments
#229 · Spelling mistake in example ReadMe · liddk · closed 4 months ago · 1 comment
#228 · Error training Qwen1.5-72B on two A800-40G nodes · Renoshen · closed 3 months ago · 1 comment
#227 · Add fine-grained moe support for qwen1.5 · jerryli1981 · closed 4 months ago · 1 comment
#226 · Multi-node llama3 pt training reports an error · bsyonline · closed 4 months ago · 7 comments
#225 · Does Llava not support resuming training from a saved checkpoint after a mid-run failure? · liulong11 · closed 1 month ago · 0 comments
#224 · Tensor dimension mismatch during llama3 continued pretraining · gllary · closed 4 months ago · 1 comment
#223 · llama3 · bsyonline · closed 5 months ago · 1 comment
#222 · fix bug: when converting megatron to transformer, determine whether val … · changingivan · closed 4 months ago · 2 comments
#221 · llama3's modeling uses qwen1.5 · cryoco · closed 5 months ago · 2 comments
#220 · Add fine-grained moe support for llama3 · jerryli1981 · closed 5 months ago · 1 comment
#219 · Add fine-grained moe support for llama3 · jerryli1981 · closed 5 months ago · 1 comment
#218 · mixtral checkpoint converter - grouped gemm for experts MLP support · vlad-karpuhin · opened 5 months ago · 2 comments
#217 · loss increased when tp > 1 for qwen1_5 continue pretrain · changingivan · closed 5 months ago · 4 comments
#216 · Megatron-Core-MoE · yangzhipeng1108 · closed 5 months ago · 3 comments
#215 · Fix llama3 moe convertor · jerryli1981 · closed 5 months ago · 1 comment
#214 · Fix llama3 moe convertor · jerryli1981 · closed 5 months ago · 1 comment
#213 · Can I train an MoE model with the latest Megatron and Grouped GEMM? I tried your converted checkpoint but it didn't work; should I be doing something differently? · shamanez · closed 5 months ago · 2 comments
#212 · Any plans to support Command R+? · cryoco · closed 3 months ago · 1 comment
#211 · Fix llama3 pretrain megatron script · jerryli1981 · closed 5 months ago · 1 comment
#210 · llama3 error · yangzhipeng1108 · closed 5 months ago · 2 comments