alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
Apache License 2.0 · 674 stars · 94 forks
Issues
#359 cannot import name 'TEDotProductAttentionMLA' when running `examples/deepseek_v2/run_mcore_deepseek.sh` · dreasysnail · opened 23 hours ago · 1 comment
#358 Fix Qwen2 tie word embeddings issue · jerryli1981 · closed 6 days ago · 1 comment
#357 No module named 'megatron' · yuanzhiyong1999 · closed 6 days ago · 0 comments
#356 After converting llama3.1-8b to the megatron-mcore format, the model size grows from 15G to 71G while the precision is still bf16; is this normal? · kkkeepgoing · opened 1 week ago · 2 comments
#355 Upgrade deepseek-v2-moe models to support MLA via transformer engine … · jerryli1981 · closed 1 week ago · 1 comment
#354 qwen2.5 conversion script errors out during conversion · enze5088 · opened 1 week ago · 1 comment
#353 Fix pretrain with idxmap dataset issue · jerryli1981 · closed 1 week ago · 1 comment
#352 optimizer offload · leo-ztjht · closed 1 week ago · 1 comment
#351 Failed to join the group chat; the second group's QR code can no longer be scanned either · GeorgeSen · opened 1 week ago · 7 comments
#350 Hit some bugs while converting the model · Yanhong-Li · closed 1 week ago · 0 comments
#349 update sequence packing and add qwen2.5 · lostkevin · closed 1 week ago · 0 comments
#348 Training llama3.1 8b with a 32k context: training takes a long time and the loss is high · ARQlalala · opened 2 weeks ago · 0 comments
#347 llama3.1: support mixed pretraining over multiple datasets · Bob199511 · closed 2 weeks ago · 1 comment
#346 [[: not found Zarr-based strategies will not be registered because of missing packages Traceback (most recent call last) · aJupyter · closed 1 week ago · 1 comment
#345 Any plans to support minicpm? · adol001 · opened 3 weeks ago · 0 comments
#344 Implement Sequence Packing in SFT for Qwen2 and LlaMA-3.1 models · lostkevin · closed 3 weeks ago · 0 comments
#343 llama7b OOM issue · mxjmtxrm · closed 3 weeks ago · 2 comments
#342 Suggest running an SFT test on deepseek-v2-coder-lite · bao-xiaoyi · closed 1 week ago · 2 comments
#341 Fix deepseek vocab mismatch · Jiayi-Pan · opened 3 weeks ago · 1 comment
#340 avoid max(sentence_ids) ValueError: max() arg is an empty sequence · village-way · opened 3 weeks ago · 0 comments
#339 Any plans to support qwen2-vl? · divisionblur · opened 3 weeks ago · 1 comment
#338 DeepSeek Vocab-size Mismatch · Jiayi-Pan · opened 3 weeks ago · 1 comment
#337 feature: support for Qwen2Tokenizer data preprocess and pretraining · village-way · closed 3 weeks ago · 0 comments
#336 fix: qwen1.5 run_pretrain_megatron_qwen.sh tokenizer error · village-way · closed 3 weeks ago · 1 comment
#335 fix/rm eos_token_id mask · wakafengfan · closed 3 weeks ago · 0 comments
#334 fix qwen moe model convert · lee0ray · closed 1 month ago · 1 comment
#333 Question about llava support · divisionblur · closed 1 month ago · 2 comments
#332 AssertionError: First dimension of the tensor should be divisible by tensor parallel size · pizts · opened 1 month ago · 0 comments
#331 fix precision issues in convert script (https://github.com/alibaba/Pa… · aeeeeeep · closed 1 month ago · 2 comments
#330 llava run error · yangzhipeng1108 · opened 1 month ago · 2 comments
#329 enhance auto offload and add MoE offload support · lostkevin · closed 1 month ago · 0 comments
#328 Upgrade qwen2 dense and moe models to support FA3, Offloading and Overlapping · jerryli1981 · closed 1 month ago · 1 comment
#327 deepseek model conversion issue · bao-xiaoyi · opened 1 month ago · 7 comments
#326 Implement LLaMA3.1 in Pai-Megatron-Patch · lostkevin · closed 1 month ago · 0 comments
#325 qwen2-sft training hangs right at the start · baisechundu · opened 1 month ago · 4 comments
#324 TypeError: get_cpu_offload_context() missing 1 required positional argument: 'weight_offloading' · ben-8878 · closed 1 month ago · 2 comments
#323 Training Qwen1.5 1.8B with flash-attn shows no obvious speedup · coder-wangzhen · closed 1 month ago · 1 comment
#322 mcore weight conversion does not support pp>1 · xs1997zju · closed 1 month ago · 2 comments
#321 Failed to install pyarrow · xiaoquanWu · closed 1 month ago · 1 comment
#320 mmap data format issue · bao-xiaoyi · closed 1 month ago · 1 comment
#319 On finetuning qwen2 with the idxmap format · Gloid59 · closed 1 month ago · 2 comments
#318 Issue with resuming training from a checkpoint · divisionblur · closed 1 month ago · 1 comment
#317 Fix llama3 megatron version scripts · jerryli1981 · closed 1 month ago · 1 comment
#316 Channel Loss support · echo-valor · opened 1 month ago · 1 comment
#315 distrib_optim.pt is missing from the saved checkpoints · shizikachen · closed 1 month ago · 0 comments
#314 Which version of megatron-lm does starcoder depend on? · bao-xiaoyi · closed 1 month ago · 3 comments
#313 Update readme in order to add new dingtalk group for tech discussion · jerryli1981 · closed 2 months ago · 1 comment
#312 Does Mcore not support pp? · divisionblur · closed 1 month ago · 3 comments
#311 optimizer offloading is really powerful · 154912369 · opened 2 months ago · 1 comment
#310 The group is full · zgf1005 · closed 1 week ago · 3 comments