alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0
723 stars
103 forks
issues
Fix LLaVa mcore loss convergence issue
#385
jerryli1981
closed
1 day ago
1
Add LLaVa mcore implementation
#384
jerryli1981
closed
1 day ago
1
One GPU shows low utilization when training llama-3.1-8b on the PAI platform
#383
kkkeepgoing
opened
2 days ago
1
SFT loss does not match across training runs with different DP settings
#382
yspMing
opened
3 days ago
1
SFT training results differ across runs with different DP settings
#381
yspMing
closed
3 days ago
0
Error on four H20 machines with seq_length=256, qwen2.5=72b, micro_batch_size=2, global_batch_size=64; the error disappears with seq_length=2048
#380
yangzhipeng1108
closed
2 days ago
2
Fix hang issue due to mismatched attention mask
#379
lostkevin
closed
4 days ago
0
RuntimeError: DataLoader worker exited unexpectedly
#378
kkkeepgoing
opened
1 week ago
1
Fix broadcast CPU tensor issue when TP>1 PP=1 in jsonl dataset SFT
#377
lostkevin
closed
1 week ago
0
fix num_seq=None of qwen2
#376
lostkevin
closed
2 weeks ago
0
Question about the data format for qwen1.5-moe
#375
cdxzyc
opened
2 weeks ago
0
Latest commit bf582d8f30d8ffbba51db3dcda984c9f0261d57d has a num_seq None problem
#374
WuNein
closed
2 weeks ago
6
Update utils.py
#373
enze5088
opened
3 weeks ago
1
Qwen2VL support
#372
zhangdun-pat
opened
3 weeks ago
0
Add llm auto configurator and apply per seq sft loss for qwen2/2.5 mo…
#371
jerryli1981
closed
3 weeks ago
1
Fix DeepSeekV2Tokenizer bug that padding_side is not set to right
#370
jerryli1981
closed
4 weeks ago
1
Does the latest version support conversation-sft (multi-turn dialogue)?
#369
gaohejiao
opened
4 weeks ago
0
DeepSeekV2Tokenizer should use padding_side="right" in __init__()!
#368
pqhgit
opened
1 month ago
4
Add --use-cpu-initialization for Qwen2.5 72B HF2TE ckpt conversion
#367
jerryli1981
closed
1 month ago
1
AssertionError: Rank 11: found NaN in local grad norm in backward pass before data-parallel communication collective. Device: 3
#366
lanfengmo
opened
1 month ago
3
Fix Qwen2/2.5 te to hf conversion bugs
#365
jerryli1981
closed
1 month ago
1
add extra clone to fix unexpected large ckpt issue
#364
lostkevin
closed
1 month ago
1
Possible bug in Mistral MCore <->HF Model conversions because of _extra_state layers
#363
abgoswam
closed
4 weeks ago
1
fix convert script to solve empty sample issue
#362
lostkevin
closed
1 month ago
0
Question about adapting the LLAMA 3.1 model
#361
echo-valor
opened
1 month ago
1
Loss spikes after expanding the vocabulary of qwen-2.5
#360
QianguoS
opened
1 month ago
1
cannot import name 'TEDotProductAttentionMLA' when running `examples/deepseek_v2/run_mcore_deepseek.sh`
#359
dreasysnail
opened
1 month ago
4
Fix Qwen2 tie word embeddings issue
#358
jerryli1981
closed
1 month ago
1
No module named 'megatron'
#357
yuanzhiyong1999
closed
1 month ago
0
After converting llama3.1-8b to megatron-mcore format, the model size grew from 15G to 71G while the precision is still bf16; is this normal?
#356
kkkeepgoing
opened
1 month ago
3
Upgrade deepseek-v2-moe models to support MLA via transformer engine …
#355
jerryli1981
closed
1 month ago
1
qwen2.5 conversion script raises an error during conversion
#354
enze5088
opened
1 month ago
1
Fix pretrain with idxmap dataset issue
#353
jerryli1981
closed
1 month ago
1
optimizer offload
#352
leo-ztjht
closed
1 month ago
1
Failed to join the group chat; the second group can no longer be joined by scanning the QR code either
#351
GeorgeSen
opened
2 months ago
7
Several errors reported during model conversion
#350
Yanhong-Li
closed
2 months ago
0
update sequence packing and add qwen2.5
#349
lostkevin
closed
2 months ago
0
Training llama3.1 8b with a 32k context: long training time and relatively high loss
#348
ARQlalala
opened
2 months ago
1
llama3.1 support for mixed pretraining on multiple datasets
#347
Bob199511
closed
2 months ago
1
[[: not found Zarr-based strategies will not be registered because of missing packages Traceback (most recent call last)
#346
aJupyter
closed
1 month ago
1
Are there plans to support minicpm?
#345
adol001
opened
2 months ago
0
Implement Sequence Packing in SFT for Qwen2 and LlaMA-3.1 models
#344
lostkevin
closed
2 months ago
0
llama7b OOM issue
#343
mxjmtxrm
closed
2 months ago
2
Suggestion: run SFT tests on deepseek-v2-coder-lite
#342
bao-xiaoyi
opened
2 months ago
5
Fix deepseek vocab mismatch
#341
Jiayi-Pan
closed
1 month ago
1
avoid max(sentence_ids) ValueError: max() arg is an empty sequence
#340
village-way
closed
4 weeks ago
0
Are there plans to support qwen2-vl?
#339
divisionblur
opened
2 months ago
2
DeepSeek Vocab-size Mismatch
#338
Jiayi-Pan
opened
2 months ago
1
feature: support for qwen2Tokenizer data preprocess and pretraining
#337
village-way
closed
2 months ago
0
fix: qwen1.5 run_pretrain_megatron_qwen.sh tokenizer error
#336
village-way
closed
2 months ago
1