alibaba Pai-Megatron-Patch issues

alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Apache License 2.0

723 stars 103 forks

issues

Fix LLaVa mcore loss convergence issue

#385 jerryli1981 closed 1 day ago
1
Add LLaVa mcore implementation

#384 jerryli1981 closed 1 day ago
1
One GPU shows low utilization when training llama-3.1-8b on the PAI platform

#383 kkkeepgoing opened 2 days ago
1
Loss does not match across SFT training runs with different DP sizes

#382 yspMing opened 3 days ago
1
SFT training with different DP sizes produces different results

#381 yspMing closed 3 days ago
0
Error on four H20 machines with seq_length=256, qwen2.5 72b, micro_batch_size=2, global_batch_size=64; the error disappears when seq_length=2048

#380 yangzhipeng1108 closed 2 days ago
2
Fix hang issue due to mismatched attention mask

#379 lostkevin closed 4 days ago
0
RuntimeError: DataLoader worker exited unexpectedly

#378 kkkeepgoing opened 1 week ago
1
Fix broadcast CPU tensor issue when TP>1 PP=1 in jsonl dataset SFT

#377 lostkevin closed 1 week ago
0
fix num_seq=None of qwen2

#376 lostkevin closed 2 weeks ago
0
Question about the data format for qwen1.5-moe

#375 cdxzyc opened 2 weeks ago
0
The latest commit bf582d8f30d8ffbba51db3dcda984c9f0261d57d has a num_seq None issue

#374 WuNein closed 2 weeks ago
6
Update utils.py

#373 enze5088 opened 3 weeks ago
1
Qwen2VL support

#372 zhangdun-pat opened 3 weeks ago
0
Add llm auto configurator and apply per seq sft loss for qwen2/2.5 mo…

#371 jerryli1981 closed 3 weeks ago
1
Fix DeepSeekV2Tokenizer bug that padding_side is not set to right

#370 jerryli1981 closed 4 weeks ago
1
Does the latest version support conversation-sft (multi-turn dialogue)?

#369 gaohejiao opened 4 weeks ago
0
DeepSeekV2Tokenizer should use padding_side="right" in __init__()!

#368 pqhgit opened 1 month ago
4
Add --use-cpu-initialization for Qwen2.5 72B HF2TE ckpt conversion

#367 jerryli1981 closed 1 month ago
1
AssertionError: Rank 11: found NaN in local grad norm in backward pass before data-parallel communication collective. Device: 3

#366 lanfengmo opened 1 month ago
3
Fix Qwen2/2.5 te to hf conversion bugs

#365 jerryli1981 closed 1 month ago
1
add extra clone to fix unexpected large ckpt issue

#364 lostkevin closed 1 month ago
1
Possible bug in Mistral MCore <->HF Model conversions because of _extra_state layers

#363 abgoswam closed 4 weeks ago
1
fix convert script to solve empty sample issue

#362 lostkevin closed 1 month ago
0
Question about adapting the LLAMA 3.1 model

#361 echo-valor opened 1 month ago
1
Loss spikes after expanding the vocabulary of qwen-2.5

#360 QianguoS opened 1 month ago
1
cannot import name 'TEDotProductAttentionMLA' when running `examples/deepseek_v2/run_mcore_deepseek.sh`

#359 dreasysnail opened 1 month ago
4
Fix Qwen2 tie word embeddings issue

#358 jerryli1981 closed 1 month ago
1
No module named 'megatron'

#357 yuanzhiyong1999 closed 1 month ago
0
After converting llama3.1-8b to megatron-mcore format, the model size grew from 15G to 71G while the precision is still bf16; is this normal?

#356 kkkeepgoing opened 1 month ago
3
Upgrade deepseek-v2-moe models to support MLA via transformer engine …

#355 jerryli1981 closed 1 month ago
1
The qwen2.5 conversion script raises an error during conversion

#354 enze5088 opened 1 month ago
1
Fix pretrain with idxmap dataset issue

#353 jerryli1981 closed 1 month ago
1
optimizer offload

#352 leo-ztjht closed 1 month ago
1
Failed to join the group chat; the second group can no longer be joined by scanning the QR code either

#351 GeorgeSen opened 2 months ago
7
Several errors are reported when converting the model

#350 Yanhong-Li closed 2 months ago
0
update sequence packing and add qwen2.5

#349 lostkevin closed 2 months ago
0
Training llama3.1 8b with a 32k context takes a long time and the loss is high

#348 ARQlalala opened 2 months ago
1
llama3.1: support for mixed pretraining on multiple datasets

#347 Bob199511 closed 2 months ago
1
[[: not found Zarr-based strategies will not be registered because of missing packages Traceback (most recent call last)

#346 aJupyter closed 1 month ago
1
Are there plans to support minicpm?

#345 adol001 opened 2 months ago
0
Implement Sequence Packing in SFT for Qwen2 and LlaMA-3.1 models

#344 lostkevin closed 2 months ago
0
llama7b OOM issue

#343 mxjmtxrm closed 2 months ago
2
Suggestion: run SFT tests on deepseek-v2-coder-lite

#342 bao-xiaoyi opened 2 months ago
5
Fix deepseek vocab mismatch

#341 Jiayi-Pan closed 1 month ago
1
avoid max(sentence_ids) ValueError: max() arg is an empty sequence

#340 village-way closed 4 weeks ago
0
Are there plans to support qwen2-vl?

#339 divisionblur opened 2 months ago
2
DeepSeek Vocab-size Mismatch

#338 Jiayi-Pan opened 2 months ago
1
feature: support for qwen2Tokenizer data preprocess and pretraining

#337 village-way closed 2 months ago
0
fix: qwen1.5 run_pretrain_megatron_qwen.sh tokenizer error

#336 village-way closed 2 months ago
1