NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

Does Context Parallel support packing inputs without cross-contamination attention? #1131

Open Lzhang-hub opened 2 months ago

Lzhang-hub commented 2 months ago

For long-sequence model training, I want to use both Context Parallel and packing inputs without cross-contamination attention (link). Is this supported?

Cross-contamination attention illustrated: [image]
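For context, "packing without cross-contamination" means that when several documents are packed into one sample, each token may attend only to tokens from its own document. This is typically enforced with a block-diagonal attention mask. A minimal sketch (not Megatron-Core's actual implementation; `block_diagonal_mask` is a hypothetical helper for illustration):

```python
def block_diagonal_mask(seq_lens):
    """Attention mask for documents packed into one sample.

    mask[i][j] is True iff token i may attend to token j, i.e. both
    tokens belong to the same packed document. This prevents
    cross-contamination between the packed documents.
    """
    total = sum(seq_lens)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for n in seq_lens:
        # Each document forms one block on the diagonal.
        for i in range(start, start + n):
            for j in range(start, start + n):
                mask[i][j] = True
        start += n
    return mask

# Two documents of lengths 2 and 3 packed into one sample of length 5:
mask = block_diagonal_mask([2, 3])
```

In the example, token 0 can attend to token 1 (same document) but not to token 2 (next document). A causal variant would additionally restrict each block to its lower triangle.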

github-actions[bot] commented 2 weeks ago

Marking as stale. No activity in 60 days.