Open Lzhang-hub opened 2 months ago
For long-sequence model training, I want to use both Context Parallel and packed inputs without cross-contamination attention. Is this supported?
Cross-contamination attention looks like this:
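To make the term concrete, here is a minimal sketch in plain Python of the attention mask that packing *without* cross-contamination implies: a block-diagonal causal mask, where a token only attends to earlier tokens of its own original sequence. The names `packed_causal_mask` and `cu_seqlens` are hypothetical helpers written for this illustration and are not taken from any library; this does not show how Context Parallel would shard such a mask.

```python
def packed_causal_mask(seq_lens):
    """Build a block-diagonal causal mask for sequences packed into one row.

    mask[i][j] is True when token i may attend to token j, i.e. both tokens
    come from the same original sequence and j is not after i.
    """
    # Tag every packed position with the index of the sequence it came from.
    seq_ids = [sid for sid, n in enumerate(seq_lens) for _ in range(n)]
    total = len(seq_ids)
    return [
        [seq_ids[i] == seq_ids[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


def cu_seqlens(seq_lens):
    """Cumulative sequence boundaries of the packed row, starting at 0."""
    bounds = [0]
    for n in seq_lens:
        bounds.append(bounds[-1] + n)
    return bounds


if __name__ == "__main__":
    # Two sequences of length 2 and 3 packed into a single row of 5 tokens.
    for row in packed_causal_mask([2, 3]):
        print("".join("1" if allowed else "0" for allowed in row))
    print(cu_seqlens([2, 3]))
```

With a plain causal mask over the packed row, the tokens of the second sequence would also attend to the first sequence; that leakage is the cross-contamination in question. The cumulative boundaries are a compact way to describe the same blocks without materializing the full mask.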
Marking as stale. No activity in 60 days.