RulinShao / LightSeq

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

Does it support the mixtral_8x7b model? #6

Open strngelet opened 6 months ago

strngelet commented 6 months ago

awesome work!

I am curious whether LightSeq also supports MoE architectures, e.g. mixtral_8x7b.

Thanks in advance.
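
For context, here is a minimal sketch (using the Hugging Face `transformers` `MixtralConfig`, not any LightSeq API) of the MoE-specific fields that distinguish mixtral_8x7b from a dense transformer; whether LightSeq's sequence-level parallelism composes with this expert-routed MLP is what the question above is asking.

```python
# Sketch only: inspect the MoE-specific fields of a Mixtral-style config.
# This does not demonstrate LightSeq support; it just shows what makes
# mixtral_8x7b different from a dense decoder-only transformer.
from transformers import MixtralConfig

# The default MixtralConfig roughly mirrors mixtral_8x7b:
# 8 local experts per MoE layer, 2 experts routed per token.
cfg = MixtralConfig()

print("hidden_size:        ", cfg.hidden_size)
print("num_attention_heads:", cfg.num_attention_heads)
print("num_local_experts:  ", cfg.num_local_experts)    # experts per MoE layer
print("num_experts_per_tok:", cfg.num_experts_per_tok)  # top-k routing per token
```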