CoinCheung / gdGPT

Train LLMs (bloom, llama, baichuan2-7b, chatglm3-6b) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
Apache License 2.0

Any plan to incorporate tensor parallelism or ZeRO data parallelism? #18

Open GeneZC opened 1 year ago

GeneZC commented 1 year ago

Would it be possible in this framework to combine the pipeline with tensor parallelism or ZeRO data parallelism?

CoinCheung commented 1 year ago

Hi,

Thanks for your interest in this repo!

Are there any experiments or posts showing that adding tensor parallelism or ZeRO would improve training performance?

GeneZC commented 1 year ago

Not really.

However, there are projects that use the pipeline together with tensor parallelism for efficiency, such as Megatron. And I believe this project offers a better solution, since it depends only on DeepSpeed rather than the heavy dependencies of Megatron.
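For intuition about what Megatron-style tensor parallelism adds on top of the pipeline, here is a toy, single-process sketch (this is not Megatron code; the shapes and the tensor-parallel degree are made up) of the column-parallel linear layer it is built on: the weight matrix is split along its output dimension across ranks, each rank computes a slice of the output with its own matmul, and the slices are combined, which in a real multi-GPU setup would be an all-gather.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 16)          # (batch, hidden); sizes are placeholders
w = torch.randn(16, 32)         # full weight of a linear layer

full = x @ w                    # what a single device would compute

tp_degree = 2                   # hypothetical tensor-parallel degree
shards = w.chunk(tp_degree, dim=1)          # split output columns per rank
partials = [x @ shard for shard in shards]  # each rank's local matmul
combined = torch.cat(partials, dim=1)       # stands in for the all-gather

assert torch.allclose(full, combined, atol=1e-6)
```

Because each rank holds only a slice of the weights and activations, this trades extra communication for lower per-device memory, which is why Megatron pairs it with pipeline parallelism on large models.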

As for the pipeline with ZeRO, I have not seen any other projects do this.
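For reference, DeepSpeed's own pipeline engine does accept ZeRO stage 1 (optimizer-state sharding across the data-parallel replicas of each stage), while stages 2 and 3 are documented as incompatible with pipeline parallelism. Below is a minimal sketch of such a configuration; the layer stack, batch sizes, and hyperparameters are placeholders, not taken from this repo.

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # PipelineModule needs an initialized process group

# Hypothetical stack of layers to be partitioned into pipeline stages.
layers = [nn.Linear(1024, 1024) for _ in range(8)]

model = PipelineModule(
    layers=layers,
    num_stages=2,          # pipeline-parallel degree
    loss_fn=nn.MSELoss(),
)

ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # ZeRO stage 1 shards optimizer states only; DeepSpeed does not
    # support stages 2/3 together with the pipeline engine.
    "zero_optimization": {"stage": 1},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
# Launch with e.g. `deepspeed --num_gpus 2 this_script.py`, then train with
# engine.train_batch(data_iter=...) as in any DeepSpeed pipeline setup.
```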