BlackSamorez / tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference
MIT License

Does tensor_parallel support multi-node tensor parallel training? #84

Open liguodongiot opened 1 year ago

zhangjunyi111 commented 1 year ago

I want to know too.

longday1102 commented 1 year ago

@BlackSamorez Hope you can answer this question 😄😄

longday1102 commented 1 year ago

@BlackSamorez I have 2 servers with a total of 16 GPUs, so I would love to be able to use multi-node tensor parallelism to train a large language model, for example BLOOM 176B. I hope you can explain how to use multi-node tensor parallelism. Thank you very much
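For context, `tensor_parallel` documents single-node multi-GPU use, and nothing in this thread confirms multi-node support. A multi-node run would typically be launched with `torchrun`, which assigns each process a global rank from its node rank and local rank. This is a hypothetical sketch of how 2 nodes × 8 GPUs would map onto one 16-way tensor-parallel group; it is plain rank arithmetic, not `tensor_parallel` API:

```python
# Hypothetical rank layout for 2 nodes x 8 GPUs (not tensor_parallel API).
# torchrun's convention: global_rank = node_rank * nproc_per_node + local_rank.

NODES = 2
GPUS_PER_NODE = 8
WORLD_SIZE = NODES * GPUS_PER_NODE  # 16

def global_rank(node_rank: int, local_rank: int) -> int:
    """Global rank of GPU `local_rank` on node `node_rank` under torchrun."""
    return node_rank * GPUS_PER_NODE + local_rank

# A single 16-way tensor-parallel group spanning both nodes would contain
# every global rank; each layer's all-reduce then crosses the inter-node link.
tp_group = [global_rank(n, g) for n in range(NODES) for g in range(GPUS_PER_NODE)]
print(tp_group)  # ranks 0..15
```

Note that tensor parallelism all-reduces activations on every layer, so spanning a TP group across nodes puts that traffic on the (much slower) inter-node network; this is why frameworks usually keep TP within a node and use pipeline or data parallelism across nodes.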

PieterZanders commented 6 months ago

Is this solved? if so, how?

deema-A commented 3 months ago

Same question.

Tezcan98 commented 1 month ago

Haha, everybody has the same problem. I don't think this feature exists yet, but we absolutely need it. Recently I tried DeepSpeed, which is developed by Microsoft; maybe it supports this, but Microsoft's code doesn't support Windows 😄
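For anyone following up on the DeepSpeed suggestion: DeepSpeed's multi-node launch runs on Linux via a hostfile and passwordless SSH between nodes. A minimal launch sketch, where `node1`/`node2`, `train.py`, and `ds_config.json` are placeholder names:

```shell
# hostfile — one line per node, "slots" = number of GPUs on that node:
#   node1 slots=8
#   node2 slots=8

# Launch the same training script across both nodes (run from node1).
# train.py and ds_config.json are hypothetical names for your script/config.
deepspeed --hostfile=hostfile train.py \
    --deepspeed --deepspeed_config ds_config.json
```

This is a launch-configuration sketch only; whether DeepSpeed covers your specific tensor-parallel training setup depends on the model and the parallelism strategy you configure.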