NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Apex Tensor Parallelism and LoRA #1712

Closed · conceptofmind closed 4 months ago

conceptofmind commented 1 year ago

Hello,

I was wondering if you knew whether Apex's tensor parallelism is compatible with LoRA. Would tensor parallelism work for both the base model and the LoRA weights?
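For concreteness, here is a rough sketch of the kind of setup I have in mind: a LoRA adapter wrapped around one local shard of a tensor-parallel linear layer. The wrapper class and the plain nn.Linear below are just stand-ins for illustration, since Apex's ColumnParallelLinear needs model-parallel/distributed initialization that I am leaving out here.

```python
import torch
import torch.nn as nn


class LoRAWrapper(nn.Module):
    """Hypothetical LoRA adapter around one tensor-parallel shard.

    `base` is assumed to behave like nn.Linear for its local shard
    (exposing in_features / out_features). Apex's ColumnParallelLinear
    is not instantiated here because it requires model-parallel and
    torch.distributed initialization.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained shard; only the low-rank factors are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Effective weight is W_shard + (alpha / r) * B @ A, with B (out, r) and A (r, in).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling


# Stand-in for the local shard of a (4096 -> 4096) layer split across two ranks: 4096 -> 2048.
shard = nn.Linear(4096, 2048, bias=False)
layer = LoRAWrapper(shard, r=8)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 2048])
```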

I appreciate your time and consideration.

Thank you,

Enrico

rajveer43 commented 1 year ago

Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new features, such as tensor parallelism. Tensor parallelism is a technique for training models on multiple GPUs by dividing the model's parameters and computations across the GPUs. However, LoRA does not support tensor parallelism, so Apex's tensor parallelism cannot be used with LoRA.

I am not a contributor here, but sharing this as per my knowledge.

conceptofmind commented 1 year ago

> Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new features, such as tensor parallelism. Tensor parallelism is a technique for training models on multiple GPUs by dividing the model's parameters and computations across the GPUs. However, LoRA does not support tensor parallelism, so Apex's tensor parallelism cannot be used with LoRA.
>
> I am not a contributor here, but sharing this as per my knowledge.

This looks like it was written by ChatGPT. It is also very unlikely to be correct.

rajveer43 commented 1 year ago

> Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new features, such as tensor parallelism. Tensor parallelism is a technique for training models on multiple GPUs by dividing the model's parameters and computations across the GPUs. However, LoRA does not support tensor parallelism, so Apex's tensor parallelism cannot be used with LoRA. I am not a contributor here, but sharing this as per my knowledge.
>
> This looks like it was written by ChatGPT. It is also very unlikely to be correct.

Yes, it was written by AI, but not ChatGPT. I built a small module using Llama 2 to understand how it works. No LLM is fully accurate at providing information; I am just testing it by sharing its answers with people and collecting their feedback, and I am still researching these things. I also know very little about Apex's tensor parallelism. I tried building an agent that could answer coding questions efficiently, but I only achieved around 60% accuracy, which is quite low.

vince62s commented 1 year ago

Enrico, I am facing something similar while trying to implement LoRA along with tensor parallelism. Here are my thoughts:

Say we split K (the same reasoning applies to Q, V, or the FF weights), of dimension (4096, 4096), into two chunks K1 (2048, 4096) and K2 (2048, 4096). Let's call A1, B1 the LoRA parameters of the submodel corresponding to K1, and A2, B2 those of the other submodel. In the end we need K1 = K1 + B1 @ A1 and K2 = K2 + B2 @ A2, given that B1 has dimension (2048, r=8) and A1 has dimension (r=8, 4096) [same for B2 and A2]. If we don't want to merge the LoRA weights at each save, we need to figure out A and B from A1, A2, B1, B2. I don't think there is an easy way to accomplish this, because each Ai is trained independently.
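A quick numerical sketch of the problem (scaled-down shapes, and the variable names are just for illustration): stacking the per-shard updates B1 @ A1 and B2 @ A2 yields a matrix of rank up to 2r, so in general there is no exact rank-r pair (A, B) for the unsharded weight.

```python
import torch

# Scaled-down dimensions (the thread's case is 4096 with 2048-row shards, r=8).
d, shard_rows, r = 512, 256, 8

B1, A1 = torch.randn(shard_rows, r), torch.randn(r, d)  # LoRA factors of shard K1
B2, A2 = torch.randn(shard_rows, r), torch.randn(r, d)  # LoRA factors of shard K2 (trained independently)

# Stack the per-shard updates back into the full weight update.
delta = torch.cat([B1 @ A1, B2 @ A2], dim=0)            # shape (512, 512)

print(torch.linalg.matrix_rank(delta).item())           # 16 == 2 * r, not r

# The exact factorization of delta is the block form
#   [[B1, 0], [0, B2]] @ [[A1], [A2]]
# which has rank 2r, so no exact rank-r pair (A, B) exists for the merged weight
# unless A1 and A2 happen to span the same row space.
```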