Closed · hill2hill closed this 4 months ago
When training on multiple GPUs without LoRA, retrieving the state_dict can stall: the tensors are sharded across GPUs, so they cannot be gathered directly and the process hangs.
The fix just follows the approach already used in train.py (see the sketch below).
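For context, this kind of hang usually happens when only some ranks participate in the collective that gathers the sharded parameters, so the safe pattern is to build the state_dict on every rank and only write it from the rank that should save. Below is a minimal sketch of that pattern, assuming a Hugging Face Trainer; the helper name `safe_save_model_for_hf_trainer` and the `deepspeed` attribute check are illustrative assumptions, not necessarily identical to what this repo's train.py does.

```python
import torch


def safe_save_model_for_hf_trainer(trainer, output_dir: str):
    """Gather the (possibly sharded) weights on every rank, save on rank 0 only."""
    if getattr(trainer, "deepspeed", None):
        # With DeepSpeed ZeRO, let the engine handle gathering the sharded
        # parameters; calling save_model on all ranks avoids the deadlock.
        torch.cuda.synchronize()
        trainer.save_model(output_dir)
        return

    # state_dict() must be called on every rank so the collective completes.
    state_dict = trainer.model.state_dict()
    if trainer.args.should_save:
        # Move tensors to CPU before writing to keep GPU memory free.
        cpu_state_dict = {key: value.cpu() for key, value in state_dict.items()}
        del state_dict
        trainer._save(output_dir, state_dict=cpu_state_dict)
```

The important detail is that the gathering step runs on all ranks while only the saving rank touches the filesystem; skipping the gather on non-zero ranks is what causes the stall described above.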
Thanks for the fix.