Open zepmck opened 1 year ago
Has anyone managed to launch the fine-tuning code? I am using the Guanaco 7B fine-tuning sample script but still get the above error. Any help would be really appreciated. Thanks.
I haven't seen that error before. I would suggest using one GPU for debugging, as this might be related to DDP. One A100 should easily fit the 7B model. See https://discuss.pytorch.org/t/ddp-and-gradient-checkpointing/132244
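One way to try the single-GPU suggestion is to expose only one device to CUDA through the `CUDA_VISIBLE_DEVICES` environment variable before the training process starts. The sketch below is only an illustration: the helper names `single_gpu_env` and `run_on_single_gpu` are made up for this example, the script name in the commented-out call is a placeholder, and it assumes the script is started as a plain single-process Python command rather than through a multi-process launcher.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA-based frameworks only see the devices listed in this variable.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_on_single_gpu(command, gpu_index=0):
    """Run `command` (a list of arguments) with only one GPU visible."""
    return subprocess.run(command, env=single_gpu_env(gpu_index), check=False)


if __name__ == "__main__":
    # Placeholder invocation (assumption): substitute the real fine-tuning
    # command and its arguments here.
    # run_on_single_gpu([sys.executable, "finetune.py"])
    print(single_gpu_env(0)["CUDA_VISIBLE_DEVICES"])
```

If the error disappears with a single visible GPU, that would point toward the multi-GPU (DDP) setup rather than the model or data.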
Hi all, every time I try to launch the fine-tuning code on a DGX A100 system (8 GPUs), either serially or in parallel, I get the following error. Any suggestions on how to fix it?