agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

Error while saving the fine-tuned model #155

Closed Chinjuj2017 closed 2 months ago

Chinjuj2017 commented 2 months ago

Hi, when I try to save the ProtT5 model after fine-tuning it with my dataset, I get the following error: "TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object". May I know how to resolve this? PS: I followed your notebook on LoRA fine-tuning (per_prot). Thanks in advance.

RSchmirler commented 2 months ago

Hi @Chinjuj2017, without any shared code this is hard to debug. Are you running a multi-GPU setup? Perhaps you can detach the parameters before saving; let me know if this works.

import torch

def save_model(model, filepath):
    # Save all parameters that were changed during fine-tuning

    # Create a dictionary to hold the non-frozen parameters
    non_frozen_params = {}

    # Iterate through all the model parameters
    for param_name, param in model.named_parameters():
        # If the parameter has requires_grad=True, store a detached CPU copy
        if param.requires_grad:
            non_frozen_params[param_name] = param.detach().cpu().clone()

    # Save only the fine-tuned parameters
    torch.save(non_frozen_params, filepath)
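
Loading the saved parameters back could look roughly like this (a minimal sketch, not tested on your setup; `load_finetuned_params` is just an illustrative name, and it assumes the model was rebuilt with the same architecture and LoRA config before loading):

import torch

def load_finetuned_params(model, filepath):
    # Load the dictionary of fine-tuned parameters written by save_model
    non_frozen_params = torch.load(filepath, map_location="cpu")

    # Copy the saved values into the matching parameters of the model
    with torch.no_grad():
        for param_name, param in model.named_parameters():
            if param_name in non_frozen_params:
                param.copy_(non_frozen_params[param_name])

    return model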
Chinjuj2017 commented 2 months ago

Hi @RSchmirler, thank you for the response. Sorry, I didn't share any code; I am using a multi-GPU setup. I will try your solution and get back to you.