NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

AttributeError: Can't pickle local object 'FilterbankFeatures.__init__.<locals>.<lambda>' #1194

Closed · amejri closed this 4 years ago

amejri commented 4 years ago

Hi guys, I am trying to train an ASR model with NeMo. I preprocessed the data as shown in the tutorial, but when I try to train the model I get the following error: `AttributeError: Can't pickle local object 'FilterbankFeatures.__init__.<locals>.<lambda>'`. Can someone help me? Thanks in advance.

okuchaiev commented 4 years ago

Can you please provide more details (as in the bug template)?

But this error can occur if you are using a non-DDP backend when training on multiple GPUs.
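
For context, the error itself is raised by Python's `pickle` module, which refuses to serialize a lambda defined inside a method (its qualified name contains `<locals>`). A minimal, standalone illustration of the mechanism (toy classes, not NeMo code):

```python
import pickle


def add_one(x):
    """Module-level function: picklable by reference."""
    return x + 1


class LambdaHolder:
    """Mimics the failing pattern: a lambda created inside __init__."""

    def __init__(self):
        self.fn = lambda x: x + 1  # local object, cannot be pickled


class FunctionHolder:
    """Same pattern, but storing a module-level function instead."""

    def __init__(self):
        self.fn = add_one


if __name__ == "__main__":
    try:
        pickle.dumps(LambdaHolder())
    except (AttributeError, pickle.PicklingError) as err:
        # e.g. AttributeError: Can't pickle local object 'LambdaHolder.__init__.<locals>.<lambda>'
        print("Reproduced:", err)

    pickle.dumps(FunctionHolder())  # succeeds
    print("A module-level function pickles without issues.")
```

Backends that spawn worker processes (e.g. `ddp_spawn`) pickle the model to hand it to each process, which is presumably where the lambda inside `FilterbankFeatures.__init__` trips things up; `ddp` launches separate processes per GPU instead and avoids that pickling step.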

amejri commented 4 years ago

You are right, the code works fine on a single GPU. How do I run it on multiple GPUs?

niloofarmaani1 commented 3 years ago

Hi, I have fine-tuned the NeMo model and I am trying to save it using `torch.save(quartznet, 'fine_tuned')`, but I end up with the same error. Do you have any idea about it?

amejri commented 3 years ago

Hello, to save the model I just used this command: `quartznet.save_to(path)`.
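
For reference, `save_to` writes a self-contained `.nemo` archive that can be reloaded later with `restore_from`. A minimal sketch (file names are placeholders, assuming a QuartzNet / `EncDecCTCModel` checkpoint):

```python
import nemo.collections.asr as nemo_asr

# Save the fine-tuned model as a self-contained .nemo archive.
quartznet.save_to("quartznet_finetuned.nemo")

# Later, restore it without rebuilding the model by hand.
restored_model = nemo_asr.models.EncDecCTCModel.restore_from("quartznet_finetuned.nemo")
```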

hoangtuanvu commented 3 years ago

Hi @amejri, how did you fix the error when training on multiple GPUs?

SummerZ723 commented 3 years ago

I also met this problem. After checking the PyTorch Lightning docs, I found that it can be solved by adding `accelerator='ddp'` to the Trainer, and with that the code works fine on multiple GPUs. The docs explain that 'ddp' is DistributedDataParallel (each GPU on each node trains, and syncs grads): https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html
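
With the PyTorch Lightning versions current at the time (roughly 1.0–1.4, before the accelerator/strategy split), the call looked something like the sketch below; the argument names come from that older API and `quartznet` stands for the NeMo model being trained:

```python
import pytorch_lightning as pl

# Older PyTorch Lightning API: 'accelerator' selected the distributed backend.
trainer = pl.Trainer(
    gpus=2,              # GPUs per node
    accelerator="ddp",   # DistributedDataParallel: one process per GPU, grads synced
    max_epochs=100,
)
trainer.fit(quartznet)
```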

hzitoun commented 3 years ago

> I found that this error could be solved by adding `accelerator='ddp'` to the Trainer.

That is it!! Thanks!

rajeevbaalwan commented 1 year ago

> I found that this error could be solved by adding `accelerator='ddp'` to the Trainer.

As per the PyTorch Lightning docs there is no 'ddp' option for `accelerator` anymore: https://pytorch-lightning.readthedocs.io/en/latest/accelerators/gpu_basic.html. I have also tried `accelerator="gpu"` with `devices=4`, but I get the same error.

titu1994 commented 1 year ago

PTL added another argument, `trainer.strategy="ddp"`. All of these flags are set in our configs; please refer to them.
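
In recent PyTorch Lightning releases (1.5+), the old flag is split into `accelerator`/`devices`/`strategy`; in NeMo this is normally driven by the Hydra config rather than hand-written Trainer code. A sketch of the plain Trainer form:

```python
import pytorch_lightning as pl

# Newer PyTorch Lightning API: accelerator/devices pick the hardware,
# strategy picks the distributed backend.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy="ddp",   # DistributedDataParallel across the 4 GPUs
    max_epochs=100,
)
```

With NeMo example scripts, the same settings can usually be passed as Hydra overrides on the command line, e.g. `trainer.devices=4 trainer.strategy=ddp`.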

OrjwanZaafarani commented 1 year ago

I'm facing the same error with FastPitch TTS, any help? NeMo image: 22.12.