Closed piraka9011 closed 3 years ago
I'm not too familiar with the internals of PTL, but I'm guessing the solution is to determine whether there are GPUs or TPUs and not just GPUs.
We don't support TPU training, and it's not tested. If you get it working, please let us know.
Looks like you can try disabling the learning rate scheduler. But yes, training on TPUs isn't something we maintain - in Google Colab you should be able to just use GPUs
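If it helps, here is a minimal sketch of that workaround, assuming a typical NeMo-style OmegaConf config with a `model.optim.sched` section (the config path and key layout are assumptions, not taken from this issue):

```python
# Sketch of the suggested workaround (untested): drop the LR scheduler block
# from the optimizer config so the scheduler setup that divides by the worker
# count never runs. The config path and the model.optim.sched layout are
# assumptions about a typical NeMo YAML config.
from omegaconf import OmegaConf, open_dict

cfg = OmegaConf.load("config.yaml")  # hypothetical path to the training config

with open_dict(cfg.model.optim):
    cfg.model.optim.pop("sched", None)  # no 'sched' key -> no scheduler is set up
```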
Describe the bug
Training any model does not work with TPUs due to an error in the way modelPT.py calculates optim_config['sched']['t_num_workers'] here. When you have 0 GPUs but are still using TPUs, t_num_workers is 0. This causes a division-by-zero error here.
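For context, a simplified sketch of the presumed computation (the attribute names below are assumptions, not the actual modelPT.py source):

```python
# Simplified sketch of the presumed logic (attribute names are assumptions,
# not the actual NeMo source). With tpu_cores=8 and gpus=0, counting only
# GPUs yields 0 workers, and the later steps-per-epoch division by
# t_num_workers raises ZeroDivisionError. Counting TPU cores too avoids it.
import math

def infer_num_workers(trainer) -> int:
    # Count whichever accelerator is in use; fall back to 1 for CPU-only runs.
    num_devices = trainer.num_gpus or getattr(trainer, "tpu_cores", 0) or 1
    return num_devices * trainer.num_nodes

def steps_per_epoch(num_samples: int, batch_size: int, t_num_workers: int) -> int:
    # This is the kind of division that fails when t_num_workers is 0.
    return math.ceil(num_samples / (batch_size * t_num_workers))
```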
Steps/Code to reproduce bug
Follow any colab with TPU support and set trainer.tpu_cores=8 in the config (a minimal reproduction sketch is shown below).
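For example, something along these lines should trigger the error (the model class and config path are illustrative, not taken from a specific Colab):

```python
# Hypothetical reproduction sketch; the config path and EncDecCTCModel choice
# are illustrative. Any NeMo model with an LR scheduler configured should hit
# the same ZeroDivisionError when trained with TPU cores and 0 GPUs.
import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

cfg = OmegaConf.load("config.yaml")   # hypothetical config with a model.optim.sched block
cfg.trainer.gpus = 0                  # no GPUs available on a TPU runtime
cfg.trainer.tpu_cores = 8             # train on the 8 TPU cores instead

trainer = pl.Trainer(**cfg.trainer)
model = nemo_asr.models.EncDecCTCModel(cfg=cfg.model, trainer=trainer)
trainer.fit(model)                    # raises ZeroDivisionError during scheduler setup
```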
Expected behavior
Training works.
Environment overview (please complete the following information)
pip install nemo_toolkit['all']==1.0.0rc1
Environment details
Additional context
Installed PyTorch XLA using: