Closed satyajitghana closed 1 year ago
Hello @satyajitghana,
Could you confirm which versions of the software you have installed by posting the output of pip list?
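If the full pip list output is too long to post, a minimal sketch like the one below could narrow it to the packages most likely to matter here. The keyword list (torch, neuron, lightning) and the helper name relevant_versions are assumptions made for illustration, not something the Neuron SDK prescribes; it only uses Python's standard importlib.metadata module.

```python
from importlib import metadata

# Assumption: these substrings cover the packages relevant to this issue.
KEYWORDS = ("torch", "neuron", "lightning")


def relevant_versions(keywords=KEYWORDS):
    """Return {package name: version} for installed packages whose name contains any keyword."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata, then match case-insensitively.
        if name and any(k in name.lower() for k in keywords):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Print in the same name==version style as `pip freeze`.
    for name, version in sorted(relevant_versions().items()):
        print(f"{name}=={version}")
```

Running this inside the same container or virtual environment used for training should give a short list that can be pasted into the ticket.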
Sorry, I didn't try after that. I think I had also tried the 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuron:1.11.0-neuron-py38-sdk2.4.0-ubuntu20.04 image, and it gave the same error.
Hi @satyajitghana,
I was able to reproduce your error. You can try with this modification (devices=[1]):
# Init DataModule
dm = MNISTDataModule()
# Init model from datamodule's attributes
model = LitModel(*dm.dims, dm.num_classes)
# Init trainer
trainer = Trainer(
    max_epochs=3,
    callbacks=[TQDMProgressBar(refresh_rate=20)],
    accelerator="tpu",
    devices=[1],
)
which may get you a little further. Ideally we'll also diagnose why this change is needed. I eventually hit a compilation error, which we will need to investigate separately. However, your real use case may not encounter this problem if you are working with a different model.
Since you seem to have given up for now, I plan to close this issue unless you are still actively pursuing using Lightning.
Closing inactive ticket. Please reopen if still needed.
Hi,
I am trying to add support for Trn1 training to PyTorch Lightning. In theory, it should have worked out of the box since PL supports TPU training, but there are a few changes that need to be made. Now I am facing the error below.
The code is from https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/mnist-tpu-training.html and I'm trying to use a single Trn1 core.