-
I'm trying a custom LSTM architecture but get the following error when trying to train. I'm not sure where to start debugging this.
PyTorch: 1.6
PyTorch Lightning: latest
Xla 1…
-
## ❓ Questions and Help
Thanks for the great package!
I get the following error when trying to train Faster RCNN on TPU. The code works for GPU.
I'm providing the link to the Colab.
https:/…
-
In 670ff08e77c5b331443a3ed6d41564d863a47f06 when running:
```
TSAN_OPTIONS=second_deadlock_stack=1 ./util/opensslwrap.sh genpkey -genparam -algorithm DH -pkeyopt gindex:1 -pkeyopt type:fips186_4 -te…
```
-
## ❓ Questions and Help
Hello, I was trying to run an EfficientNet-B1 with image size 1536x1536 and batch size 2 on the TPU using spawn (x8), but it always ran out of memory (OOM).
I reduced image size t…
-
## 🐛 Bug
A comparison is made between a `torch.FloatTensor` and an XLA tensor in `pytorch_lightning/callbacks/early_stopping.py`.
#### Code sample
```
Exception in device=TPU:2: torch_…
```
-
## 🐛 Bug
When I try to train a model on Kaggle TPUs with `num_tpu_cores` set to 8, I receive the error `Exception: process 2 terminated with exit code 1`. It would be great if this worked on Kaggle.
…
-
#83 introduced a new app-crashing bug on certain browsers:
- Chrome Mobile
- Safari (desktop and mobile)
-
## ❓ Questions and Help
I wondered, is there some code to trap an interrupt signal to do proper cleanup in a multi-processing context? Right now, I don't have any, and I frequently need to do Ctrl…
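
One common pattern for this kind of cleanup, sketched with the Python standard library only: the parent catches `KeyboardInterrupt`, sets a shared stop flag, and joins the workers, while the workers ignore SIGINT so that only the parent decides when to stop. The names `worker`, `run`, and `interrupt_after` are hypothetical, the `fork` start method assumes a POSIX system, and the interrupt is simulated by raising `KeyboardInterrupt` so the sketch can run unattended. This is not taken from any library mentioned in the question.

```python
import multiprocessing as mp
import signal
import time


def worker(stop_event):
    # Ignore SIGINT in the child so Ctrl+C is handled by the parent only.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while not stop_event.is_set():
        time.sleep(0.05)  # placeholder for one unit of real work


def run(num_workers=2, interrupt_after=None):
    # Assumption: POSIX system, so the "fork" start method is available.
    ctx = mp.get_context("fork")
    stop_event = ctx.Event()
    procs = [ctx.Process(target=worker, args=(stop_event,))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    try:
        if interrupt_after is not None:
            time.sleep(interrupt_after)
            raise KeyboardInterrupt  # stands in for a real Ctrl+C
        for p in procs:
            p.join()
    except KeyboardInterrupt:
        pass  # fall through to the cleanup below
    finally:
        stop_event.set()  # ask the workers to finish their current step
        for p in procs:
            p.join(timeout=5)
            if p.is_alive():  # worker did not stop in time: force it
                p.terminate()
                p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run(interrupt_after=0.2))
```

Because the workers exit on their own once the flag is set, each one gets a chance to release its resources before the parent returns. `terminate()` is only a fallback for workers that miss the timeout.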
-
## 🐛 Bug
DDP breaks LR finder
### To Reproduce
```
finder = trainer.lr_find(model)
print(finder.suggestion())
```
```
Traceback (most recent call last):
File "./training.py", line 1…
```
-
I'm trying to run an LSTM model on TPU with Colab. It throws the following error.
```
Exception in device=TPU:1: Aborted: Session 0275bc9f6430801b is not found.
Exception in device=TPU:3: Aborted: Se…
```