-
### Description
Hi there, I'm getting a strange and uninformative error when running on TPU that I don't get when running locally. I was hoping someone here could explain the error :thinking:
```
jaxl…
```
-
Hi, thank you for your awesome work!
I would like to fine-tune the 350M model using TPU remote mode on a single TPU v3-8 node. I used the config file https://github.com/salesforce/jaxformer/blob/m…
-
Could you please tell me: can I train a model on TPUs?
-
Hi, thanks for the effort you have put into this project.
I've been using the script, but it drives CPU usage to 100% and takes quite some time to process videos lasting a couple of hours.
Are the…
-
Initial experiments show that, modulo some smallish fixes, PyTorch XLA could work:
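Since the snippet below is truncated, here is a rough smoke test of the same idea: run a tensor computation on the XLA device that `torch_xla` exposes. The matrix size is arbitrary, and the CPU fallback exists only so the sketch runs where `torch_xla` is not installed:

```python
import torch

try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()  # resolves to the TPU core on a TPU VM
except ImportError:
    device = torch.device("cpu")  # fallback when torch_xla is absent

# Any tensor op placed on `device` is traced and compiled by XLA
# when torch_xla is active; on CPU it just runs eagerly.
x = torch.randn(8, 8, device=device)
y = x @ x.T
print(y.shape)
```

On a real TPU run one would also call `xm.mark_step()` to flush the lazy XLA graph; the fallback path here skips that detail.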
```
import pyhf
import torch
import torch_xla
import torch_xla.core.xla_model as xm
spec = {
'channels':…
```
-
Hi,
I am trying to fine-tune BERT using TPU on my own dataset. To fine-tune BERT I wrote the following code:
`def create_model(is_training, input_ids, input_mask, segment_ids, labels,
…
-
### Affected Resource(s)
* google_tpu_node
Failure rate:
- TestAccTPUNode_tpuNodeBasicExample: 50% in Sept 2021 (15 failures)
- TestAccTPUNode_tpuNodeBasicExample: 80% in March 2022…
-
## ❓ Questions and Help
When running on a vp-128 TPU pod (even when sharding only by the batch dimension), we are experiencing very low performance compared to the same pod without SPMD.
Do you have any…
-
I am fine-tuning the 345M model with TPU on Google Colab. I'm getting ~0.1 it/s, whereas the original shepperd fork reaches ~1 it/s for the same model and parameters on a T4 GPU. I would have expect…
-
**Documentation**
[MLIR Language Reference](https://mlir.llvm.org/docs/LangRef/)
[MLIR Bytecode Format](https://mlir.llvm.org/docs/BytecodeFormat/)
**Examples:**
[1043.mlir.zip](https://github.com/lu…