-
I'm running Flux in Google Colab, sometimes using the original fp16 base model and sometimes other variants (either an fp variation or a different fine-tune). Either way, every time I go to generate an image now,…
-
When training large models (> 10B parameters or so), we found that checkpointing sometimes got stuck while saving.
For instance, it may work smoothly for a few checkpoint saves and then…
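One way to at least detect a hung save (rather than silently blocking the training loop) is to run it under a watchdog. A minimal stdlib sketch, assuming `save_fn` stands in for whatever your framework's checkpoint writer is (e.g. `torch.save` or a sharded saver; the helper name is hypothetical):

```python
import threading
import time

def save_with_watchdog(save_fn, timeout_s=60.0):
    """Run a checkpoint save and report if it appears stuck.

    save_fn: a zero-argument callable that writes the checkpoint
    (hypothetical stand-in for your framework's saver).
    Returns True if the save finished within timeout_s, else False.
    """
    done = threading.Event()

    def _worker():
        save_fn()          # the actual (possibly hanging) save
        done.set()         # signal completion to the watchdog

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    finished = done.wait(timeout_s)
    if not finished:
        print(f"checkpoint save still running after {timeout_s}s -- possibly stuck")
    return finished
```

This only detects the hang; if the save is stuck on a collective (e.g. all ranks must participate), you would still need to resolve the underlying synchronization issue.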
-
### The problem
TPU utilization seems ineffective: the CPU load in the Google Cloud console stays under 7% while fine-tuning:
![image](https://user-images.githubusercontent.com/4006428/6909968…
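A quick way to check whether the accelerator is being starved by the input pipeline is to time data loading separately from the training step itself. A minimal stdlib sketch, where `get_batch` and `train_step` are hypothetical stand-ins for your pipeline and step function:

```python
import time

def profile_steps(get_batch, train_step, n_steps=10):
    """Return (data_loading_seconds, compute_seconds) over n_steps.

    get_batch: zero-argument callable that produces the next batch
    (hypothetical stand-in for your input pipeline).
    train_step: callable taking a batch and running one step.
    """
    load_t = 0.0
    step_t = 0.0
    for _ in range(n_steps):
        t0 = time.perf_counter()
        batch = get_batch()          # time spent waiting on input
        t1 = time.perf_counter()
        train_step(batch)            # time spent in the step itself
        t2 = time.perf_counter()
        load_t += t1 - t0
        step_t += t2 - t1
    return load_t, step_t
```

If `load_t` dominates, the device is likely idle waiting on input and the fix is in the data pipeline, not the model code.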
-
### Community Note
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help t…
-
There are [reports](https://www.kaggle.com/c/ranzcr-clip-catheter-line-classification/discussion/218460) of this library not working when running on a TPU.
-
(also posted to MineDojo: https://github.com/MineDojo/MineDojo/issues/85, but cross-posting here since it says it's a Malmo issue)
I am trying to run MineDojo on a Google Cloud Platform TPU VM, but…
-
Are there instructions specific to creating a bmodel from ONNX for Llama 3.1 (not Llama 3)?
Running this errors out.
python export_onnx.py --model_path ../../../../Meta-Llama-3.1-8B-Instruct/ -…
-
## ❓ Questions and Help
I have pretrained roberta-base on DNA promoter sequences of plants (working on a project). I am currently trying to fine-tune it on a downstream task of predicting gene express…
-
### Description
Hello
I'm running a TPU v3-8 VM on Google. On the VM I installed jax with `pip install "jax[tpu]==0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html`.
U…
-
## 🐛 Bug
## To Reproduce
Here are two scripts for the experiment
test1.py
```python
import torch
import torch_xla.core.xla_model as xm
import math
random_k = torch.randn((100, 100), dtype=…