-
Hi.
I tried madlad400, but there is a problem with the output when it is float16:
```
$ python convert.py --model google/madlad400-3b-mt
$ python t5.py --model google/madlad400-3b-mt --prompt "A ta…
```
-
### 🐛 Describe the bug
I used FSDP + ShardedGradScaler to train my model. Compared with apex.amp + DDP, my model's accuracy has decreased.
The DDP version looks like:
```
model, optimizer = amp.initial…
```
-
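For context on the comparison in the excerpt above, here is a minimal sketch of the native-AMP training step that the scaler performs; the model, optimizer, and data here are illustrative stand-ins, and under FSDP the scaler would be `torch.distributed.fsdp.sharded_grad_scaler.ShardedGradScaler`, which exposes the same `scale`/`step`/`update` API:

```python
import torch

# Illustrative stand-ins; under FSDP the model would be wrapped and the
# scaler replaced with ShardedGradScaler (same scale/step/update API).
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x, y = torch.randn(4, 8), torch.randn(4, 1)
with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)         # unscales grads; skips the step if inf/nan found
scaler.update()                # adjusts the scale factor for the next iteration
```
-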
Logs showing a failure to configure the toolchain using your automatic discovery code:
```
...
File "D:\appmana\.venv\Lib\site-packages\torch\_dynamo\output_graph.py", line 1465, in _call_user_…
```
-
### 🐛 Describe the bug
Note:
I know that bfloat16 should obviously not be used on a CPU model.
Maybe it's better practice to do `.to(self.device).to(bfloat16)` than `.to(bfloat16).to(self.devi…
-
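To illustrate the call-ordering point in the excerpt above (the module and device names here are illustrative, not from the issue), a minimal sketch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Move first, then cast: the dtype conversion runs on the target device.
m_a = torch.nn.Linear(4, 4).to(device).to(torch.bfloat16)

# Cast first, then move: the conversion runs on the CPU before the transfer.
m_b = torch.nn.Linear(4, 4).to(torch.bfloat16).to(device)

# Either order yields the same final dtype and device; the difference is
# where the cast executes (bfloat16 kernel support on CPU can be limited).
assert next(m_a.parameters()).dtype == torch.bfloat16
assert next(m_b.parameters()).dtype == torch.bfloat16
```
-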
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
- Python …
-
This is my script:
```
torchrun --nnodes 1 \
--nproc_per_node 8 \
-m open_clip_train.main \
--model RN50 \
--train-data 'datasets/cc3m/cc3m-train-{0000..0575}.tar' \
  --trai…
```
-
### Describe the issue
Issue:
I wanted to run the pre-training script `https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/pretrain.sh`, but it ends with a device-mismatch error. It seems that the…
-
### Description
`jax.nn.dot_product_attention` does the first dot product with `preferred_element_type=jnp.float32` (see [here](https://github.com/jax-ml/jax/blob/7f655972c47658768b6ecce752fa29c3a…
-
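A small sketch of the accumulation-dtype control the excerpt above refers to, shown here with `jnp.dot` for illustration (the attention function in the issue passes the same `preferred_element_type` keyword to its internal contraction):

```python
import jax.numpy as jnp

a = jnp.ones((2, 3), dtype=jnp.bfloat16)
b = jnp.ones((3, 4), dtype=jnp.bfloat16)

low = jnp.dot(a, b)  # result stays in bfloat16
acc = jnp.dot(a, b, preferred_element_type=jnp.float32)  # accumulate/return f32

assert low.dtype == jnp.bfloat16
assert acc.dtype == jnp.float32
```
-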
Has anyone been able to reproduce the results in Table 1 of the paper? Could you please share the inference script?
We use B=50 for each class and var_d16 for evaluation.
- report
|FID|IS|Pre|R…
-
I have been trying to fix this error for a while now, and the ongoing threads are of NO help.
I have checked these (and ALL issues on the HF community page for this model):
* https://github.com/Qwe…