-
Here is a small write-up I've built and use:
- https://gist.github.com/andy108369/c487dcd784d93a29e7edca805dd5be57
```
(.venv) root@node2:~# huggingface-cli download meta-llama/Meta-Llama-3.1-…
```
-
### System Info
CPU: x86_64
GPU: L40s
TensorRT branch: main
commit id: b57221b764bc579cbb2490154916a871f620e2c4
CUDA:
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA V…
-
I tested nfloat4 quite a bit in OneTrainer, and the results are basically the same as sd-scripts, but with almost 9 GB less VRAM.
I was wondering if it's possible for you to implement it in your scrip…
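For anyone curious what this would involve, here is a minimal sketch of blockwise 4-bit codebook quantization, the idea behind NF4. The 16 levels below are illustrative stand-ins, not the exact NF4 table from bitsandbytes (which is derived from quantiles of a standard normal distribution), and the function names are made up:

```python
# Sketch of blockwise 4-bit codebook quantization (the idea behind NF4).
# CODEBOOK values are illustrative, not the real bitsandbytes NF4 table.
CODEBOOK = [-1.0, -0.70, -0.53, -0.39, -0.28, -0.18, -0.09, 0.0,
            0.08, 0.16, 0.25, 0.34, 0.44, 0.56, 0.72, 1.0]

def quantize_block(block):
    """Scale a block by its absmax, then snap each value to the nearest level."""
    absmax = max(abs(x) for x in block) or 1.0
    indices = [min(range(16), key=lambda i: abs(x / absmax - CODEBOOK[i]))
               for x in block]
    return indices, absmax  # 4 bits per weight plus one scale per block

def dequantize_block(indices, absmax):
    return [CODEBOOK[i] * absmax for i in indices]

weights = [0.5, -2.0, 1.0, 0.0]
idx, scale = quantize_block(weights)
restored = dequantize_block(idx, scale)
```

The memory saving comes from storing one 4-bit index per weight plus a single scale per block, instead of 16 bits per weight.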
-
Small LLMs trained with FP8 on 32 GPUs can achieve a 20~30% speedup compared with bf16.
However, scaling up to 1000+ GPUs achieves less than a 5% speedup (TP2 PP4 VP4).
Any suggestion to de…
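One way to reason about this: FP8 only accelerates the GEMM-heavy compute portion of a step, while communication and pipeline bubbles are untouched, so the end-to-end gain follows an Amdahl-style law. A back-of-envelope sketch (the fractions and the 1.5x kernel speedup below are made-up illustrative numbers, not measurements):

```python
def fp8_end_to_end_speedup(compute_fraction, kernel_speedup):
    """Amdahl-style estimate: only the FP8-accelerated compute fraction of
    step time shrinks; communication and pipeline bubbles stay the same."""
    return 1.0 / ((1.0 - compute_fraction) + compute_fraction / kernel_speedup)

# Hypothetical numbers: if compute is 70% of step time at 32 GPUs but only
# 15% at 1000+ GPUs (comm and bubbles dominate), a 1.5x FP8 GEMM speedup gives:
small_scale = fp8_end_to_end_speedup(0.70, 1.5)  # ~1.30x end to end
large_scale = fp8_end_to_end_speedup(0.15, 1.5)  # ~1.05x end to end
```

This suggests profiling what fraction of step time is actually spent in FP8-eligible GEMMs at 1000+ GPUs before tuning the recipe further.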
-
```
!!! Exception during processing !!! Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.
Traceback (most recent call last):
Fil…
-
@kohya-ss @lansing @rockerBOO @akx @tsukimiya @wkpark
Would you consider supporting training for OpenFlux? The OpenFlux model link is: https://huggingface.co/ostris/OpenFLUX.1. Given that Flux and i…
-
[context_flashattention_nopad_fp16_fp8.txt](https://github.com/user-attachments/files/16421521/context_flashattention_nopad_fp16_fp8.txt)
We have implemented an fp8 version of context_flashattention_…
-
Is there a way to run these models with 12 GB of RAM?
With fp8 models it works, but with GGUF models it always fails.
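Whether the weights fit is partly simple arithmetic (weights only; activations, text encoders, and framework overhead add more). A rough sketch, assuming the commonly cited ~12B parameter count for FLUX-class models and a ~4.5 bits-per-weight Q4_K-style GGUF:

```python
def approx_weight_gib(params_billion, bits_per_weight):
    """Back-of-envelope size of the weights alone, in GiB. Ignores
    activations, text encoders, VAE, and framework overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

fp8_size = approx_weight_gib(12, 8)    # ~11.2 GiB
q4_size = approx_weight_gib(12, 4.5)   # ~6.3 GiB for a Q4_K-style GGUF
```

On paper both fit in 12 GB, so a GGUF failing where fp8 works may be loader peak memory or dequantization overhead rather than the steady-state weight size.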
-
Hi, thank you for providing this code.
I am currently running the schnell Q2 model in a Kaggle notebook, but when it starts generating the image it always shows 'using cpu backend' and does not utiliz…
-
ComfyUI is implementing InstantX ControlNets!
"Canny" and "Depth" are working already:
https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny
https://huggingface.co/Shakker-Labs/FLUX.1-de…