-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
### Feature Idea
https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union-alpha
### Existing Solutions
_No response_
### Other
_No response_
-
- [ ] FP8 KV-cache
- [ ] KV-cache prefix reuse
- [ ] Grammar-constrained decoding speedup
- [ ] `torch.compile`-like speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi-LoRA support (lorax-style)
…
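Of the items above, KV-cache prefix reuse is the easiest to sketch: keep a trie keyed by token ids that records which prefixes already have cached KV blocks, so a new request only has to prefill its unshared suffix. The names here (`PrefixTrie`, `lookup`, the string block ids) are illustrative, not taken from any particular serving engine:

```python
class PrefixTrie:
    """Toy trie mapping token-id prefixes to cached KV block ids.

    Illustrative only: real engines layer paged KV blocks,
    refcounting, and eviction on top of this idea.
    """

    def __init__(self):
        self.children = {}    # token id -> PrefixTrie
        self.block_id = None  # id of the cached KV block ending at this node

    def insert(self, tokens, block_ids):
        node = self
        for tok, blk in zip(tokens, block_ids):
            node = node.children.setdefault(tok, PrefixTrie())
            node.block_id = blk

    def lookup(self, tokens):
        """Return (reusable block ids, number of tokens still to prefill)."""
        node, reused = self, []
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            reused.append(node.block_id)
        return reused, len(tokens) - len(reused)


trie = PrefixTrie()
trie.insert([1, 2, 3, 4], ["b0", "b1", "b2", "b3"])  # cache a finished prompt
blocks, todo = trie.lookup([1, 2, 3, 9, 9])          # new prompt shares [1, 2, 3]
print(blocks, todo)  # ['b0', 'b1', 'b2'] 2 -- only 2 tokens left to prefill
```

The same lookup also tells the scheduler how much prefill compute a request actually needs, which is what makes prefix caching a throughput win.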
-
### Your current environment
```text
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
-
I've tried the 12G, 16G, and 20G VRAM options here: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-lora-training and confirm they all work.
But is it possible …
-
Hello everyone,
First off, a big thanks to city96 for the awesome work they've been contributing to the community. It's been incredibly helpful!
Here are my system specs:
Processor: Intel i5-13…
-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If so, an example would …
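Independent of TK itself, a pure-GEMM baseline is cheap to set up for comparison. This numpy sketch times square matmuls and reports achieved GFLOP/s; a GPU-kernel version would have the same shape, with device synchronization around the timed region. The function name and sizes are made up for illustration:

```python
import time
import numpy as np

def bench_gemm(n, iters=10, warmup=2):
    """Time an n x n x n float32 matmul and return achieved GFLOP/s."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)
    for _ in range(warmup):           # warm caches / BLAS thread pool
        a @ b
    t0 = time.perf_counter()
    for _ in range(iters):
        c = a @ b
    dt = (time.perf_counter() - t0) / iters
    flops = 2.0 * n ** 3              # n^3 fused multiply-adds = 2n^3 flops
    return flops / dt / 1e9

for n in (256, 512, 1024):
    print(f"{n:5d}: {bench_gemm(n):8.1f} GFLOP/s")
```

Sweeping several sizes matters: small GEMMs are launch/bandwidth bound, so a kernel's peak only shows up at larger shapes.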
-
Hi, I successfully full-finetuned FLUX with the Ostris AI Toolkit, and at the end of training I got these 3 files (the diffusion model files):
diffusion_pytorch_model-00001-of-00003.safetensors
dif…
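For context on what those shard files are: Hugging Face-style sharded checkpoints normally ship alongside an `*.index.json` whose `weight_map` records which tensor lives in which shard. A small sketch of reading that map (the index content below is made up for illustration; loading the actual tensors would use the `safetensors` library):

```python
import json

def tensors_by_shard(index_json: str) -> dict:
    """Group tensor names by the shard file that stores them.

    Expects the standard Hugging Face index format:
    {"weight_map": {"tensor.name": "shard-file.safetensors", ...}}
    """
    weight_map = json.loads(index_json)["weight_map"]
    by_shard = {}
    for tensor, shard in weight_map.items():
        by_shard.setdefault(shard, []).append(tensor)
    return by_shard

# Tiny made-up index for illustration:
index = json.dumps({"weight_map": {
    "transformer.block.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
    "transformer.block.1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
}})
print(tensors_by_shard(index))
```

Tools that merge shards into a single file essentially walk this map, load each shard, and re-save the union of tensors.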
-
I tried FLUX training on a 2080 Ti with 22GB of VRAM, but I keep getting an error:
```text
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Ex…
```
-
Hi,
When I tried the FP8 GEMM code in matmul.py, I cast the input "a" to float16 and had it converted to FP8 just before the dot-product op by setting AB_DTYPE to tl.float8e4nv (link: https://github.com/…
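To see roughly what that AB_DTYPE cast does numerically, here is a numpy simulation of round-to-nearest e4m3 quantization (3 stored mantissa bits, max finite value 448). It ignores subnormals and NaN encodings, so it is a sketch of the precision loss, not a bit-exact model of Triton's `tl.float8e4nv` conversion:

```python
import numpy as np

def quantize_e4m3(x):
    """Approximate fp8 e4m3 rounding: keep 3 mantissa bits, clamp to +/-448.

    Sketch only -- subnormals and NaN payloads are ignored.
    """
    x = np.asarray(x, dtype=np.float64)
    m, e = np.frexp(x)              # x = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0   # 1 implicit + 3 stored mantissa bits
    q = np.ldexp(m, e)              # np.round ties-to-even matches IEEE RNE
    return np.clip(q, -448.0, 448.0)

a = np.array([1.0, 1.1, 0.3, 100.0, 1000.0])
print(quantize_e4m3(a))  # [1.0, 1.125, 0.3125, 96.0, 448.0]
```

The relative step between neighboring values is about 6%, so accumulating the dot product in higher precision (as Triton's `tl.dot` does) matters far more than when the cast to FP8 happens.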