-
Flux LoRA training is bleeding concepts: if you train on many people at the same time, they all get mixed together, and the unique tokens assigned to each character are ignored. I think the problem is that it is training only…
-
Hello @mgoin, it's a pleasant surprise to discover this project. Thank you for your contributions to BitBLAS. We have recently added support for FP8 Matmul, hoping it will help this project.
-
Please see this commit that Comfy pushed earlier today, which fixes the issue where some Flux LoRAs are very weak when used along with fp8. It would be great if Forge were similarly updated so there is co…
-
Hello!
I am trying to implement multi-stage training with fp8 autocast. However, when I load the checkpoint from the first training stage using torch's `load_state_dict(...)`, the loss quickly explodes.
Are th…
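For reference, here is a minimal sketch of the resume pattern being described (the model architecture and file name are hypothetical placeholders): save both the model and optimizer state at the end of stage one, then restore both at the start of stage two. Keeping `strict=True` in `load_state_dict` surfaces any missing keys, such as fp8 scaling buffers, instead of silently reinitializing them, which is one common cause of an exploding loss after resuming.

```python
import torch
import torch.nn as nn

# Hypothetical stage-one model: any nn.Module works the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# End of stage one: save model AND optimizer state together.
torch.save(
    {"model": model.state_dict(), "optim": optimizer.state_dict()},
    "stage1.pt",
)

# Start of stage two: rebuild the same architecture, then restore.
model2 = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
optimizer2 = torch.optim.AdamW(model2.parameters(), lr=1e-3)

ckpt = torch.load("stage1.pt")
# strict=True (the default) raises if any keys -- e.g. fp8 scaling
# buffers registered on the module -- are missing or unexpected.
model2.load_state_dict(ckpt["model"], strict=True)
optimizer2.load_state_dict(ckpt["optim"])

# Sanity check: parameters match exactly after the restore.
for p1, p2 in zip(model.parameters(), model2.parameters()):
    assert torch.equal(p1, p2)
```

If the loss still explodes with a strict restore, comparing the two state dicts' key sets before loading is a quick way to spot precision-related buffers that were never saved.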
-
# Release Manager
@cp5555
# Endgame
- [x] Code freeze: Feb. 9th, 2024
- [x] Bug Bash date: Feb. 12th, 2024
- [x] Release date: Feb. 23rd, 2024
# Main Features
## MS-AMP O3 Optimization
-…
-
Using hires fix produces vertical/horizontal banding artifacts. I have tested with different hires upscalers; some are less noticeable, but the banding is still visible depending on the image. It is also more prominent if we…
-
Here is the development roadmap for 2024 Q3. Contributions and feedback are welcome.
## Server API
- [ ] Add APIs for using the inference engine in a single script without launching a separate se…
-
This issue is to track the new design required for flash-attention in the bottom-up optimization pipeline.
## Status
Most of the optimization passes have been finished and checked in to llvm-targ…
-
https://github.com/triton-lang/triton/blob/95623038c75463286aa5d4a44782ba7492cc1afa/python/triton/language/semantic.py#L761C1-L763C1
How do I resolve this?
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…