aredden / flux-fp8-api

Flux diffusion model implementation that uses quantized fp8 matmuls, while the remaining layers run with faster half-precision accumulation, making it roughly 2x faster on consumer devices.
Apache License 2.0
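For context on the description above, here is a minimal sketch of the kind of fp8 matmul with fast (reduced-precision) accumulation it refers to, using PyTorch's `torch._scaled_mm` (Ada/Hopper GPUs, PyTorch >= 2.4; older versions return a tuple instead of a single tensor). The function name and the per-tensor scaling scheme are illustrative assumptions, not this repo's actual code:

```python
import torch

def fp8_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Quantize both operands to float8_e4m3fn, then matmul with fast accumulation.

    x:      activations, shape (batch, in_features)
    weight: linear weight, shape (out_features, in_features)
    Dimensions should be multiples of 16 for _scaled_mm.
    """
    # Per-tensor scales so values fit the e4m3 range (max normal value ~448).
    x_scale = (x.abs().max().float() / 448.0).clamp(min=1e-12)
    w_scale = (weight.abs().max().float() / 448.0).clamp(min=1e-12)

    x_fp8 = (x / x_scale).to(torch.float8_e4m3fn)
    w_fp8 = (weight / w_scale).to(torch.float8_e4m3fn)

    # use_fast_accum=True selects the reduced-precision accumulation path,
    # which is where the extra speedup over a plain fp8 matmul comes from.
    return torch._scaled_mm(
        x_fp8,
        w_fp8.t(),                 # second operand must be column-major
        scale_a=x_scale,           # scales dequantize the fp8 operands
        scale_b=w_scale,
        out_dtype=torch.bfloat16,
        use_fast_accum=True,
    )
```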

No issue - just a thank you! #4

Closed: ashakoen closed this issue 2 weeks ago

ashakoen commented 3 weeks ago

I've been learning a lot by getting this project up and running on a lower-end GPU. I've had no major issues, and I already have the LoRA loading working. Just wanted to say thanks!!
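Since LoRA loading came up: merging a LoRA into a base checkpoint usually boils down to folding the low-rank delta into each matching weight, W' = W + (alpha / rank) * B @ A, before the layer is (re)quantized to fp8. A rough sketch under that assumption (the function and tensor names are hypothetical, not this repo's API):

```python
import torch

def merge_lora_delta(
    base_weight: torch.Tensor,  # (out_features, in_features), frozen base weight
    lora_a: torch.Tensor,       # (rank, in_features), LoRA down-projection
    lora_b: torch.Tensor,       # (out_features, rank), LoRA up-projection
    alpha: float,
    rank: int,
) -> torch.Tensor:
    """Fold the LoRA update into the base weight: W' = W + (alpha / rank) * B @ A."""
    # Merge in float32, then cast back; fp8 quantization would happen afterwards.
    delta = (alpha / rank) * (lora_b.float() @ lora_a.float())
    return (base_weight.float() + delta).to(base_weight.dtype)
```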

aredden commented 3 weeks ago

You're welcome! :)

emcmanus commented 1 week ago

@aredden Kudos! There's a definite need for an open-source, performance-oriented (non-interactive) inference server for production use.

This is a very hackable codebase, and performance really is 2x that of other solutions. Thank you!