I am experimenting with your main.py for Flux with LoRAs, run from the Windows command line using your example command:
python main.py --prompt "A cute corgi lives in a house made out of sushi, anime" --lora_repo_id XLabs-AI/flux-lora-collection --lora_name anime_lora.safetensors --device cuda --offload --use_lora --model_type flux-dev-fp8 --width 1024 --height 1024
The step
Start a quantization process...
seems to be needed on every run? It adds around 2m30s to each image.
Is it possible to change this so that, after the first quantization, the model is saved and then loaded on future runs, making them much faster?
For example, this is how I do it with the "flux on potato" code so that the quantization only happens once. That code is for a different script, but you get the idea (a sketch of the pattern is below).
Without that tweak it is currently taking around 3m30s per image on a 24 GB 4090. Is there anything else that can be done to speed it up?
Thanks for any tips/ideas.
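For reference, a minimal sketch of that quantize-once pattern (not the original snippet), assuming main.py quantizes with optimum.quanto's quantize/freeze; the save/reload flow below is quanto's documented quantization_map/requantize workflow, and the cache file names are placeholders:

```python
import json

import torch
from optimum.quanto import freeze, qfloat8, quantization_map, quantize, requantize
from safetensors.torch import load_file, save_file

WEIGHTS = "flux-dev-fp8.safetensors"  # placeholder cache paths
QMAP = "flux-dev-fp8-qmap.json"

def quantize_and_save(model):
    """First run: quantize once, then persist both the quantized
    weights and the quantization map."""
    quantize(model, weights=qfloat8)
    freeze(model)
    save_file(model.state_dict(), WEIGHTS)
    with open(QMAP, "w") as f:
        json.dump(quantization_map(model), f)

def load_quantized(model):
    """Later runs: rebuild the quantized modules straight from disk,
    skipping the quantization pass. `model` is the freshly constructed
    (unquantized) Flux transformer."""
    with open(QMAP) as f:
        qmap = json.load(f)
    requantize(model, load_file(WEIGHTS), qmap, device=torch.device("cuda"))
```

On the reload path, requantize restores the quantized modules from the saved map, so the ~2m30s quantization pass only ever happens once.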
Hi! Thanks for the comment - indeed, the implementation is not optimized right now: the model quantization is called on every run. I will update the code with your recommendation.
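One possible shape for that update, as a rough sketch only: build_model below is a hypothetical stand-in for however main.py constructs the transformer, and the cache paths are arbitrary placeholders:

```python
import json
import os

import torch
from optimum.quanto import freeze, qfloat8, quantization_map, quantize, requantize
from safetensors.torch import load_file, save_file

WEIGHTS = "flux-dev-fp8.safetensors"  # placeholder cache paths
QMAP = "flux-dev-fp8-qmap.json"

def get_quantized_model(build_model):
    """build_model is a hypothetical callable that constructs the
    unquantized Flux transformer, standing in for the script's loader."""
    model = build_model()
    if os.path.exists(WEIGHTS) and os.path.exists(QMAP):
        # Cache hit: restore the quantized modules directly from disk.
        with open(QMAP) as f:
            qmap = json.load(f)
        requantize(model, load_file(WEIGHTS), qmap, device=torch.device("cuda"))
    else:
        # First run: quantize once, then cache the result for future runs.
        print("Start a quantization process...")
        quantize(model, weights=qfloat8)
        freeze(model)
        save_file(model.state_dict(), WEIGHTS)
        with open(QMAP, "w") as f:
            json.dump(quantization_map(model), f)
    return model
```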