JakoLex opened this issue 3 weeks ago
I am using the diffusers library with Flux-dev and Flux-schnell. I got the following script from here and modified it a bit. Are there any other performance improvements I can get out of my RTX 3090 with 24 GB of VRAM and 32 GB of system RAM? I commented `pipeline.to("cuda")` out, since just using `pipeline.enable_sequential_cpu_offload()` was much faster.
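A minimal sketch of that kind of setup (not the exact script; the prompt and sampler settings below are placeholder assumptions):

```python
import torch
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# pipeline.to("cuda")  # full bf16 model overflows 24 GB VRAM into shared memory
pipeline.enable_sequential_cpu_offload()  # streams submodules to the GPU one at a time

image = pipeline(
    "a photo of a cat",      # placeholder prompt
    num_inference_steps=28,  # typical for Flux-dev; schnell needs only ~4
    guidance_scale=3.5,
).images[0]
image.save("out.png")
```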
@JakoLex The reason it was very slow when you called `pipe.to('cuda')` is that the model spilled into shared memory (essentially CPU RAM), which massively slows down inference.

I would highly recommend trying this with Flux.1 Dev: https://gist.github.com/sayakpaul/e1f28e86d0756d587c0b898c73822c47

This should massively boost inference speed and also use much less VRAM.
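One common low-VRAM recipe for Flux.1 Dev in diffusers (a sketch of the general approach; the linked gist may differ in its details) is to quantize the transformer to 4-bit NF4 with bitsandbytes and use model-level offload instead of sequential offload:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize only the transformer, the largest component, to 4-bit NF4.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Moves whole components on/off the GPU as needed; much faster than
# enable_sequential_cpu_offload(), which shuttles individual submodules.
pipeline.enable_model_cpu_offload()

image = pipeline("a photo of a cat", num_inference_steps=28).images[0]
image.save("out.png")
```

The trade-off: sequential offload minimizes VRAM but pays a transfer cost on every layer, while NF4 quantization shrinks the weights enough that the coarser (and faster) model-level offload typically fits in 24 GB.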