chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Can stable-fast be used with diffusers device_map='auto'? #132

Open zhangvia opened 4 months ago

zhangvia commented 4 months ago

Hey, fantastic work! I'm looking for a framework that can speed up diffusion while keeping the flexibility of the original torch code. I checked all the features of your project, and it's really great.

I want to know whether this repo still works when device_map in the diffusers pipeline is set to "auto". This is a new feature that will be released in the next version of diffusers.

When you set device_map to 'auto', diffusers checks all the GPUs on the server and places different models on different GPUs. That allows someone who doesn't have a high-end GPU with a lot of VRAM to still use diffusers across several low-end GPUs.

Think about this: you load two ControlNets plus a whole txt2img pipeline, and you want to generate a 1024x1024 picture. If you don't have a GPU with more than 20GB of VRAM, that would be impossible. But with device_map='auto', the ControlNets are loaded onto 'cuda:0', the UNet onto 'cuda:1', and so on, so you can generate that image on two GPUs like the RTX 2080 Ti.
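
For example, the usage I have in mind would look roughly like this (the model IDs are just illustrative, and device_map="auto" assumes the upcoming diffusers release):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Sketch only: assumes the upcoming diffusers release accepts
# device_map="auto" for pipelines, as described above.
controlnet = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
    ),
]
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    device_map="auto",  # spread components across the available GPUs
)
print(pipe.hf_device_map)  # e.g. {"controlnet": 0, "unet": 1, ...}
```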

They implement device_map='auto' through hooks, which are a feature of accelerate. accelerate can attach a hook to each diffusers model that moves the input data onto the same device as the model. So I want to know whether stable-fast can work when device_map='auto'.
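
This is what the alignment hook does, as far as I understand it; here is a minimal standalone sketch using accelerate's public hooks API:

```python
import torch
from torch import nn
from accelerate.hooks import AlignDevicesHook, add_hook_to_module

# Toy module pinned to cuda:1; the hook moves incoming tensors to the
# module's execution device before forward() runs, so callers don't
# have to care which GPU the weights live on.
linear = nn.Linear(8, 8).to("cuda:1")
add_hook_to_module(linear, AlignDevicesHook(execution_device="cuda:1"))

x = torch.randn(2, 8, device="cuda:0")  # input lives on another GPU
y = linear(x)                           # x is moved to cuda:1 automatically
print(y.device)                         # cuda:1
```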

chengzeyi commented 1 month ago

@zhangvia Sorry, it has not been tested. I guess you can give it a try.
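
Something like the following (untested; the import path matches recent stable-fast releases, and the device_map behavior is assumed per the upcoming diffusers feature) would be the way to try it:

```python
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.diffusion_pipeline_compiler import compile, CompilationConfig

# Untested sketch: compile a pipeline that was loaded with a device map.
# Whether the compiled graphs tolerate weights and inputs spread across
# multiple GPUs is exactly the open question here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="auto",  # assumed, per the upcoming diffusers feature
)

config = CompilationConfig.Default()
config.enable_triton = True       # requires triton to be installed
config.enable_cuda_graph = False  # CUDA graphs capture on one device; safer off here
pipe = compile(pipe, config)

image = pipe("a photo of an astronaut", num_inference_steps=30).images[0]
```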