ethansmith2000 / ImprovedTokenMerge


Unable to achieve any acceleration on SD turbo #3

Closed · fingerk28 closed this issue 6 months ago

fingerk28 commented 6 months ago

After running some experiments, I have found that I cannot achieve any acceleration on SD Turbo. The main reason seems to be that the guidance scale of SD Turbo is 0, which means the effective batch size is 1, unlike other SD models, which run with an effective batch size of 2. If you try to set the guidance scale of SD 2.1 to 0, the acceleration effect also disappears. However, I am puzzled: token downsampling should be unrelated to batch size, right? Why would the acceleration be influenced by it? I hope you can provide some help, thank you very much.
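For context, the batch-size difference comes from classifier-free guidance: with a guidance scale above 1, the conditional and unconditional passes are concatenated into one UNet call, so every image costs an effective batch of 2, while SD Turbo's guidance scale of 0 leaves the UNet with a batch of 1. A minimal diffusers-style sketch of one denoising step (illustrative, not this repo's code):

```python
import torch

def denoise_step(unet, latents, t, cond_emb, uncond_emb, guidance_scale):
    """One denoising step following the usual classifier-free-guidance recipe."""
    do_cfg = guidance_scale > 1.0
    if do_cfg:
        # conditional + unconditional share one UNet call: batch size doubles
        model_in = torch.cat([latents, latents], dim=0)
        emb = torch.cat([uncond_emb, cond_emb], dim=0)
    else:
        # SD Turbo path (guidance_scale == 0): no duplication, batch of 1
        model_in, emb = latents, cond_emb

    noise_pred = unet(model_in, t, encoder_hidden_states=emb).sample

    if do_cfg:
        noise_uncond, noise_cond = noise_pred.chunk(2)
        noise_pred = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
    return noise_pred
```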

ethansmith2000 commented 6 months ago

Hey there, I don't believe the acceleration should be directly dependent on batch size, although you do see more efficiency with larger batch sizes simply by nature of better parallelization.
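To illustrate why: here is a minimal, self-contained sketch of the general token-downsampling idea (a generic illustration, not the exact implementation in this repo). The keys and values are spatially pooled before self-attention, and every operation acts independently along the batch dimension, so the saving is per sample rather than per batch.

```python
import math
import torch
import torch.nn.functional as F

def downsampled_self_attention(q, k, v, num_heads, downsample=2):
    """Self-attention with spatially pooled keys/values, shapes (batch, tokens, dim)."""
    b, n, d = k.shape
    h = w = int(math.sqrt(n))  # tokens are assumed to form a square grid

    def pool(x):
        # n tokens -> n / downsample**2 tokens, applied per sample
        x = x.transpose(1, 2).reshape(b, d, h, w)
        x = F.avg_pool2d(x, kernel_size=downsample)
        return x.flatten(2).transpose(1, 2)

    k, v = pool(k), pool(v)

    def split_heads(x):
        return x.reshape(b, -1, num_heads, d // num_heads).transpose(1, 2)

    # queries keep all n tokens, so the output resolution is unchanged
    out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
    return out.transpose(1, 2).reshape(b, n, d)
```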

For SD Turbo, my thought would be that the majority of inference time ends up being bottlenecked by the VAE rather than the actual diffusion process if you're doing, for example, just 1 step.
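One way to check that is to time the UNet and the VAE decode separately. A rough sketch, assuming a diffusers SD Turbo pipeline; the monkey-patched timers are only for illustration:

```python
import time
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

timings = {"unet": 0.0, "vae": 0.0}

def add_timer(module, name, method="forward"):
    # wrap the bound method so each call adds its wall-clock time to `timings`
    fn = getattr(module, method)
    def wrapper(*args, **kwargs):
        torch.cuda.synchronize()
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        torch.cuda.synchronize()
        timings[name] += time.perf_counter() - start
        return out
    setattr(module, method, wrapper)

add_timer(pipe.unet, "unet")
add_timer(pipe.vae, "vae", method="decode")

pipe("a photo of a cat", num_inference_steps=1, guidance_scale=0.0)
print(timings)  # at 1 step, expect the VAE decode to take a large share
```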

> If you try to set the guidance scale of SD 2.1 to 0, the acceleration effect also disappears

What resolution are you running at? Gains are typically seen at resolutions of 1024x1024 and above.
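The resolution dependence mostly comes from attention cost: at 512x512 the largest self-attention layers in the SD 2.1 UNet see 64x64 = 4096 tokens, while at 1024x1024 they see 16384, and since attention scales quadratically with token count, that is where trimming tokens pays off. A rough timing harness, assuming a diffusers SD 2.1 pipeline; `apply_token_downsampling` is a hypothetical stand-in for whichever patch function from this repo you are using:

```python
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

def benchmark(pipe, resolution, steps=20, runs=3):
    """Average wall-clock seconds per image at a square resolution."""
    pipe("warmup", height=resolution, width=resolution, num_inference_steps=steps)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipe("a photo of a cat", height=resolution, width=resolution,
             num_inference_steps=steps, guidance_scale=7.5)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

for res in (512, 1024):
    baseline = benchmark(pipe, res)
    # apply_token_downsampling(pipe)  # hypothetical: enable the patch, then re-run
    # patched = benchmark(pipe, res)
    print(f"{res}x{res}: baseline {baseline:.2f}s per image")
```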

fingerk28 commented 6 months ago

I ran experiments on SD Turbo with 20 steps at a resolution of 512. When the guidance scale is set to 7.5, there is an approximately 18% acceleration; however, when the guidance scale is set to 0, the speed actually decreases by 2%.

fingerk28 commented 6 months ago

Indeed, as you mentioned, the acceleration on SD Turbo is limited in the 1-step case. However, with the guidance scale set to 7.5, there is still an 8% acceleration. Based on the experimental results, the guidance scale (i.e., an effective batch size of 1 vs. 2) appears to be the biggest factor affecting the acceleration on SD Turbo, although I also cannot understand why.

fingerk28 commented 6 months ago

I apologize: my original experiment was run in a GPU computing center, which may have led to incorrect results. After rerunning it on a Linux PC, everything works normally. Thank you for your response.