Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License
37.84k stars 4.88k forks

create timestep embedding tensor on device #375

Open drhead opened 2 months ago

drhead commented 2 months ago

The timestep embedding function currently creates a tensor on the CPU and then moves it to the GPU, which forces a device sync on every forward pass. This change creates the tensor directly on the target device, which avoids the sync and stops it from blocking dispatch.
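
For illustration, here is a minimal sketch of the idea, not the actual diff in this PR. The function name `sinusoidal_timestep_embedding` and its parameters are hypothetical and only assume a standard sinusoidal embedding. The relevant part is that `device=` is passed to `torch.arange`, so the frequency tensor is allocated where `timesteps` already lives instead of being built on the CPU and copied over with `.to(...)`.

```python
import math
import torch


def sinusoidal_timestep_embedding(timesteps, dim, max_period=10000):
    """Hypothetical helper: sinusoidal embedding for a 1-D tensor of timesteps."""
    half = dim // 2
    # Allocate the frequency table directly on the device of `timesteps`.
    # The pattern being replaced would be torch.arange(...) on CPU followed by .to(device).
    freqs = torch.exp(
        -math.log(max_period)
        * torch.arange(half, dtype=torch.float32, device=timesteps.device)
        / half
    )
    args = timesteps[:, None].float() * freqs[None]
    emb = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
    if dim % 2:
        # Pad with one zero column when dim is odd.
        emb = torch.cat([emb, torch.zeros_like(emb[:, :1])], dim=-1)
    return emb
```

Because every intermediate tensor inherits its device from `timesteps`, the same code runs unchanged on CPU or GPU.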