czg1225 / AsyncDiff

Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
https://czg1225.github.io/asyncdiff_page/
Apache License 2.0

Seeking Opportunities for Further Cooperation #5

Closed Steaunk closed 2 months ago

Steaunk commented 2 months ago

Hello, this is impressive work on achieving high performance in diffusion models. We are interested in exploring how to build on the insights from this study. Additionally, we are keen to delve into your other contributions, such as DeepCache, which optimizes computational efficiency by caching and retrieving features across adjacent denoising stages in sequential diffusion models, leveraging their inherent temporal redundancy. We intend to cite DeepCache in our publications.

However, we would like to highlight a mutual finding between your work and ours, specifically our recent publication on PipeFusion (GitHub & arXiv), released on 23 May 2024. We have both noticed the high similarity between the inputs of adjacent diffusion steps. While our papers employ different models (SD & DiT), both works propose a pipeline-parallel manner to orchestrate communication and computation, reflecting our shared academic pursuit.

Here, we provide some excerpts and figures from our publication at the end of this message for collaborative study.

Looking forward to potential future collaborations and mutual learning with your team to enrich our community.


[Screenshots: figures and excerpts from the PipeFusion paper]

PipeFusion splits images into patches and distributes the network layers across multiple devices. It employs a pipeline-parallel manner to orchestrate communication and computation. By leveraging the high similarity between the inputs of adjacent diffusion steps, PipeFusion eliminates the waiting time in the pipeline by reusing the one-step stale feature maps to provide context for the current step.

Leveraging input temporal redundancy, a device does not need to wait to receive the full-spatial-shape activations of the current pipeline step before starting the computation of its own stage. Instead, it employs the one-step stale activations to provide context for the current step. Consequently, after the pipeline is initialized, there is no waiting time within the pipeline.
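For readers of this thread, here is a minimal single-process sketch of the stale-activation idea described above. It is illustrative only: the stage modules, tensor shapes, and the toy update rule are hypothetical stand-ins, not PipeFusion's (or AsyncDiff's) actual code; a real deployment would place each stage on its own device and overlap communication with computation.

```python
import torch
import torch.nn as nn

num_stages = 4   # network layers split across 4 "devices" (simulated here)
num_steps = 8    # diffusion denoising steps
hidden = 16

# Hypothetical per-stage sub-networks standing in for slices of the denoiser.
stages = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_stages)])

x = torch.randn(1, hidden)                       # latent at the last timestep
stale = [x.clone() for _ in range(num_stages)]   # cached per-stage activations

with torch.no_grad():
    for t in range(num_steps):
        # Snapshot the activations produced at step t-1: the "one-step stale" context.
        prev = [a.clone() for a in stale]
        for i, stage in enumerate(stages):
            # Stage i consumes the stale output of stage i-1 instead of waiting
            # for the fresh step-t activation to arrive, so it can start immediately.
            context = x if i == 0 else prev[i - 1]
            stale[i] = stage(context)            # becomes the stale context for step t+1
        # Toy latent update standing in for a real scheduler step.
        x = x - 0.1 * stale[-1]
```

The key point the sketch captures is that, after the first step fills the caches, every stage already has local context available and never blocks on its upstream neighbor.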

czg1225 commented 2 months ago

Hi @Steaunk, thank you for your kind words and interest in our work. We look forward to potential future collaborations and mutual learning opportunities with your team to further enrich our community.