chufanchen / read-paper-and-code


CVPR 2024 | DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models #1

Open chufanchen opened 8 months ago

chufanchen commented 8 months ago

https://arxiv.org/abs/2402.19481

https://github.com/mit-han-lab/distrifuser

chufanchen commented 8 months ago

Introduction

LLM: memory-bound

Tensor parallelism: the benefit of increased aggregate memory bandwidth outweighs the communication overhead

Data parallelism, pipeline parallelism


Diffusion: compute-bound

Only data parallelism has been used for diffusion model serving

Our method introduces a new parallelization strategy called displaced patch parallelism, tailored to the sequential characteristics of diffusion models.

chufanchen commented 8 months ago

Background

The computational demand of diffusion models escalates more than quadratically with increasing resolution.

Since $x_{t-1}$ depends on $x_t$, parallel computation of $\epsilon_t$ and $\epsilon_{t-1}$ is challenging.
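This sequential dependency can be made concrete with a toy sampling loop; `predict_noise` below is a hypothetical stand-in for the noise predictor $\epsilon_\theta$ (a real model would be a large U-Net or DiT):

```python
def predict_noise(x, t):
    # Hypothetical noise predictor; only its call pattern matters here.
    return 0.1 * x * t

def sample(x_T, n_steps):
    # Each denoising step consumes the previous step's output, so step t
    # must finish before step t-1 can begin -- the loop cannot naively
    # run its iterations in parallel.
    x = x_T
    for t in range(n_steps, 0, -1):
        eps = predict_noise(x, t)
        x = x - eps  # x_{t-1} computed from x_t
    return x

result = sample(1.0, 3)
```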

Related work

ParaDiGMS employs Picard iterations to parallelize the denoising steps in a data-parallel manner.
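The Picard idea: replace the strictly sequential recurrence with a fixed-point sweep in which all function evaluations within a sweep are independent (so they can run on different devices). A minimal NumPy sketch, using a simple linear drift `f` as a hypothetical stand-in for the denoiser's update:

```python
import numpy as np

def f(x):
    # Hypothetical drift; stands in for the (expensive) model evaluation.
    return -0.5 * x

def sequential_solve(x0, h, n_steps):
    # Baseline: strictly sequential updates x_{i+1} = x_i + h * f(x_i).
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return np.array(xs)

def picard_solve(x0, h, n_steps, n_iters):
    # Picard iteration: start from a constant guess for the whole
    # trajectory, then repeatedly (1) evaluate f at every step -- these
    # evaluations are independent, hence parallelizable -- and
    # (2) rebuild the trajectory with a prefix sum.
    xs = np.full(n_steps + 1, x0, dtype=float)
    for _ in range(n_iters):
        drift = h * f(xs[:-1])  # all evaluations independent
        xs = x0 + np.concatenate(([0.0], np.cumsum(drift)))
    return xs

seq = sequential_solve(1.0, 0.1, 20)
par = picard_solve(1.0, 0.1, 20, n_iters=20)
```

After `k` sweeps the first `k` steps of the trajectory are exact, so with enough sweeps the parallel sweep recovers the sequential result; the hoped-for speedup comes from convergence in far fewer sweeps than there are steps.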

Tensor parallelism suffers from intolerable communication costs.

chufanchen commented 8 months ago

Method

Displaced patch parallelism

Activation displacement
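The core idea can be sketched in NumPy: split the input across "devices"; each device computes only its own patch fresh, and fills in the other devices' regions with stale activations from the previous timestep. This is a toy single-process sketch, not the paper's implementation; `layer` and the buffering scheme are hypothetical stand-ins (the real system communicates per-layer activations asynchronously):

```python
import numpy as np

def layer(full_activation):
    # Stand-in for a layer that needs the full spatial context
    # (e.g. attention); here, a simple global mixing op.
    return full_activation - full_activation.mean()

def displaced_patch_step(x, stale_acts, n_devices):
    # Split the input into one patch per "device".
    patches = np.split(x, n_devices)
    outs = []
    for rank, patch in enumerate(patches):
        # Each device assembles its view: its own fresh patch, plus
        # stale activations (from the previous timestep) for the rest.
        view = np.concatenate([
            patch if r == rank else stale_acts[r]
            for r in range(n_devices)
        ])
        out_full = layer(view)
        outs.append(np.split(out_full, n_devices)[rank])  # keep own patch
    # Fresh patches become next step's "stale" buffers; in the real
    # system this exchange overlaps with computation.
    return np.concatenate(outs), patches

x = np.arange(8.0)
stale = list(np.split(np.zeros(8), 4))
y, fresh = displaced_patch_step(x, stale, 4)
```

Because successive denoising steps produce highly similar activations, the staleness introduces little error while removing the synchronous all-to-all from the critical path.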