Can this model be used directly to generate video, or do we need to train from scratch?

PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

GNU Affero General Public License v3.0


Open foreverpiano opened 1 week ago

foreverpiano commented 1 week ago

@lawrence-cj

lawrence-cj commented 1 week ago

You can train a video generation model starting from Sigma. Some existing works are already doing so.

foreverpiano commented 1 week ago

Like Open-Sora, by adding a time dimension? The current version is still the 1D version.

lawrence-cj commented 1 week ago

Yes. What else do you plan to do besides the 1D temporal dimension?

foreverpiano commented 1 week ago

I don't really understand how to go from 1D to 2D easily. So is the main thing to add a time embedding during fine-tuning?
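
To make the "1D temporal dimension" idea concrete, here is a minimal NumPy sketch of one common way to extend an image transformer to video: keep the pretrained spatial layers, and add a temporal attention layer that attends across frames at each spatial location, together with a frame-index embedding. This is not the PixArt-Σ or Open-Sora code. The function names (`sinusoidal_embedding`, `temporal_block`), the single-head attention, the tensor layout `(B, T, N, C)`, and the zero-initialized output projection are all assumptions chosen for illustration.

```python
import numpy as np

def sinusoidal_embedding(num_positions, dim):
    """Sinusoidal position embedding, indexed here by frame number (dim must be even)."""
    positions = np.arange(num_positions)[:, None]                   # (T, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions * freqs[None, :]                             # (T, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1) # (T, dim)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, w_o):
    """Single-head self-attention over axis 1 of x, shape (batch, seq, dim)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])        # (batch, seq, seq)
    return softmax(scores) @ v @ w_o

def temporal_block(x, weights, frame_emb):
    """Hypothetical temporal layer. x: (B, T, N, C) = batch, frames, spatial tokens, channels."""
    B, T, N, C = x.shape
    h = x + frame_emb[None, :, None, :]                  # tell each token which frame it is in
    h = h.transpose(0, 2, 1, 3).reshape(B * N, T, C)     # fold space into batch: attend over T only
    h = self_attention(h, *weights)
    h = h.reshape(B, N, T, C).transpose(0, 2, 1, 3)      # back to (B, T, N, C)
    return x + h                                         # residual connection

B, T, N, C = 2, 4, 6, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(B, T, N, C))
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(C, C)) for _ in range(3))
w_o = np.zeros((C, C))   # assumption: zero-init so the new layer starts as an identity
frame_emb = sinusoidal_embedding(T, C)

out = temporal_block(x, (w_q, w_k, w_v, w_o), frame_emb)
print(out.shape, np.allclose(out, x))
```

With the output projection initialized to zero, the new layer leaves the features unchanged at the start of fine-tuning, so the model initially behaves like the pretrained image model applied to each frame independently. The temporal weights are then learned from video data. Note that the frame-index embedding here is separate from the diffusion timestep embedding the image model already has.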