Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Implementation of compression frame patch embedding (Fig. 3b) #52

Closed: paulchhuang closed this issue 3 months ago

paulchhuang commented 4 months ago

Hi, thanks for the great work. I have a few questions:

  1. Which patch embedding is used by default: Fig. 3(a) or Fig. 3(b)?
  2. Is there a config parameter to switch between (a) and (b)?
  3. I'd like to look at the implementation of (b), compression frame patch embedding. I see PatchEmbed in several places, imported from different libraries (sometimes from diffusers, sometimes from timm). Could you point me to the code where Fig. 3(b) is implemented?
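To make the distinction in question 1 concrete: as we read the Latte paper, Fig. 3(a) (uniform frame patch embedding) patchifies each latent frame independently, while Fig. 3(b) (compression frame patch embedding) groups several consecutive frames into one tube before projecting, which shortens the token sequence along time. The arithmetic below is a sketch that assumes an illustrative 16-frame clip of 32x32 latents, a spatial patch size of 2, and a temporal stride of 2; these numbers and the function names are hypothetical, not taken from the repo's configs.

```python
def num_tokens_uniform(frames: int, height: int, width: int, patch: int) -> int:
    """(a)-style: every frame is split into patch x patch tokens on its own."""
    assert height % patch == 0 and width % patch == 0
    return frames * (height // patch) * (width // patch)


def num_tokens_compression(frames: int, height: int, width: int,
                           patch: int, t_stride: int) -> int:
    """(b)-style: each tube of t_stride frames x patch x patch becomes one token."""
    assert frames % t_stride == 0
    assert height % patch == 0 and width % patch == 0
    return (frames // t_stride) * (height // patch) * (width // patch)


# Illustrative (assumed) latent clip: 16 frames, 32x32, patch 2, temporal stride 2.
print(num_tokens_uniform(16, 32, 32, 2))         # 4096
print(num_tokens_compression(16, 32, 32, 2, 2))  # 2048
```

The "compression" is exactly the factor t_stride in sequence length, which matters because transformer attention cost grows with the number of tokens.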
maxin-cn commented 4 months ago

  1. Latte uses Fig. 3(a) by default.
  2. This repo does not provide (b).
  3. Please refer to here.
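Because the maintainer notes that this repo does not provide (b), here is a minimal, hypothetical NumPy sketch of a tubelet-style compression frame patch embedding. It assumes a latent clip laid out as (F, C, H, W) and non-overlapping tubes of t_stride frames by patch by patch pixels. The function name, argument layout, and randomly initialized weights are illustrative assumptions, not code from Latte. In PyTorch the same operation would typically be written as an nn.Conv3d whose kernel size equals its stride.

```python
import numpy as np


def compression_frame_patch_embed(x, weight, bias, patch, t_stride):
    """Hypothetical tubelet-style (Fig. 3(b)-like) patch embedding.

    x:      latent clip of shape (F, C, H, W)
    weight: projection of shape (t_stride * patch * patch * C, D)
    bias:   shape (D,)
    Returns tokens of shape ((F // t_stride) * (H // patch) * (W // patch), D).
    """
    f, c, h, w = x.shape
    if f % t_stride or h % patch or w % patch:
        raise ValueError("F must be divisible by t_stride and H, W by patch")
    nt, nh, nw = f // t_stride, h // patch, w // patch
    # Split each of F, H, W into (block index, offset within block).
    tubes = x.reshape(nt, t_stride, c, nh, patch, nw, patch)
    # Move block indices to the front: (nt, nh, nw, t_stride, patch, patch, C).
    tubes = tubes.transpose(0, 3, 5, 1, 4, 6, 2)
    # Flatten each tube into one vector ordered (time, row, column, channel).
    tubes = tubes.reshape(nt * nh * nw, t_stride * patch * patch * c)
    # One shared linear projection per tube; this matches a Conv3d with
    # kernel == stride, up to how that layer lays out its weights.
    return tubes @ weight + bias


# Illustrative usage with random inputs; all sizes are assumptions.
rng = np.random.default_rng(0)
F, C, H, W, D, P, S = 16, 4, 32, 32, 64, 2, 2
x = rng.standard_normal((F, C, H, W))
weight = rng.standard_normal((S * P * P * C, D)) * 0.02
bias = np.zeros(D)
tokens = compression_frame_patch_embed(x, weight, bias, patch=P, t_stride=S)
print(tokens.shape)  # (2048, 64)
```

With t_stride=1 the same function patchifies each frame independently, giving the (a)-style sequence of F * (H // patch) * (W // patch) tokens; larger strides shorten the sequence by that factor at the cost of coarser temporal granularity in the token grid.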
paulchhuang commented 3 months ago

Thanks for the prompt reply and pointers.