RQ-Wu / LAMP

Official implementation of LAMP: Learn a Motion Pattern by Few-Shot Tuning a Text-to-Image Diffusion Model (few-shot-based text-to-video diffusion)
https://rq-wu.github.io/projects/LAMP/index.html

Regarding the paper #15

Open 18445864529 opened 5 months ago


Hi, thank you for the interesting work. I have a question about the proposed method.

The paper says that "a 2D convolution with an output channel of 1 along with a Sigmoid function is added". The corresponding code is:

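```python
# in __init__: gate convolution mapping C channels to a single channel
self.conv_gate = nn.Conv2d(out_channels, 1, 3, stride=1, padding=1)
# in forward(): fold frames into the batch axis, compute one gate value per pixel, copy it to all C channels
x_gate = rearrange(x_2d, "b c f h w -> (b f) c h w")
c = x_gate.shape[1]
x_gate = self.sigmoid(self.conv_gate(x_gate)).repeat(1, c, 1, 1)
```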
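To make the shapes in the quoted snippet concrete, here is a minimal NumPy sketch of the gate branch. It assumes `x_2d` has shape (B, C, F, H, W) as the `rearrange` pattern implies, and that the gate multiplies a conv_1d output already arranged as (B*F, C, H, W). That layout and the names `conv2d_c_to_1`, `gate_branch` and `temporal_out` are hypothetical, not taken from the LAMP repository; `nn.Conv2d` and `rearrange` are replaced by plain NumPy so the example runs without PyTorch.

```python
import numpy as np

def conv2d_c_to_1(x, w, b=0.0):
    """3x3 conv, stride 1, zero padding 1, mapping C input channels to 1 output.

    Shape-wise analogous to nn.Conv2d(C, 1, 3, stride=1, padding=1).
    x: (N, C, H, W) input; w: (C, 3, 3) weights of the single output channel.
    Returns an (N, 1, H, W) array.
    """
    n, c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))  # zero padding of 1 on H and W
    out = np.full((n, 1, h, wd), b, dtype=float)
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, :, dy:dy + h, dx:dx + wd]  # shifted window, (N, C, H, W)
            # cross-correlation as in PyTorch: weight each channel, then sum over C
            out[:, 0] += np.einsum("nchw,c->nhw", patch, w[:, dy, dx])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_branch(x_2d, w, b=0.0):
    """Hypothetical gate branch for x_2d of shape (B, C, F, H, W).

    Returns the single-channel gate (B*F, 1, H, W) and its C-fold copy (B*F, C, H, W).
    """
    bsz, c, f, h, wd = x_2d.shape
    # same as rearrange(x_2d, "b c f h w -> (b f) c h w")
    x_gate = x_2d.transpose(0, 2, 1, 3, 4).reshape(bsz * f, c, h, wd)
    gate = sigmoid(conv2d_c_to_1(x_gate, w, b))  # one value per pixel, in (0, 1)
    gate_rep = np.repeat(gate, c, axis=1)        # like .repeat(1, c, 1, 1)
    return gate, gate_rep

rng = np.random.default_rng(0)
x_2d = rng.standard_normal((2, 4, 3, 5, 5))  # B=2, C=4, F=3, H=W=5
w = rng.standard_normal((4, 3, 3))
gate, gate_rep = gate_branch(x_2d, w)
temporal_out = rng.standard_normal((6, 4, 5, 5))  # hypothetical conv_1d output, (B*F, C, H, W)
gated = gate_rep * temporal_out
```

In this sketch every channel of `gate_rep` holds the same spatial map, so `gate_rep * temporal_out` equals `gate * temporal_out` computed with broadcasting; the repeat only materializes copies of one per-pixel weight.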

I would like to know the insight behind using a c -> 1 channel convolution and then repeating the result c times. As a side question, what is the purpose of applying a sigmoid after this branch, before multiplying it with the conv_1d output? Thanks.
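The two design choices in the question can be made concrete with a little arithmetic. The sketch below does not claim to reflect the authors' reasoning: it only compares the parameter count of the c -> 1 gate convolution with a hypothetical c -> c alternative, and shows how a raw (un-squashed) gate value versus its sigmoid scales a single temporal feature. The channel width 320 and the numeric values are arbitrary choices for illustration.

```python
import math

def conv2d_params(c_in, c_out, k=3, bias=True):
    """Parameter count of a k x k nn.Conv2d(c_in, c_out, k) layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

c = 320  # hypothetical channel width, for illustration only
shared_gate = conv2d_params(c, 1)       # c -> 1, as in the quoted conv_gate
per_channel_gate = conv2d_params(c, c)  # hypothetical c -> c alternative
print(shared_gate, per_channel_gate)    # 2881 921920

# (raw gate, raw gate * feature, sigmoid(raw gate) * feature) for one feature value
feature = 0.5
rows = [(raw, raw * feature, sigmoid(raw) * feature) for raw in (-2.0, 0.0, 3.0)]
for row in rows:
    print(row)
```

Without the sigmoid, a negative raw gate flips the sign of the temporal feature and a large one amplifies it; with the sigmoid, the feature is always scaled by a factor strictly between 0 and 1.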