Confusion about the conditioning embedding

Hello,

I have a question about the conditioning embedding that is fed into the diffusion model (the snippet below is from your code):

                clip_emb = self.model.cc_projection(torch.cat([self.clip_emb, T[None, None, :]], dim=-1))

                cond['c_crossattn'] = [torch.cat([torch.zeros_like(clip_emb).to(self.device), clip_emb], dim=0)]
                cond['c_concat'] = [torch.cat([torch.zeros_like(self.vae_emb).to(self.device), self.vae_emb], dim=0)]

Why do you concatenate torch.zeros_like(clip_emb) with clip_emb itself to build the condition (and likewise for self.vae_emb)? Is it only to get the right tensor size, or is there another reason?
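
My tentative guess, which I have not confirmed against the repository, is that the zero half acts as the unconditional input for classifier-free guidance, so that the unconditional and conditional predictions come out of a single batched forward pass. Below is a toy sketch of that idea. Plain floats stand in for tensors, and toy_denoiser and guided_prediction are hypothetical names of mine, not functions from this repository.

```python
# Toy sketch of classifier-free guidance with a doubled batch.
# Plain Python floats stand in for tensors; every name here is hypothetical.

def toy_denoiser(latents, conds):
    """Hypothetical stand-in for the diffusion model: one prediction per batch row."""
    return [x + 0.5 * c for x, c in zip(latents, conds)]


def guided_prediction(latent, cond, guidance_scale):
    # Row 0 carries a zeroed condition (the unconditional branch) and
    # row 1 carries the real condition, mirroring the zeros/embedding concatenation.
    batch_latents = [latent, latent]
    batch_conds = [0.0 * cond, cond]

    # A single forward pass yields both predictions.
    pred_uncond, pred_cond = toy_denoiser(batch_latents, batch_conds)

    # Start at the unconditional prediction and move toward the conditional one.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)


result = guided_prediction(latent=1.0, cond=2.0, guidance_scale=3.0)
print(result)
```

If that reading is right, the concatenation along dim=0 is about batching the two branches together rather than about padding the tensor to a required size. Is that the intent?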

cvlab-columbia / zero123
