cvlab-columbia / zero123

Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
https://zero123.cs.columbia.edu/
MIT License
2.59k stars 188 forks

confusion about conditioning embedding #86

Open wyiguanw opened 10 months ago

wyiguanw commented 10 months ago

Hello,

I have a question about the conditioning embedding that is fed into the diffusion model (here is your code):

                clip_emb = self.model.cc_projection(torch.cat([self.clip_emb, T[None, None, :]], dim=-1))

                cond['c_crossattn'] = [torch.cat([torch.zeros_like(clip_emb).to(self.device), clip_emb], dim=0)]
                cond['c_concat'] = [torch.cat([torch.zeros_like(self.vae_emb).to(self.device), self.vae_emb], dim=0)]

Why do you concatenate torch.zeros_like(clip_emb) with clip_emb itself to form the condition? Is it only to match the tensor size, or is there another reason?
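
For context, one plausible reading (not confirmed by the authors in this thread) is that the zero tensor is the unconditional half of a classifier-free guidance batch rather than padding. Stacking it with the real embedding along dim=0 lets a single forward pass produce both an unconditional and a conditional prediction, which are then split and blended. The sketch below illustrates that pattern only. NumPy stands in for torch to keep it dependency-light, and toy_denoiser, guided_prediction, guidance_scale, and the tensor shapes are all hypothetical names and values chosen for illustration, not taken from the repository.

```python
import numpy as np

def toy_denoiser(latents, cond):
    """Hypothetical stand-in for the diffusion U-Net (NOT the real model).

    Returns one prediction per batch row that depends on that row's conditioning.
    """
    # Average the conditioning over its token and channel axes, one scalar per row.
    return latents + cond.mean(axis=(1, 2))[:, None]

def guided_prediction(latent, clip_emb, guidance_scale):
    # Row 0: zeroed (unconditional) embedding; row 1: the real embedding.
    # This mirrors torch.cat([torch.zeros_like(clip_emb), clip_emb], dim=0).
    cond = np.concatenate([np.zeros_like(clip_emb), clip_emb], axis=0)
    # Duplicate the latent so both rows go through the model in one pass.
    latents = np.concatenate([latent, latent], axis=0)
    pred = toy_denoiser(latents, cond)
    # Split the batch back into its two halves (in torch: pred.chunk(2)).
    pred_uncond, pred_cond = np.split(pred, 2, axis=0)
    # Classifier-free guidance: move from the unconditional prediction
    # toward the conditional one, scaled by guidance_scale.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

latent = np.zeros((1, 4))          # illustrative latent shape (assumption)
clip_emb = np.ones((1, 1, 768))    # illustrative embedding shape (assumption)
out = guided_prediction(latent, clip_emb, guidance_scale=3.0)
print(out.shape)
```

Under this reading, the zeros change the result: with guidance_scale set to 1 the output equals the conditional prediction, while larger values push further away from the unconditional one.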