statjuns opened this issue 2 years ago
Hi, thank you for your nice work! I have been reproducing your paper. In the implementation you seem to use the labels (crnn_code) and GloVe embeddings (zmc_code, m_image) to create the images, but I can't find an explanation of this part in the paper. Could you explain it in more detail?
Hi @statjuns,
Thank you for your interest in this project! I borrowed that part of the code from the original StoryGAN paper (pdf). The explanation they give is connected to an earlier video-generation paper, MoCoGAN, which decomposes a video into a motion channel and a content channel and then merges them. My understanding is that the content comes from the GloVe embeddings (one representation per caption) and the motion comes from the crnn_code (which is extracted from all the captions in the story). Please refer to the MoCoGAN paper for more details; I'm happy to discuss this further.
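To make the content/motion split concrete, here is a minimal PyTorch sketch of the general idea only. It is not the code from this repo or from StoryGAN/MoCoGAN; the module and layer names (StoryConditioner, crnn, content_fc) are made up for illustration, and the real models also involve noise inputs, learned initial hidden states, and image/story discriminators.

```python
import torch
import torch.nn as nn

class StoryConditioner(nn.Module):
    """Illustrative sketch: merge a per-caption content code (e.g. from
    GloVe caption embeddings) with a story-level motion code produced by
    an RNN run over all captions in the story."""

    def __init__(self, emb_dim=300, motion_dim=128, cond_dim=128):
        super().__init__()
        # RNN over the whole story's caption embeddings -> motion code per frame
        self.crnn = nn.GRU(emb_dim, motion_dim, batch_first=True)
        # Projection of each caption's embedding -> content code
        self.content_fc = nn.Linear(emb_dim, cond_dim)

    def forward(self, story_embs):
        # story_embs: (batch, num_captions, emb_dim) GloVe caption embeddings
        motion_seq, _ = self.crnn(story_embs)            # (B, T, motion_dim)
        content = self.content_fc(story_embs)            # (B, T, cond_dim)
        # Concatenate content and motion per frame; a generator would be
        # conditioned on this merged code to produce each image of the story.
        return torch.cat([content, motion_seq], dim=-1)  # (B, T, cond_dim + motion_dim)

# Usage: one conditioning vector per caption/frame in a 5-caption story
cond = StoryConditioner()(torch.randn(2, 5, 300))
print(cond.shape)  # torch.Size([2, 5, 256])
```

The point of the sketch is only the structure: the content code depends on a single caption, while the motion code is computed with context from the entire story, which is why the repo uses both the GloVe embeddings and the crnn_code when generating images.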