dyelax / Adversarial_Video_Generation

A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

Question about discriminator input #22

Open · serkansulun opened this issue 6 years ago

serkansulun commented 6 years ago

Hi Matt, thanks for the great code. According to the paper, the input to the discriminator is a sequence of frames (the history of frames plus the next frame). If I understand your code correctly, the input to the discriminator is a single frame: only the next frame (either generated or ground truth). Is this right? If so, wouldn't this prevent the discriminator from making use of the continuity in the video? Thanks in advance.

dyelax commented 6 years ago

@ssulun16 You are totally right. Can't believe I missed that.

From section 2.2 in the paper:

The discriminative model D takes a sequence of frames, and is trained to predict the probability that the last frames of the sequence are generated by G. Note that only the last frames are either real or generated by G, the rest of the sequence is always from the dataset. This allows the discriminative model to make use of temporal information, so that G learns to produce sequences that are temporally coherent with its input.

I'm pretty busy with other projects right now, but it would be awesome if you could make a PR that fixes this! Seems like it should just be a small tweak in the build_feed_dict function of DiscriminatorModel.
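Roughly, the tweak would look something like the sketch below; this is a minimal sketch assuming the discriminator stacks frames along the channel axis the way the generator input does, and build_disc_batch, history, next_real, and next_gen are illustrative names, not the repo's actual API:

```python
import numpy as np

# Hypothetical sketch -- not the actual build_feed_dict in DiscriminatorModel.
# All frames are (batch, height, width, channels) arrays; `history` is the
# past frames already stacked along the channel axis.
def build_disc_batch(history, next_real, next_gen):
    # Real sequences: history + ground-truth next frame -> label 1.
    real_seqs = np.concatenate([history, next_real], axis=-1)
    # Fake sequences: the same history + generated next frame -> label 0.
    fake_seqs = np.concatenate([history, next_gen], axis=-1)
    inputs = np.concatenate([real_seqs, fake_seqs], axis=0)
    labels = np.concatenate([np.ones(len(real_seqs)),
                             np.zeros(len(fake_seqs))])
    return inputs, labels
```

Because the history frames are shared between the real and generated halves of the batch, the discriminator's conv layers see the temporal context for free; the only architectural change should be the input depth of the first layer.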

serkansulun commented 6 years ago

I made some other changes before, so the code isn't very clean, but I'll do that if I can find some time once I have a final version for myself. In the meantime, I'll open issues if I find other problems, so they can help other people.

serkansulun commented 6 years ago

I have fixed it, but I also switched to PyTorch in the meantime; the fix is available in my repositories if anyone needs it.