DariusAf / MesoNet

"MesoNet: a Compact Facial Video Forgery Detection Network" (D. Afchar, V. Nozick) - IEEE WIFS 2018
Apache License 2.0

How the data is loaded #4

Closed · chinmay5 closed this 5 years ago

chinmay5 commented 5 years ago

For training, how was the data loaded? I am especially interested in how this works given that the architecture takes 2D images: if the pre-processed version of Face2Face has, say, 300 frames per video, how does the network handle that? Are you stacking the images one on top of the other to create a sort of "voxel"? Any help would be highly appreciated.

DariusAf commented 5 years ago

I am not sure I understand your question. The network takes as input 2D images of size 256 x 256 x 3 and predicts whether each one is forged (0) or real (1). We train on a per-frame basis, so we don't have to worry about video lengths. Then, for testing, as described in the paper, the idea is to average those per-frame predictions over the video to get the final video-level prediction.
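
As a minimal sketch of that test-time aggregation (model and frames are placeholders here: the trained network and an array of preprocessed face crops, respectively):

import numpy as np

# frames: shape (n_frames, 256, 256, 3), pixel values scaled to [0, 1]
frame_scores = model.predict(frames)        # one score per frame
video_score = float(np.mean(frame_scores))  # average over the video
is_real = video_score > 0.5                 # 1 = real, 0 = forged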

chinmay5 commented 5 years ago

Thanks @DariusAf for the response. I am more familiar with PyTorch, where you write a DataLoader to read the images. So, if I understand correctly: say the pre-processing pipeline yields 50 frames for a video; you load all 50, treat each one as a real/fake image in its own right, and update the network weights accordingly, so the number of frames per video stops mattering. Please correct me if I am wrong.
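
In PyTorch terms, I imagine something like the following rough sketch, assuming the extracted frames are stored as images in one subfolder per class (the subfolder names below are placeholders):

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Hypothetical layout: database_deepfake/real/*.jpg, database_deepfake/forged/*.jpg
dataset = datasets.ImageFolder(
        'database_deepfake',
        transform=transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),       # scales pixel values to [0, 1]
        ]))
loader = DataLoader(dataset, batch_size=64, shuffle=True)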

DariusAf commented 5 years ago

Ok, now I understand your question. No, we do not train the network with that mean prediction over the video; the averaging is only used for validation and inference. In the dataset I have just sent you, you will see that we extracted roughly 30 frames for each face and illumination configuration. We train using an ImageDataGenerator, which is the Keras equivalent of a DataLoader in torch.

Specifically, here is the code for the generator:

from keras.preprocessing.image import ImageDataGenerator

# Augmentation and normalisation; 15% of the images are held out for validation
dataGenerator = ImageDataGenerator(
        rescale=1./255,
        zoom_range=0.2,
        rotation_range=15,
        brightness_range=(0.8, 1.2),
        channel_shift_range=30,
        horizontal_flip=True,
        validation_split=0.15)

# Binary labels are inferred from the class subfolders of 'database_deepfake'
generator = dataGenerator.flow_from_directory(
        'database_deepfake',
        target_size=(256, 256),
        batch_size=64,
        class_mode='binary',
        subset='training')
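
Since validation_split is set, the held-out 15% can be read from the same directory with subset='validation'. Note that this reuses the augmenting generator; in practice you may prefer a separate, augmentation-free ImageDataGenerator for validation:

val_generator = dataGenerator.flow_from_directory(
        'database_deepfake',
        target_size=(256, 256),
        batch_size=64,
        class_mode='binary',
        subset='validation')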

Then you train using:

# flow_from_directory yields batches indefinitely, so stop after enough steps;
# train_on_batch returns the loss for the batch (unlike fit, which returns a History)
for x_batch, y_batch in generator:
    loss = model.train_on_batch(x_batch, y_batch)
    [...]
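
Equivalently, you can let Keras drive the loop; a sketch with fit_generator, where the epoch count is just an illustrative value:

model.fit_generator(
        generator,
        steps_per_epoch=len(generator),  # number of batches per pass over the data
        epochs=10)                       # illustrative value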

Hope that helps.