gan-police / frequency-forensics

Deepfake detection using wavelet-packets in PyTorch, European Conference on Machine Learning (ECML PKDD) 2022.

preprocessing dataset with features=packets #10

Closed aschneid42 closed 3 years ago

aschneid42 commented 3 years ago

For some reason, when I preprocess with features=packets the outputs have shape (128,128,3), the same as the raw images, but with features=log-packets they have shape (64,16,16,3). As a result, I get a shape mismatch when trying to train a CNN on packets. Shouldn't packets and log-packets result in input features of the same shape?

v0lta commented 3 years ago

Dear @aschneid42, thank you for taking the time to report your issue. What you are describing is indeed unexpected behaviour. The log-scaling happens in https://github.com/gan-police/frequency-forensics/blob/6cbcc7431b1ff76cf6c05824b60cd9de5bdb37eb/src/freqdect/wavelet_math.py#L143 , and it should not affect the resulting tensor sizes. I am surprised you got a (128,128,3) result with features=packets. For 3rd degree packets (the default), I would expect tensors of shape (64,16,16,3). I would like to reproduce the problem you are describing. Can you elaborate a little more on the specifics of your use case?
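
For reference, the expected shape follows from simple arithmetic: each decomposition level splits every packet into four and halves its height and width. The sketch below is not code from the repository; `packet_tensor_shape` is a hypothetical helper, and it assumes that every level halves the spatial size exactly (as with a Haar wavelet and no boundary padding). Longer wavelet filters with padding may give slightly larger packets.

```python
def packet_tensor_shape(height, width, channels, level=3):
    """Shape of a stacked 2D wavelet packet tensor.

    Hypothetical helper for illustration. Assumes each level halves the
    height and width exactly (e.g. Haar wavelet, no boundary padding).
    """
    factor = 2 ** level
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by 2**level")
    # Every level splits each packet into four sub-bands.
    n_packets = 4 ** level
    return (n_packets, height // factor, width // factor, channels)


shape = packet_tensor_shape(128, 128, 3, level=3)
print(shape)  # (64, 16, 16, 3)
```

Under this assumption the number of coefficients is unchanged: 64 * 16 * 16 * 3 equals 128 * 128 * 3, so the two shapes reported above hold the same amount of data in different layouts.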

v0lta commented 3 years ago

As a side note, we do have https://github.com/gan-police/frequency-forensics/blob/6cbcc7431b1ff76cf6c05824b60cd9de5bdb37eb/src/freqdect/wavelet_math.py#L53 . That code produces a (128,128) packet representation for an input of the same size. If you run it on all three colour channels, it should give you a (128,128,3) result. I personally believe that a channel concatenation into the width and height dimensions is not a good idea for processing with a CNN (see section 6 of https://arxiv.org/pdf/2106.09369.pdf for the full argument). To fix the problem you are observing, we adapted the CNN architecture ( https://github.com/gan-police/frequency-forensics/blob/6cbcc7431b1ff76cf6c05824b60cd9de5bdb37eb/src/freqdect/models.py#L37 ).
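
To illustrate how the two layouts relate, here is a minimal NumPy sketch that tiles a stacked (64,16,16,3) packet tensor into a (128,128,3) image and back. The names `tile_packets` and `untile_packets` are hypothetical, and the row-major 8x8 grid ordering is an assumption; the function linked above may arrange the packets differently.

```python
import math

import numpy as np


def tile_packets(packets):
    """Tile (P, h, w, C) packets into a (g*h, g*w, C) image, g = sqrt(P).

    Assumes a row-major grid: packet k lands in grid cell (k // g, k % g).
    """
    p, h, w, c = packets.shape
    g = math.isqrt(p)
    if g * g != p:
        raise ValueError("number of packets must be a perfect square")
    grid = packets.reshape(g, g, h, w, c)
    # (grid_row, grid_col, h, w, C) -> (grid_row, h, grid_col, w, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(g * h, g * w, c)


def untile_packets(image, grid_size):
    """Inverse of tile_packets: (g*h, g*w, C) -> (g*g, h, w, C)."""
    height, width, c = image.shape
    h, w = height // grid_size, width // grid_size
    grid = image.reshape(grid_size, h, grid_size, w, c)
    # (grid_row, h, grid_col, w, C) -> (grid_row, grid_col, h, w, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(grid_size * grid_size, h, w, c)


packets = np.arange(64 * 16 * 16 * 3, dtype=np.float32).reshape(64, 16, 16, 3)
tiled = tile_packets(packets)
restored = untile_packets(tiled, grid_size=8)
print(tiled.shape, restored.shape)
```

Both layouts hold the same coefficients. The difference is that the stacked form keeps each packet as a separate entry, while the tiled form places unrelated frequency bands next to each other in the width and height dimensions.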