ZBigFish / FER-VT

The unofficial implementation of paper "Facial expression recognition with grid-wise attention and visual transformer"

How to change the GWA to handle different input sizes? #5

Open HaithemH opened 1 year ago

HaithemH commented 1 year ago

Hi, thank you for sharing the code! I would like to know how to change the GWA class to handle different input sizes, for instance (112, 112)?

ZBigFish commented 1 year ago

It's simple. To change the input size, you just need to modify `self.patch_embeddings = nn.Conv2d(in_channels=3, out_channels=9408, kernel_size=(56, 56), stride=(56, 56))`, just like in ViT. The forward function should be changed too, e.g. `x = x.flatten(2).transpose(-1, -2).view(batchsize, 16, 3, 56, 56)`, etc.
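For a 112x112 input, one way to follow the advice above is to shrink the patch size from 56 to 28 so the 4x4 grid of 16 patches is preserved (then `out_channels` becomes 3 * 28 * 28 = 2352 instead of 9408). A minimal shape-level sketch, assuming the surrounding GWA class and using standalone names rather than the actual class attributes:

```python
import torch
import torch.nn as nn

# Hypothetical adaptation of the GWA patch embedding for 112x112 input.
# Original: 224x224 input, 56x56 patches, out_channels = 3*56*56 = 9408.
patch = 28                   # 112 / 4 = 28, keeping the 4x4 grid of 16 patches
out_ch = 3 * patch * patch   # 2352, one flattened 3x28x28 patch per output column
patch_embeddings = nn.Conv2d(in_channels=3, out_channels=out_ch,
                             kernel_size=(patch, patch), stride=(patch, patch))

x = torch.randn(2, 3, 112, 112)             # dummy batch of two 112x112 images
x = patch_embeddings(x)                     # -> (2, 2352, 4, 4)
x = x.flatten(2).transpose(-1, -2)          # -> (2, 16, 2352)
x = x.view(x.size(0), 16, 3, patch, patch)  # -> (2, 16, 3, 28, 28)
print(x.shape)  # torch.Size([2, 16, 3, 28, 28])
```

The key constraint is that `out_channels` must equal `3 * kernel_size**2` so the final `view` back into per-patch image tensors lines up.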

Or you can just resize the image to 224x224 in pre-process phase.

Hope this answers your question.

Have a nice day~