liguopeng0923 closed this issue 5 months ago
The teaser pic is just an illustration of the overall idea, showing how to train a single model with different patch sizes.
I believe the actual patch sizes are 24 and 4 respectively, which correspond to many more tokens than the teaser shows.
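To make the "many more tokens" point concrete, here is a minimal sketch of the arithmetic. It assumes a square 240x240 input, which is not stated in this thread, and `num_tokens` is a hypothetical helper, not a function from the FlexiViT code.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size evenly")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


# Assumption: 240x240 input resolution (not given in this thread).
IMAGE_SIZE = 240

for patch_size in (24, 4):
    print(f"patch size {patch_size}: {num_tokens(IMAGE_SIZE, patch_size)} tokens")
```

Under that assumption, patch size 24 gives a 10x10 grid (100 tokens) and patch size 4 gives a 60x60 grid (3600 tokens), whereas a literal 2x2 split would give only 4 tokens.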
Fine, thanks.
Could you provide trained models that use ImageNet-style normalization, i.e. mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]?
This is very important for our later work.
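For reference, this is roughly the preprocessing I mean, as a minimal NumPy sketch. It assumes an RGB image in height x width x channel layout with float values already scaled to [0, 1]; `imagenet_normalize` is a hypothetical helper name, not part of the released code.

```python
import numpy as np

# Per-channel (RGB) statistics quoted above.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def imagenet_normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an (..., H, W, 3) RGB image with values in [0, 1]."""
    image = np.asarray(image, dtype=np.float32)
    if image.shape[-1] != 3:
        raise ValueError("expected the last axis to hold 3 RGB channels")
    # Broadcasting applies the per-channel mean/std along the last axis.
    return (image - IMAGENET_MEAN) / IMAGENET_STD


# Usage: a dummy mid-gray 224x224 image (the size is only an example).
dummy = np.full((224, 224, 3), 0.5, dtype=np.float32)
normalized = imagenet_normalize(dummy)
print(normalized.shape)
```

A model trained with a different input range would see shifted and rescaled inputs under this preprocessing, which is why matching weights matter for us.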
Hi,
I would like to know how to reproduce the results shown in the FlexiViT teaser figure.
There, an image is split into 2×2 patches, and the reported accuracy is 84.4%.
Best, Guopeng.