Hello dear authors,
Thank you for providing your work and code.
I understand from your paper that you used patch size = 4 in all your models. Is there a specific reason for that choice?
Did you try any larger patch sizes to begin with, such as 8 or 16? A larger patch size would reduce the FLOPs significantly.
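To make the FLOPs point concrete, here is a rough sketch of the token-count arithmetic I have in mind. The 224x224 input resolution and the helper name `num_patches` are my own assumptions for illustration, not something taken from the paper or the code.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches (tokens) in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed input resolution of 224x224 (hypothetical, for illustration only).
for patch_size in (4, 8, 16):
    print(f"patch size {patch_size}: {num_patches(224, patch_size)} tokens")
```

Doubling the patch size cuts the token count by a factor of 4, so any cost that grows with the number of tokens should shrink by at least that much.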
I am trying to further compress your network for my application. I was able to do this successfully with patch size = 4, but I could not retrain the model with patch size = 8 because I don't see any model with that patch size.
Any comments or suggestions would be really helpful.
Thank you!