CaptainEven closed this issue 1 year ago.
If you're interested, you can check out the following hybrid convolution-transformer MAE architecture.
Is it possible to replace a ViT backbone with a regular CNN backbone like ResNet?
Perhaps this is what you're looking for, MAE on standard ResNets or ConvNeXts: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
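For context, here is a minimal sketch of the *naive* way to feed MAE-style masked inputs to a plain CNN: pick random square patches, zero those pixels so a dense backbone like ResNet can still take a full image, and compute the reconstruction loss only on the masked pixels. It assumes NumPy and images stored as (height, width, channels) arrays; the function names `random_patch_mask`, `apply_mask`, and `masked_mse` are hypothetical. This is not SparK's method: the SparK paper cited above argues that dense zero-filling like this works poorly with convolutions and instead encodes only the visible patches with sparse convolution and uses a hierarchical decoder.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a (height, width) boolean array where True marks masked pixels.

    Whole patch x patch squares are masked, chosen uniformly at random
    (MAE-style random sampling over a patch grid).
    """
    assert height % patch == 0 and width % patch == 0
    grid_h, grid_w = height // patch, width // patch
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Choose which patches to hide.
    chosen = rng.permutation(num_patches)[:num_masked]
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[chosen] = True
    patch_mask = patch_mask.reshape(grid_h, grid_w)

    # Upsample the patch grid to pixel resolution so a dense CNN can use it.
    return np.repeat(np.repeat(patch_mask, patch, axis=0), patch, axis=1)


def apply_mask(image, mask):
    """Zero out masked pixels of an (H, W, C) image (naive dense masking)."""
    return image * ~mask[..., None]


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked pixels only, as in MAE."""
    diff = (pred - target) ** 2
    return diff[mask].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    mask = random_patch_mask(32, 32, patch=8, mask_ratio=0.75, rng=rng)
    cnn_input = apply_mask(img, mask)  # this is what a ResNet would see
    print("masked fraction:", mask.mean())
    print("loss if CNN output were all zeros:", masked_mse(np.zeros_like(img), img, mask))
```

Zero-filling keeps the input shape compatible with an unmodified ResNet, which is why it is the obvious first attempt; the trade-off is that the convolutions still slide over the blank regions, which is the issue the sparse-convolution design in SparK is meant to avoid.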
Thanks for the advice!