Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders
MIT License

Why not use the masked transformers directly in the first two stages? #36

Open xwan0527 opened 4 months ago

xwan0527 commented 4 months ago

Why are convolutions used in the first two stages instead of masked transformer blocks? Since the mask is already upsampled to obtain the mask matrix for those stages, it seems that transformer blocks could be used there as well.
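
To make the question concrete, here is a minimal NumPy sketch of the mask upsampling mentioned above. It is not the repository's code: the grid sizes (14, 28, 56), the 0.75 mask ratio, and the helper names `random_token_mask` and `upsample_mask` are all assumptions chosen for illustration.

```python
import numpy as np

def random_token_mask(grid, mask_ratio, rng):
    """Return a (grid, grid) integer array with 1 = masked, 0 = visible."""
    num_tokens = grid * grid
    num_masked = int(num_tokens * mask_ratio)
    flat = np.zeros(num_tokens, dtype=np.int64)
    # Pick num_masked distinct token positions at random.
    flat[rng.permutation(num_tokens)[:num_masked]] = 1
    return flat.reshape(grid, grid)

def upsample_mask(mask, factor):
    """Nearest-neighbour upsample of a 2D mask by an integer factor."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)

# Assumed resolutions: the mask is sampled on the coarsest token grid
# and copied up to the finer grids of the two earlier stages.
mask_s3 = random_token_mask(14, 0.75, rng)  # 14 x 14
mask_s2 = upsample_mask(mask_s3, 2)         # 28 x 28
mask_s1 = upsample_mask(mask_s3, 4)         # 56 x 56

# Multiplying a feature map by (1 - mask) zeroes out the masked positions,
# so the same visible region is shared across all three resolutions.
features_s1 = rng.standard_normal((56, 56))
visible_s1 = features_s1 * (1 - mask_s1)
```

With masks like these available at every resolution, the question is why the first two stages apply convolutions to the masked feature map rather than running transformer blocks on the visible positions only.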