Closed rose-jinyang closed 1 year ago
Training MAE on 512x512 images is not efficient enough, because the transformer has quadratic complexity with respect to the image size. But I think steps 2 and 3 can be unified, i.e., the ACR can be trained at 512 directly.
Thanks
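To make the quadratic-complexity argument concrete, here is a small sketch that estimates how self-attention cost grows when the input goes from 256 to 512. It assumes a standard ViT-style patch size of 16 (an assumption for illustration, not taken from this repo's config); attention over N tokens costs on the order of N^2 operations per layer.

```python
def attention_cost(image_size: int, patch_size: int = 16) -> int:
    """Rough O(N^2) self-attention cost for a ViT over a square image.

    patch_size=16 is an assumed ViT default, not necessarily this repo's
    setting. N is the number of patch tokens; cost scales as N squared.
    """
    num_tokens = (image_size // patch_size) ** 2
    return num_tokens * num_tokens

cost_256 = attention_cost(256)  # 256 tokens  -> 256^2 units
cost_512 = attention_cost(512)  # 1024 tokens -> 1024^2 units

# Doubling the image side quadruples the token count, so the
# attention cost grows by a factor of 4^2 = 16.
print(cost_512 / cost_256)  # -> 16.0
```

So training the MAE directly at 512 would make each attention layer roughly 16x more expensive than at 256, which is why keeping the MAE at 256 and moving only the later stage to 512 is attractive.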
Hello, how are you? Thanks for contributing to this project. It seems there are 3 steps for training: step 1, train the MAE; step 2, train the ACR; step 3, fine-tune the trained model.
I understand the image size is 256 for steps 1 and 2, and 256~512 for step 3. Is that correct? What about using 512 rather than 256 as the image size for both steps 1 and 2?