The two models differ substantially in embedding dimension, depth, and num_heads, and are incompatible with each other. However, Tab. 6 of the paper lists the same architecture for both works in the "Arch." column. Are the two architectures actually different, as the code suggests? If so, the paper should probably clarify this, for example by reporting the number of parameters for each.
AIM-600M:
https://github.com/apple/ml-aim/blob/0b1dea9128f4734ae89252078e65aa102999407a/aim/torch/models.py#L176-L185
MAE ViT-H/14:
https://github.com/facebookresearch/mae/blob/efb2a8062c206524e35e47d04501ed4f544c0ae8/models_vit.py#L70-L74
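To make the comparison concrete, here is a rough sketch that estimates the transformer-block parameter count for both configurations. The (dim, depth, num_heads) values are my reading of the two code locations linked above and should be double-checked against them. The helper `approx_block_params` is hypothetical, assumes a standard ViT block with an MLP ratio of 4, and counts weight matrices only (no biases, norms, patch embedding, positional embeddings, or heads).

```python
def approx_block_params(dim: int, depth: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for `depth` standard transformer blocks.

    Assumption: each block has QKV + output projections (4 * dim^2)
    and a two-layer MLP (2 * mlp_ratio * dim^2). Biases and norms are ignored.
    num_heads is not an argument because it only splits `dim` across heads
    and does not change the number of weights.
    """
    attn = 4 * dim * dim
    mlp = 2 * mlp_ratio * dim * dim
    return depth * (attn + mlp)


# Values as I read them from the linked code (assumptions, please verify).
configs = {
    "AIM-600M": {"dim": 1536, "depth": 24, "num_heads": 12},
    "MAE ViT-H/14": {"dim": 1280, "depth": 32, "num_heads": 16},
}

for name, cfg in configs.items():
    n = approx_block_params(cfg["dim"], cfg["depth"])
    print(f"{name}: ~{n / 1e6:.0f}M block parameters")
```

If these values are right, the two models would land in a similar parameter range while having different shapes, which is why a shared "Arch." label could be misleading without the parameter counts.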