Closed: hussein-jafarinia closed this issue 2 months ago
I'm curious, too. Going to test it.
Found this:

> Here we use `--norm_pix_loss` as the target for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based construction and turn off `--norm_pix_loss`.
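For reference, the normalized-pixel target that `--norm_pix_loss` enables can be sketched in a few lines. This is a minimal plain-Python sketch, not the repo's code: the reference implementation (`models_mae.py`) does this on tensors, normalizing each target patch with its own mean and variance before computing the reconstruction loss, and its variance convention (biased vs. unbiased) may differ from `statistics.pvariance` used here.

```python
import statistics

def normalize_patch(pixels, eps=1e-6):
    """Per-patch normalized target, as used when --norm_pix_loss is on.

    Sketch only: each patch is shifted to zero mean and scaled by its
    own standard deviation, so the loss compares normalized pixels.
    """
    mean = statistics.fmean(pixels)
    var = statistics.pvariance(pixels, mean)
    return [(p - mean) / (var + eps) ** 0.5 for p in pixels]

# One toy 4-pixel "patch": with --norm_pix_loss the loss is computed
# against these normalized values instead of the raw pixels.
patch = [0.1, 0.4, 0.6, 0.9]
print(normalize_patch(patch))
```

With `--norm_pix_loss` off, the target is simply the raw pixels, which is why that setting reconstructs images you can actually look at.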
I can only get checkpoints that include the encoder only (mae_pretrain_vit_xx.pth). How can I get a checkpoint that includes both the encoder and decoder (mae_visualize_vit_xx.pth) for my own model?
> I can only get checkpoints that include the encoder only (mae_pretrain_vit_xx.pth). How can I get a checkpoint that includes both the encoder and decoder (mae_visualize_vit_xx.pth) for my own model?
You can find the answer in other issues. As far as I remember, you should append "_full" to the checkpoint filename, just before ".pth".
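Following the suggestion above, the rename can be sketched like this. Note the resulting `_full` URL is an assumption taken from that answer, not something verified here, so the (commented-out) download may 404:

```shell
# Start from a known encoder-only checkpoint URL.
url="https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth"

# Insert "_full" before the ".pth" suffix, per the answer above.
full_url="${url%.pth}_full.pth"
echo "$full_url"

# Hypothetical download; whether this file actually exists on the
# server is an assumption from the linked issues:
# curl -fO "$full_url"
```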
This question is answered completely in other issues, so I am closing it.
There are two different checkpoints for each ViT in your source code. One group includes only the encoder (e.g. https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth), and the other group includes both the encoder and the decoder; links to these appear in the demo code (e.g. https://dl.fbaipublicfiles.com/mae/visualize/mae_visualize_vit_large.pth). Why aren't the weights in them the same? What is the difference between the two groups (for example, in training parameters)? Which group is better?
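One way to see the structural difference yourself is to compare the state-dict keys of the two checkpoints: the pretrain file should lack the decoder entries that the visualize file carries. A minimal sketch follows; the key prefixes (`decoder_*`, `mask_token`) follow the reference `models_mae.py`, and the toy dicts stand in for what `torch.load(path, map_location="cpu")["model"]` would return, so verify the names against your own model:

```python
# Sketch: decide whether a checkpoint holds decoder weights by key prefix.
# Prefixes assumed from the reference MAE implementation (models_mae.py).

def has_decoder(state_dict):
    """True if any key belongs to the decoder side of the model."""
    return any(k.startswith(("decoder_", "mask_token")) for k in state_dict)

# Toy stand-ins for the real state dicts loaded from the .pth files:
encoder_only = {
    "cls_token": 0,
    "patch_embed.proj.weight": 0,
    "blocks.0.attn.qkv.weight": 0,
}
full_ckpt = dict(encoder_only, **{
    "mask_token": 0,
    "decoder_embed.weight": 0,
    "decoder_blocks.0.attn.qkv.weight": 0,
})

print(has_decoder(encoder_only))  # no decoder_* keys present
print(has_decoder(full_ckpt))     # decoder keys present
```

Printing `sorted(state_dict)` on the real files would show exactly which weights differ between the two releases.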