I believe a goal of un-/self-supervised learning is to learn transferable feature representations. I notice that the MoCo v3 paper conducts a study on some smaller image classification datasets such as CIFAR-10/-100, and the performance is quite impressive.
But the performance of modern neural nets on these image classification datasets seems somewhat saturated, and I believe the community is more interested in more challenging downstream dense prediction tasks such as object detection and scene parsing. Task-specific decoder layers such as DETR (for object detection) and SETR (for semantic segmentation / scene parsing) can almost be used out of the box on top of a pretrained backbone, as sketched below. Are there plans to study the transfer learning performance of MoCo v3 on downstream dense prediction tasks?
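For concreteness, here is a minimal PyTorch sketch of the kind of near-out-of-the-box transfer I have in mind: a SETR-style "naive" decoder placed on top of a MoCo v3 pretrained ViT backbone. The backbone loader and token-extraction interface in the comments are hypothetical placeholders, not part of the MoCo v3 codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveSegHead(nn.Module):
    """SETR-style 'naive' decoder: reshape ViT patch tokens back into a 2D
    feature map, project to per-class logits, and bilinearly upsample."""
    def __init__(self, embed_dim=768, num_classes=150, patch_size=16):
        super().__init__()
        self.patch_size = patch_size
        self.classifier = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, num_classes, kernel_size=1),
        )

    def forward(self, patch_tokens, img_hw):
        # patch_tokens: (B, N, C) patch-embedding sequence, [CLS] token removed
        B, N, C = patch_tokens.shape
        h, w = img_hw[0] // self.patch_size, img_hw[1] // self.patch_size
        feat = patch_tokens.transpose(1, 2).reshape(B, C, h, w)
        logits = self.classifier(feat)
        return F.interpolate(logits, size=img_hw, mode="bilinear",
                             align_corners=False)

# `backbone` is assumed to be a ViT-B/16 initialized from a MoCo v3
# checkpoint and to expose its patch-token sequence; the loader name and
# checkpoint path below are hypothetical.
# backbone = load_moco_v3_vit("vit-b-300ep.pth.tar")
# tokens = backbone.forward_features(images)[:, 1:]  # drop [CLS]
head = NaiveSegHead()
tokens = torch.randn(2, 196, 768)        # stand-in for ViT-B/16 tokens on a 224x224 input
masks = head(tokens, img_hw=(224, 224))  # -> (2, 150, 224, 224)
```

Of course this is only the simplest decoder; the interesting question is how MoCo v3 features compare to supervised pretraining when fine-tuned under such heads.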
Thanks for your great work!