Solving Jigsaw Puzzles as Pretext Task for Self-Supervised Learning.
Learning image representations by solving Jigsaw puzzles. (a): The image from which the tiles (marked with green lines) are extracted. (b): A puzzle obtained by shuffling the tiles. (c): Determining the relative positions can be ambiguous (e.g., the relative location between the central tile and the top-left and top-middle tiles is ambiguous).
In this paper, Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, Jigsaw Puzzles / CFN, by the University of Bern, is reviewed. In this paper:
Solving Jigsaw puzzles is treated as a pretext task, which requires no manual labeling. By training the CFN to solve Jigsaw puzzles, both a feature mapping of object parts and their correct spatial arrangement are learned.
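As a rough PyTorch sketch of how such a pretext example can be constructed (the permutation set below is random and the tile shapes are assumed, whereas the paper pre-selects 100 permutations with large Hamming distance):

```python
import random

import torch

NUM_TILES = 9
# Placeholder permutation set: the paper pre-selects 100 permutations of the
# 9 tile positions with large average Hamming distance; random permutations
# are used here only to keep the sketch self-contained.
PERMUTATIONS = [random.sample(range(NUM_TILES), NUM_TILES) for _ in range(100)]

def make_jigsaw_example(tiles):
    """tiles: list of 9 tensors of shape (3, 64, 64) in original grid order.
    Returns the reordered tiles and the index of the permutation used,
    which is the classification target of the pretext task."""
    label = random.randrange(len(PERMUTATIONS))
    order = PERMUTATIONS[label]
    shuffled = torch.stack([tiles[i] for i in order])  # (9, 3, 64, 64)
    return shuffled, label
```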
This is a 2016 ECCV paper with over 1200 citations.
Within each of these two pairs of images, most of the shape is the same: two cars with different colors and two dogs with different fur patterns. The features learned to solve puzzles on one (car/dog) image will also apply to the other (car/dog) image, since they capture the shared shape patterns rather than the differing colors or textures.
An immediate approach to solve Jigsaw puzzles is to stack the 9 tiles along the channel dimension (i.e., the input has 9×3 = 27 channels) and input these channels into a CNN to solve the Jigsaw puzzles.
The problem with this design is that the network could solve the Jigsaw puzzles by relying only on low-level statistics, such as texture and edge continuity across tiles. A CNN that has learnt only low-level features is NOT what we want.
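A minimal PyTorch sketch of this early-fusion baseline, with illustrative layer sizes rather than the paper's architecture:

```python
import torch
import torch.nn as nn

# Early fusion: stack the 9 RGB tiles along the channel axis (9 x 3 = 27
# channels) and feed them to a single CNN. Layer sizes are illustrative only.
naive_net = nn.Sequential(
    nn.Conv2d(27, 96, kernel_size=11, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(96, 100),                   # 100 candidate permutations
)

tiles = torch.randn(8, 9, 3, 64, 64)      # a batch of 8 puzzles, 9 tiles each
x = tiles.reshape(8, 27, 64, 64)          # channel stacking (early fusion)
logits = naive_net(x)                     # (8, 100)
```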
Late fusion is used to force the proposed CFN to learn high-level features.
Extract high-level features first, then fuse data!!!
Context Free Network (CFN): Network Architecture.
CFN is designed to force the network to learn high-level features.
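A minimal PyTorch-style sketch of the late-fusion idea behind the CFN (the branch below is a simplified placeholder, not the AlexNet-based layers used in the paper): every tile passes through the same shared-weight branch, and only the resulting high-level per-tile features are concatenated and classified into a permutation index.

```python
import torch
import torch.nn as nn

class CFNSketch(nn.Module):
    """Sketch of the Context Free Network (CFN): nine siamese branches with
    shared weights process each tile independently (late fusion), then the
    per-tile features are concatenated and classified into one of the
    candidate permutations. Layer sizes are simplified placeholders."""
    def __init__(self, num_permutations=100, feat_dim=512):
        super().__init__()
        # One shared branch, applied to every tile (weight sharing).
        self.branch = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim), nn.ReLU(),        # plays the role of fc6
        )
        # Fusion happens only after high-level per-tile features exist.
        self.head = nn.Sequential(
            nn.Linear(9 * feat_dim, 1024), nn.ReLU(),   # fc7
            nn.Linear(1024, num_permutations),          # fc8 -> permutation id
        )

    def forward(self, tiles):                 # tiles: (B, 9, 3, H, W)
        feats = [self.branch(tiles[:, i]) for i in range(9)]
        return self.head(torch.cat(feats, dim=1))
```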
One important point is to avoid shortcuts, i.e. to prevent the network from solving the puzzles with low-level features alone, which would keep it from ever learning the high-level features.
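A sketch of tile extraction with two of these shortcut-prevention tricks, assuming the geometry reported in the paper (a 225×225 crop split into 75×75 cells, with a random 64×64 tile taken from each cell) plus independent per-tile normalization; other augmentations such as color jittering are omitted.

```python
import random

import torch

def extract_tiles(image, grid=3, cell=75, tile=64):
    """image: tensor (3, 225, 225). Returns a list of 9 tensors (3, 64, 64).
    Each tile is cropped at a random position inside its cell, leaving random
    gaps between tiles, and is normalized independently, so the network
    cannot rely on edge continuity or shared low-level statistics."""
    tiles = []
    for r in range(grid):
        for c in range(grid):
            dy = random.randint(0, cell - tile)   # random vertical offset
            dx = random.randint(0, cell - tile)   # random horizontal offset
            t = image[:, r * cell + dy: r * cell + dy + tile,
                         c * cell + dx: c * cell + dx + tile]
            # Normalize mean and standard deviation per tile.
            t = (t - t.mean()) / (t.std() + 1e-5)
            tiles.append(t)
    return tiles
```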
The training uses 1.3M (1'300'000) color images of 256×256 pixels from ImageNet. Then the model is transferred to other tasks.
The proposed method CFN achieves 34.6% when only fully connected layers are trained.
There is a significant improvement (from 34.6% to 45.3%) when the conv5 layer is also trained. This shows that the conv5 features learned from solving Jigsaw puzzles are more specific to the pretext task, while the earlier layers capture more general-purpose features.
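A hypothetical sketch of such a transfer setup, using torchvision's AlexNet as a stand-in for the CFN branch; the freeze_up_to helper and the weight loading are illustrative, not part of the paper's code.

```python
import torch.nn as nn
from torchvision.models import alexnet

model = alexnet(num_classes=1000)
# model.load_state_dict(pretrained_jigsaw_weights)   # assumed to exist

def freeze_up_to(net, last_frozen_conv):
    """Freeze the first `last_frozen_conv` Conv2d layers (and the layers in
    between); every layer after them, including the classifier, is trained."""
    conv_seen = 0
    for module in net.features:
        if isinstance(module, nn.Conv2d):
            conv_seen += 1
        for p in module.parameters():
            p.requires_grad = conv_seen > last_frozen_conv
    return net

freeze_up_to(model, 5)   # all conv layers frozen: train only the fc layers
freeze_up_to(model, 4)   # conv5 also trained: the 34.6% -> 45.3% setting
```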
Results on PASCAL VOC 2007 Detection and Classification.
Jigsaw Puzzles / CFN is closing the gap with features obtained with supervision (supervised AlexNet [25]).
Sik-Ho Tang. Review — Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles.