NorbertZheng / read-papers

My paper reading notes.
MIT License

Sik-Ho Tang | Review -- Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. #124

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review — Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles.

NorbertZheng commented 1 year ago

Overview

Solving Jigsaw Puzzles as Pretext Task for Self-Supervised Learning.

Figure: Learning image representations by solving Jigsaw puzzles. (a) The image from which the tiles (marked with green lines) are extracted. (b) A puzzle obtained by shuffling the tiles. (c) Determining the relative position is ambiguous: the relative location between the central tile and the top-left and top-middle tiles cannot be distinguished.

In this paper, Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, Jigsaw Puzzles / CFN, by the University of Bern, is reviewed. In this paper:

- Solving Jigsaw puzzles is proposed as a pretext task for self-supervised learning of visual representations.
- A Context-Free Network (CFN) with late fusion is designed to force the network to learn high-level features rather than low-level shortcuts.

This is a paper in 2016 ECCV with over 1200 citations.

NorbertZheng commented 1 year ago

Feature Learning by Solving Jigsaw Puzzles

Conceptual Idea

Figure: Most of the shape in these 2 pairs of images is the same.

Two cars with different colors, and two dogs with different fur patterns. The features learned to solve puzzles on one (car/dog) image will also apply to the other (car/dog) image, since they are invariant to the shared shape patterns.

Naïve Stacked Patches Do NOT Work

An immediate approach to solving Jigsaw puzzles is to stack the 9 shuffled tiles along the channel dimension (early fusion) and input these channels into a single CNN to solve the Jigsaw puzzles.
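A minimal sketch of this naïve early-fusion input (the tile size and helper name are illustrative, not from the paper's code):

```python
import numpy as np

def stack_tiles_as_channels(tiles):
    """Early fusion: concatenate the 9 RGB tiles (H, W, 3) along the
    channel axis, giving one (H, W, 27) input for a single CNN."""
    return np.concatenate(tiles, axis=-1)

# 9 shuffled 64x64 RGB tiles from a 3x3 jigsaw grid
tiles = [np.random.rand(64, 64, 3) for _ in range(9)]
stacked = stack_tiles_as_channels(tiles)
print(stacked.shape)  # (64, 64, 27)
```

Because every tile is visible to the first convolutional layer at once, the network can compare raw pixels across tile borders, which is exactly the shortcut described next.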

The problem with this design is that the network can solve the puzzle by matching low-level statistics across tile borders (e.g. edges and textures that continue from one tile into the next), so it learns mostly low-level features.

A CNN that has learnt only low-level features is NOT what we want.

Late fusion is used to force the proposed CFN to learn high-level features.

NorbertZheng commented 1 year ago

Extract high-level features first, then fuse data!!!

NorbertZheng commented 1 year ago

Context Free Network (CFN): Network Architecture

image Context Free Network (CFN): Network Architecture.

Framework

CFN is designed to force the network to learn high-level features: each tile is processed independently by an AlexNet-style branch with shared weights, and the per-tile features are only fused afterwards (late fusion) by fully connected layers that classify the permutation.
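The late-fusion idea can be sketched as follows. This is a toy stand-in: `encode_tile` plays the role of the shared siamese branch and the classifier is a single linear map; none of the shapes match the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the shared siamese branch (same weights for every tile).
W_enc = rng.standard_normal((64 * 64 * 3, 128))

def encode_tile(tile):
    """Each tile is encoded independently with shared weights, so the
    branch never sees low-level statistics across tile borders."""
    return np.maximum(tile.reshape(-1) @ W_enc, 0.0)  # ReLU features

def cfn_forward(tiles, W_cls):
    feats = [encode_tile(t) for t in tiles]   # 9 feature vectors of dim 128
    fused = np.concatenate(feats)             # late fusion: (1152,)
    return fused @ W_cls                      # logits over permutation indices

tiles = [rng.random((64, 64, 3)) for _ in range(9)]
W_cls = rng.standard_normal((9 * 128, 1000))  # 1000 candidate permutations
logits = cfn_forward(tiles, W_cls)
print(logits.shape)  # (1000,)
```

The design choice is that cross-tile information only becomes available after each tile has been summarized into a feature vector, so solving the puzzle requires per-tile semantics rather than border matching.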

Training

The network is trained to predict which permutation was applied to the tiles, posed as a classification problem over a fixed set of permutation indices.
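The paper keeps the permutation classification manageable by selecting roughly 1000 of the 9! = 362,880 possible permutations, chosen to have large pairwise Hamming distance. A greedy selection sketch of that idea (illustrative, not the paper's exact procedure; the grid size is a parameter here so the demo stays small):

```python
import itertools

def hamming(p, q):
    """Number of positions where two permutations differ."""
    return sum(a != b for a, b in zip(p, q))

def select_permutations(n_items, n_select):
    """Greedily pick permutations that maximize the minimum Hamming
    distance to the already-selected set (the paper uses 9 tiles and
    ~1000 permutations; 4 tiles keeps this demo fast)."""
    all_perms = list(itertools.permutations(range(n_items)))
    chosen = [all_perms[0]]
    while len(chosen) < n_select:
        best = max(all_perms,
                   key=lambda p: min(hamming(p, c) for c in chosen))
        chosen.append(best)
    return chosen

perms = select_permutations(n_items=4, n_select=5)
for p in perms:
    print(p)
```

Keeping the selected permutations far apart in Hamming distance makes the classes easier to distinguish, so the pretext task stays well-posed without enumerating all 9! permutations.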

NorbertZheng commented 1 year ago

Avoid Shortcuts

One important point is to avoid shortcuts, i.e. to prevent the network from learning low-level cues (such as edge continuity across tiles or pixel statistics) that solve the pretext task while letting the network skip learning high-level features.
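Two such countermeasures can be sketched in a few lines: cropping each tile at a random offset inside its grid cell (so adjacent tile edges no longer align), and normalizing each tile independently (so tiles cannot be matched by absolute color statistics). Parameter values below are illustrative, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def jittered_crop(cell, tile_size=64, jitter=10):
    """Crop the tile at a random offset inside its grid cell, leaving a
    random gap so edges of adjacent tiles no longer line up exactly."""
    dy = rng.integers(0, jitter + 1)
    dx = rng.integers(0, jitter + 1)
    return cell[dy:dy + tile_size, dx:dx + tile_size]

def normalize_tile(tile):
    """Normalize each tile independently so the network cannot match
    tiles by absolute brightness or color statistics."""
    mu, sigma = tile.mean(), tile.std()
    return (tile - mu) / (sigma + 1e-8)

cell = rng.random((74, 74, 3))   # 64px tile plus up to 10px of jitter room
tile = normalize_tile(jittered_crop(cell))
print(tile.shape)  # (64, 64, 3)
```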

NorbertZheng commented 1 year ago

Experimental Results

The training uses 1.3M (1,300,000) color images of 256×256 pixels from ImageNet. The model is then transferred to other tasks.

ImageNet


The proposed method CFN achieves 34.6% when only the fully connected layers are retrained on the frozen pretext features.

There is a significant improvement (from 34.6% to 45.3%) when the conv5 layer is also trained. This shows that the conv5 features are the most specific to the Jigsaw pretext task and benefit the most from fine-tuning on the target task.

NorbertZheng commented 1 year ago

PASCAL VOC

Figure: Results on PASCAL VOC 2007 Detection and Classification.

NorbertZheng commented 1 year ago

Jigsaw Puzzles / CFN is closing the gap with features obtained with supervision (supervised AlexNet [25]).

NorbertZheng commented 1 year ago

Reference