ganler / ResearchReading

General system research material (not limited to papers) reading notes.
GNU General Public License v3.0

UC Berkeley -- CS294-158 20Spring: Deep Unsupervised Learning #3

Closed ganler closed 3 years ago

ganler commented 3 years ago

If you ask me why a sys guy would care about math stuff... I would say that unsupervised learning is a trend and the future of industry (money-saving...). Learning DUL provides ideas and challenges for building future big data systems.

Course Site: https://sites.google.com/view/berkeley-cs294-158-sp20/home

ganler commented 3 years ago

Lecture 7: Self-Supervised Learning

YOUTUBE

Reconstruct From A Corrupted Version

Denoising Autoencoder

image

Loss Func:

image

Stacked Denoising Autoencoder: Add noise to the internal feature vector.
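The denoising objective can be sketched in a few lines of numpy (a toy, assuming additive Gaussian corruption and an MSE reconstruction loss; `corrupt` and `dae_loss` are illustrative names, not from the course):

```python
import numpy as np

def corrupt(x, noise_std=0.3, rng=None):
    """Corrupt the clean input with additive Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return x + rng.normal(0.0, noise_std, size=x.shape)

def dae_loss(x_clean, x_recon):
    """Denoising objective: reconstruct the CLEAN input, not the corrupted one."""
    return np.mean((x_clean - x_recon) ** 2)

x = np.ones((4, 8))       # toy batch of "images"
x_tilde = corrupt(x)      # corrupted view that would be fed to the encoder f
# a real model would compute x_hat = g(f(x_tilde)); here we just score x_tilde itself
loss = dae_loss(x, x_tilde)
```

The key point the code makes explicit: the loss compares the reconstruction against the clean `x`, so the network cannot solve the task by copying its input.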

Context Encoder

Mask out a rectangular region from an image. => Reconstruct the actual image.

image

image

image
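The masking step itself is simple bookkeeping; a numpy sketch (hypothetical helper, not the paper's code) showing how the masked image and the region-restricted reconstruction loss fit together:

```python
import numpy as np

def mask_region(img, top, left, h, w, fill=0.0):
    """Mask out a rectangular region; return (masked image, boolean mask)."""
    masked = img.copy()
    mask = np.zeros_like(img, dtype=bool)
    mask[top:top + h, left:left + w] = True
    masked[mask] = fill
    return masked, mask

img = np.arange(36, dtype=float).reshape(6, 6)
masked, mask = mask_region(img, 2, 2, 2, 2)
# the context encoder is trained to in-paint img[mask] given the rest of `masked`
recon_loss = np.mean((img[mask] - masked[mask]) ** 2)
```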

Predicting One View From Another

L => ab: predict the ab color channels from the L (lightness) channel of a Lab image, i.e., colorization.

image

Visual Common Sense Tasks

Relative Position of Image Patches

Center patch + Other Patch => The relative position of the "other patch".

image
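A numpy sketch of how training pairs could be built for this pretext task (3x3 grid; `relative_position_example` is an illustrative name; real implementations also add gaps/jitter between patches so the network can't exploit trivial boundary cues):

```python
import numpy as np

def patch_grid(img, patch):
    """Split a square image into a 3x3 grid of patch x patch tiles."""
    return [img[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            for r in range(3) for c in range(3)]

def relative_position_example(img, patch, neighbor_idx):
    """Return (center patch, neighbor patch, label) for the pretext task.
    neighbor_idx in 0..7 indexes the 8 cells around the center; the label
    IS the relative position the network must predict."""
    patches = patch_grid(img, patch)
    cells = [i for i in range(9) if i != 4]   # the 8 non-center cells
    return patches[4], patches[cells[neighbor_idx]], neighbor_idx

img = np.arange(81, dtype=float).reshape(9, 9)
center, other, label = relative_position_example(img, 3, 0)  # 0 = top-left
```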

Solving Jigsaw Puzzles.

image

Rotation
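The rotation pretext task: rotate each image by 0/90/180/270 degrees and train a classifier to predict which rotation was applied. Generating the self-supervised labels is a one-liner in numpy:

```python
import numpy as np

def rotation_examples(img):
    """The 4 rotated views and their labels (0: 0°, 1: 90°, 2: 180°, 3: 270°)."""
    return [(np.rot90(img, k), k) for k in range(4)]

img = np.arange(16, dtype=float).reshape(4, 4)
views = rotation_examples(img)   # 4 (view, label) training examples per image
```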

Predict neighboring context

Word2Vec.
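Word2Vec's skip-gram view of "predict neighboring context" boils down to generating (center, context) training pairs from a sliding window; a minimal sketch:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs for word2vec-style context prediction."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat", "on", "mat"], window=1)
```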

Contrastive Learning

CPC: Contrastive Predictive Coding

Associated data sequence (audio, image, whatever) => predict the future data.

image

We have:

- Positive samples: grabbed from the raw data (maybe a crop from the raw image).
- Negative samples: unrelated data from the dataset (a crop from some other image in the dataset).

For each input (say, an image), we have (N-1) negative samples & 1 positive sample.

CPC uses an RNN encoder to encode the input sequence into a context vector (c_t) as the high-level feature (also called the slow feature).

image

The goal is to maximize the mutual information between c_t and z_positive while minimizing that between c_t and z_negative.
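This objective is usually written as the InfoNCE loss: a softmax cross-entropy where the positive's similarity score must beat the negatives'. A numpy sketch with toy 2-D features (illustrative names, dot-product similarity, scaled by a temperature):

```python
import numpy as np

def info_nce(c, z_pos, z_negs, temperature=0.1):
    """InfoNCE: cross-entropy of the positive against the negatives,
    with similarity = dot product / temperature. Positive sits at index 0."""
    z = np.vstack([z_pos[None, :], z_negs])
    logits = z @ c / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

c = np.array([1.0, 0.0])                        # context vector c_t
# loss is small when the positive aligns with c_t...
loss_easy = info_nce(c, np.array([1.0, 0.0]),
                     np.array([[0.0, 1.0], [-1.0, 0.0]]))
# ...and large when a negative aligns with c_t instead
loss_hard = info_nce(c, np.array([0.0, 1.0]),
                     np.array([[1.0, 0.0], [-1.0, 0.0]]))
```

Minimizing this loss is what maximizes a lower bound on the mutual information between c_t and the positive code.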

Instance Discrimination

Do classification at the instance level. (Every image is a class.)

MoCo & SimCLR.

Memory Bank(2018)

Challenge: it is impossible to give every image its own weight vector in a parametric softmax at real-world scale. (You don't have that much memory.)

But we can have a memory bank to store the feature vectors.

Pipeline: a batch of images -> feature map -> L2-normalized 128-dim vector (living on the 128-D unit sphere) -> non-parametric softmax classifier -> the probability of matching each instance; the per-image feature vectors are stored in the memory bank.
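The non-parametric softmax replaces class weights with the stored instance features themselves; a numpy sketch (`nonparam_softmax` is an illustrative name; the temperature value is an assumption of this toy):

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    """Project features onto the unit sphere."""
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)

def nonparam_softmax(feature, bank, temperature=0.07):
    """P(instance i | v) ∝ exp(v_i · v / τ), over all vectors in the memory bank."""
    logits = bank @ feature / temperature
    logits -= logits.max()                      # numerical stability
    e = np.exp(logits)
    return e / e.sum()

# a toy memory bank of 100 instances with 128-D unit-norm features
bank = l2_normalize(np.random.default_rng(0).normal(size=(100, 128)))
f = l2_normalize(bank[7] + 0.01)                # a query close to instance 7
probs = nonparam_softmax(f, bank)               # should peak at instance 7
```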

Problem: the stored features go stale — each was computed by the encoder at a different training step, so the bank's keys are not consistent with the current encoder.

MoCo(2020)

The task in training: Which key is responsible for the query?

image

The authors argue that building the dictionary rests on two requirements: 1. large — the dictionary must be big enough to cover the high-dimensional, continuous space well; 2. consistent — the keys must be encoded by the same or a similar encoder, so that query-key distances are comparable and meaningful.

E2E (the keys in the dictionary are encoded by the same up-to-date encoder as the query): good consistency, but the dictionary size is tied to the mini-batch size, so it cannot scale. (consistent but not large) Memory bank: (large but not consistent)

image

Task: Can we have both scalability + consistency?

2 networks: a query encoder (updated by backprop) and a momentum/key encoder (a slowly moving average of the query encoder).

At the start:

There are K keys (negative samples) in the momentum queue.

Pipeline:

- Augmentation: x => (q, k_+), two augmented views of the same image, where q is the query (maybe batched) and k_+ is its positive key.
- Compare the query with k_+ and the K keys in the queue using the InfoNCE (N-pair style) contrastive loss.
- Back prop: update the query encoder only; the momentum encoder is updated as a moving average of it, not by backprop.
- Finally, enqueue this batch of keys into the queue (and dequeue the oldest batch).
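The queue update itself is plain FIFO bookkeeping over a fixed-size dictionary of K keys; a numpy sketch (illustrative helper name):

```python
import numpy as np

def dequeue_and_enqueue(queue, keys):
    """FIFO dictionary update: drop the oldest batch, append the newest keys."""
    return np.concatenate([queue[len(keys):], keys], axis=0)

K, dim, batch = 8, 4, 2
queue = np.zeros((K, dim))          # K keys from previous (momentum-encoded) batches
new_keys = np.ones((batch, dim))    # keys from the current batch
queue = dequeue_and_enqueue(queue, new_keys)
```

Because the queue decouples the dictionary size K from the mini-batch size, the dictionary can be large while each key is still cheap to produce.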

For the momentum encoder: its parameters θ_k are not updated by backprop. Instead, θ_k ← m·θ_k + (1 − m)·θ_q, an exponential moving average of the query encoder's θ_q, with m close to 1 (e.g., 0.999) so that the keys in the queue stay consistent.
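A numpy sketch of the momentum (EMA) update θ_k ← m·θ_k + (1 − m)·θ_q, applied per parameter tensor:

```python
import numpy as np

def momentum_update(theta_k, theta_q, m=0.999):
    """EMA update of the momentum encoder's parameters toward the query encoder's."""
    return [m * k + (1.0 - m) * q for k, q in zip(theta_k, theta_q)]

theta_q = [np.ones((2, 2))]                     # query encoder parameters
theta_k = [np.zeros((2, 2))]                    # momentum encoder parameters
theta_k = momentum_update(theta_k, theta_q, m=0.9)   # m lowered for the toy
```

With m near 1, θ_k drifts slowly, which is exactly what keeps the queued keys' encodings mutually consistent.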