Closed by ganler 3 years ago
Loss Func:
Stacked Denoising Autoencoder: add noise to the internal feature vector and train the model to reconstruct the clean input.
Mask out a rectangular region from an image. => Reconstruct the actual image.
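A minimal numpy sketch of the masking step (the function name and fill value are my own; the reconstruction loss shown in the comment is the usual masked-region MSE):

```python
import numpy as np

# Hypothetical sketch: mask a rectangular region of an image so a model
# can be trained to reconstruct (inpaint) the missing pixels.
def mask_rectangle(image, top, left, height, width, fill=0.0):
    """Return a masked copy of `image` plus the boolean mask."""
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[top:top + height, left:left + width] = True
    masked[mask] = fill
    return masked, mask

img = np.random.rand(32, 32, 3)
masked_img, mask = mask_rectangle(img, top=8, left=8, height=16, width=16)
# The reconstruction loss is then computed on the masked region, e.g.:
# loss = ((pred[mask] - img[mask]) ** 2).mean()
```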
L => AB. (Colorization: predict the AB color channels from the L lightness channel of a Lab-space image.)
Center patch + Other Patch => The relative position of the "other patch".
Word2Vec.
Associated data sequence(audio, image, whatever) => the future data.
We have:
- Positive samples: grabbed from the raw data. (Maybe a crop from the raw image.)
- Negative samples: unrelated data from the dataset. (A crop from some other image in the dataset.)
For each input (say, an image), we have (N-1) negative samples & 1 positive sample.
CPC uses an RNN encoder to encode the input sequence into a context vector c_t as the high-level feature (also called the slow feature).
The goal is to maximize the mutual information between c_t and z_positive while minimizing that between c_t and z_negative.
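This objective is the InfoNCE loss. A minimal numpy sketch (shapes and the toy positive/negative construction are assumptions, not the paper's code): the context c is scored against 1 positive and N-1 negative encodings, and maximizing the softmax probability of the positive raises a lower bound on the mutual information with the positive.

```python
import numpy as np

def info_nce_loss(c, z_positive, z_negatives):
    """InfoNCE: negative log-softmax probability of the positive sample."""
    scores = np.concatenate([[c @ z_positive], z_negatives @ c])  # logits
    scores -= scores.max()                      # numerical stability
    log_prob = scores[0] - np.log(np.exp(scores).sum())
    return -log_prob                            # positive sits at index 0

rng = np.random.default_rng(0)
c = rng.normal(size=128)
loss = info_nce_loss(c,
                     c + 0.01 * rng.normal(size=128),   # aligned positive
                     rng.normal(size=(7, 128)))          # 7 random negatives
# A well-aligned positive drives the loss toward 0.
```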
Do classification at the instance level. (Every image is a class.)
MoCo & SimCLR.
Challenge: it is infeasible to give every image in the real world its own classifier weight. (You don't have that much memory.)
But we can have a memory bank to store the feature vectors.
Pipeline: a batch of images -> feature map -> L2-normalized 128-dim vector (on the 128-D unit sphere) -> non-parametric softmax classifier -> the probability that the input matches each instance; the feature vectors are stored in the memory bank.
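A sketch of the non-parametric softmax over such a memory bank (the temperature tau=0.07 and bank size are illustrative assumptions): each stored unit vector acts as the "weight" of its own instance-level class.

```python
import numpy as np

def nonparam_softmax(v, bank, tau=0.07):
    """P(instance i | v) for every row i of the memory bank.

    `v` and the bank rows are L2-normalized 128-D vectors; similarity is
    a dot product scaled by temperature tau, then softmax-normalized.
    """
    logits = (bank @ v) / tau
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(1)
bank = rng.normal(size=(1000, 128))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)  # 128-D unit sphere
v = bank[42]                                         # query = instance 42
probs = nonparam_softmax(v, bank)
# The query's own instance gets the highest probability.
```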
Problem:
The task in training: which key in the dictionary matches the query?
The authors argue that building the dictionary depends on two necessary conditions: 1. large: the dictionary must be big enough to represent the high-dimensional, continuous feature space well; 2. consistent: the keys in the dictionary must be encoded by the same (or a very similar) encoder, so that the distances between the query and the keys are comparable and meaningful.
E2E (the key encoder is updated together with the query encoder, so all keys in a batch come from the same parameters): good consistency but cannot scale. (Consistent but not large.) Memory bank: large but not consistent, since stored features come from many past versions of the encoder.
Task: Can we have both scalability + consistency?
2 networks: a query encoder and a momentum (key) encoder.
At the start:
There are K keys (negative samples) in the momentum queue.
Pipeline:
Augmentation: x => (x, k_+), where x is the query (maybe batched). Then compare the query with its positive key and the K keys in the queue using the N-pair contrastive loss. Backprop updates the query encoder; the momentum encoder is updated by momentum instead. Finally, enqueue this batch of keys and dequeue the oldest.
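A toy numpy sketch of one such step. The linear "encoders", noise-based augmentation, and all sizes are assumptions for illustration only; the point is the logit layout (positive in column 0, K queue negatives), the FIFO queue update, and the momentum update of the key encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, B = 16, 64, 8                     # feature dim, queue size, batch size
W_q = rng.normal(size=(D, D))           # query encoder (toy: one matrix)
W_k = W_q.copy()                        # momentum (key) encoder
queue = rng.normal(size=(K, D))         # K keys acting as negatives

x = rng.normal(size=(B, D))
q = x @ W_q                                        # query view
k_pos = (x + 0.1 * rng.normal(size=(B, D))) @ W_k  # augmented key view

l_pos = (q * k_pos).sum(axis=1, keepdims=True)  # Bx1 positive logits
l_neg = q @ queue.T                             # BxK negative logits
logits = np.concatenate([l_pos, l_neg], axis=1) # positive is column 0
# (A cross-entropy loss with target class 0 would be applied to `logits`,
#  and its gradient would update W_q only.)

queue = np.concatenate([k_pos, queue])[:K]      # enqueue new, drop oldest B

m = 0.999                                       # momentum coefficient
W_k = m * W_k + (1 - m) * W_q                   # keeps keys consistent
```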
For the momentum encoder: its parameters are an exponential moving average of the query encoder's, theta_k <- m * theta_k + (1 - m) * theta_q, with m close to 1 (e.g. 0.999), so the keys in the queue stay consistent.
If you ask me why a systems guy would care about this math stuff... I would say that unsupervised learning is a trend and the future of industry (it saves labeling money). Learning deep unsupervised learning (DUL) provides ideas about, and challenges for, building future big-data systems.
Course Site: https://sites.google.com/view/berkeley-cs294-158-sp20/home