Abstract

CNNs tend to rely on low-level features, which is thought to be why they lack robustness to image perturbations and domain shift. Contrastive learning (CL) usually uses positive pairs that preserve semantic information while perturbing superficial features.

Interestingly, this paper generates samples the other way around: it perturbs semantic information and preserves superficial features. It proposes texture-based and patch-based augmentations for generating these negative samples. Training with them improves generalization.

Method

Texture-based augmentation produces a realistic-looking image with a similar texture. Patch-based augmentation shuffles the image's patches.

The texture-based augmentation uses the method from this paper (https://graphics.stanford.edu/papers/texture-synthesis-sig00/texture.pdf). It was published in 2000 and has over 2,000 citations.

In the experiments, patch-based augmentation gave somewhat better final performance.
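To make the patch-based idea concrete, here is a minimal NumPy sketch that cuts an image into a grid of equal, non-overlapping patches and permutes them. The function name `patch_shuffle`, the fixed grid, and the `patch_size` parameter are assumptions for illustration; the paper's actual procedure may sample or tile patches differently.

```python
import numpy as np


def patch_shuffle(image, patch_size, rng=None):
    """Return a copy of `image` (H, W, C) with its grid patches randomly permuted.

    Hypothetical helper: local texture inside each patch is kept,
    while the global layout (the semantics) is destroyed.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size

    # Split into a flat list of patches: (gh * gw, p, p, C).
    patches = (
        image.reshape(gh, patch_size, gw, patch_size, c)
        .swapaxes(1, 2)
        .reshape(gh * gw, patch_size, patch_size, c)
    )

    # Shuffle the patch order.
    patches = patches[rng.permutation(gh * gw)]

    # Reassemble the shuffled patches into an (H, W, C) image.
    return (
        patches.reshape(gh, gw, patch_size, patch_size, c)
        .swapaxes(1, 2)
        .reshape(h, w, c)
    )
```

For example, `patch_shuffle(img, 56)` on a 224x224 image would permute a 4x4 grid of patches. Smaller patches remove more of the global structure.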

When applied to MoCo v2, the contrastive loss gets an extra negative term. z_n is the ordinary MoCo v2 negative sample, and z_ns is the patch-based or texture-based negative sample. An additional alpha value controls the weight of the proposed negative samples.
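The equation itself is not reproduced in these notes, so the following is only a sketch of one form consistent with the description: an InfoNCE-style loss in which the z_ns terms in the denominator are scaled by alpha. The function name, the temperature `tau`, and the exact placement of alpha are assumptions, not the paper's confirmed formula.

```python
import numpy as np


def info_nce_with_ns(z_q, z_k, z_n, z_ns, alpha=2.0, tau=0.2):
    """InfoNCE-style loss with extra, alpha-weighted negatives (assumed form).

    z_q:  (D,)   query embedding
    z_k:  (D,)   positive key embedding
    z_n:  (N, D) ordinary negatives (e.g. from the MoCo queue)
    z_ns: (M, D) patch-based or texture-based negatives
    All embeddings are assumed to be L2-normalized.
    """
    pos = np.exp(z_q @ z_k / tau)
    neg = np.exp(z_n @ z_q / tau).sum()
    neg_ns = np.exp(z_ns @ z_q / tau).sum()
    # alpha scales how strongly the proposed negatives push on the query.
    return -np.log(pos / (pos + neg + alpha * neg_ns))
```

With `alpha=0` this reduces to the ordinary InfoNCE loss, and a larger alpha penalizes similarity to z_ns more heavily.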

When applied to BYOL, the loss takes the form [original MSE] minus [MSE computed against the negative sample].
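A rough sketch of that subtraction, assuming BYOL's usual MSE between L2-normalized vectors. The names `byol_loss_with_ns` and `p_q` (the online predictor output), and the optional weight `alpha`, are assumptions for illustration; the notes only state that the negative MSE is subtracted.

```python
import numpy as np


def _l2_normalize(x):
    # Scale a vector to unit length.
    return x / np.linalg.norm(x)


def byol_loss_with_ns(p_q, z_k, z_ns, alpha=1.0):
    """[MSE to the positive target] minus alpha * [MSE to the negative sample].

    p_q:  (D,) online-network prediction for the query view
    z_k:  (D,) target-network projection of the positive view
    z_ns: (D,) target-network projection of the patch/texture negative
    `alpha` is a hypothetical weight; alpha=1 is the plain subtraction.
    """
    p_q, z_k, z_ns = _l2_normalize(p_q), _l2_normalize(z_k), _l2_normalize(z_ns)
    mse_pos = np.sum((p_q - z_k) ** 2)   # pull toward the positive
    mse_neg = np.sum((p_q - z_ns) ** 2)  # push away from the negative
    return mse_pos - alpha * mse_neg
```

Minimizing this pulls the prediction toward the positive target while pushing it away from the negative one. Because both terms are bounded for unit vectors, the loss stays within [-4 * alpha, 4].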

Result

ImageNet-100

Four additional datasets are used to evaluate out-of-domain (OOD) performance:

ImageNet-C(orruption)
ImageNet-S(ketch)
Stylized-ImageNet
ImageNet-R(endition)

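MoCo v2 was trained for 200 epochs and then used to predict on the various sets. When trained the original way, the model does not predict the texture-based and patch-based negative samples well. In the plot, blue is the original method, red is alpha = 2, and green is alpha = 3.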

ImageNet-1K

Memory bank size

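The number of negative samples matters in CL. Methods typically use a large batch or a memory bank to get enough of them, and the paper runs experiments on this as well.

(+) The paper includes a variety of experimental results that support its claims.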


[60] Robust Contrastive Learning Using Negative Samples with Diminished Semantics #88