Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles

codertimo / paper-log

A space for managing papers to read and keeping notes on papers already read


What is this paper about? 👋

In style transfer, the style attribute usually takes discrete values (e.g., positive, negative).

This paper presents a way to supply a continuous style representation as the style-transfer attribute.

It proposes a model architecture that learns a style representation for each category through pre-training, so that style attributes never seen during training can be used without fine-tuning.

It tackles the sentiment-to-sentiment transfer problem using a dataset with over 20 sentiment categories.

Abstract (summary) 🕵🏻‍♂️

Text style transfer is usually performed using attributes that can take a handful of discrete values (e.g., positive to negative reviews). In this work, we introduce an architecture that can leverage pre-trained consistent continuous distributed style representations and use them to transfer to an attribute unseen during training, without requiring any re-tuning of the style transfer model. We demonstrate the method by training an architecture to transfer text conveying one sentiment to another sentiment, using a fine-grained set of over 20 sentiment labels rather than the binary positive/negative often used in style transfer. Our experiments show that this model can then rewrite text to match a target sentiment that was unseen during training.

Motivation

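Previous unsupervised style transfer work fed each attribute to the model as a representation looked up from an embedding.
The problem with this is that an unseen attribute, one not used during training, cannot be used, since it has no learned embedding. This paper aims to solve that.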

Method

Training is done on the denoising auto-encoder and back-translation objectives (same as prior work).
The novelty lies in the style latent representation that each of these tasks requires. Rather than simply using an attribute embedding, the style representation is obtained through the following process.
A network is trained, using BERT and fastText, to classify the attribute of a sentence. Note that this classification is trained by pulling 8-dim embedding representations close to each other.
The 512-dim feature is then projected down to 8 dims to reduce its size, and this feature is treated as the attribute representation and used as the style representation required in back-translation.
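To make the 512-dim to 8-dim step more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the projection weights are random stand-ins for learned parameters, the sentence feature is a random placeholder for a BERT/fastText feature, the label names `joyful` and `angry` are hypothetical, and classifying by nearest label embedding is only one assumed reading of "pulling 8-dim embedding representations close".

```python
import math
import random

FEATURE_DIM = 512  # sentence feature size mentioned in the notes
STYLE_DIM = 8      # style representation size mentioned in the notes


def make_projection(rng, in_dim=FEATURE_DIM, out_dim=STYLE_DIM):
    """Random linear map standing in for the learned 512 -> 8 projection."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, feature):
    """Apply the linear projection: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(style_vec, label_embeddings):
    """Pick the label whose 8-dim embedding is closest to the style vector."""
    return min(
        label_embeddings,
        key=lambda name: squared_distance(style_vec, label_embeddings[name]),
    )


rng = random.Random(0)
weights = make_projection(rng)

# Placeholder for a real sentence feature (assumption: any 512-dim vector).
sentence_feature = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
style_vec = project(weights, sentence_feature)

# Hypothetical label embeddings: one near the sentence, one far away.
label_embeddings = {
    "joyful": [v + 0.01 for v in style_vec],
    "angry": [v + 5.0 for v in style_vec],
}
predicted = nearest_label(style_vec, label_embeddings)
```

Under this reading, `style_vec` is what would be handed to the style transfer model in place of a looked-up attribute embedding, which is why an attribute without a trained embedding could still be represented.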

Experiment

Of the 24 attributes in total, only 20 were used for training in the denoising auto-encoder; at inference time, performance was measured on the 4 attributes that were not used in training.
Classification accuracy was above 90% in almost all cases, and BLEU was also above 30. PPL, however, was considerably higher than expected.
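The held-out setup described above can be sketched as follows. This is only an illustration under assumptions: the notes do not say how the 4 unseen attributes were chosen, so the random split, the `attr_XX` names, and both helper functions are hypothetical.

```python
import random


def split_attributes(attributes, num_unseen, seed=0):
    """Split attributes into (seen, unseen); random choice is an assumption."""
    rng = random.Random(seed)
    shuffled = list(attributes)
    rng.shuffle(shuffled)
    return shuffled[num_unseen:], shuffled[:num_unseen]


def transfer_accuracy(predicted, targets):
    """Fraction of rewritten sentences that a classifier assigns to the target attribute."""
    if not targets or len(predicted) != len(targets):
        raise ValueError("predicted and targets must be non-empty and equal length")
    hits = sum(p == t for p, t in zip(predicted, targets))
    return hits / len(targets)


# 24 placeholder attribute names; 20 for training, 4 held out for inference.
attributes = [f"attr_{i:02d}" for i in range(24)]
seen, unseen = split_attributes(attributes, num_unseen=4)
```

Only the `seen` attributes would be used during training; accuracy on outputs that target the `unseen` attributes is what the classification score above refers to.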

Novelty

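With this approach, one can build a model that automatically encodes the style of a sentence even without an explicit attribute.
Taken a step further, it could be trained to, for example, imitate a specific person's way of speaking.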

Limitations

For now, capturing a given style still requires labeled data for the corresponding attribute.
In addition, the paper's description of how the attribute classifier is trained is vague. This part needs to be checked.
