Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation

codertimo commented 4 years ago

What is this paper about? 👋

Disentangling the content and style representations that are entangled in the latent space is a widely used technique in unpaired text style transfer. However, this paper points out two problems.
It is still difficult to completely separate the style information from the semantics of a sentence.
RNN-based encoders and decoders do not capture long-term dependencies well. As a result, content preservation suffers.
This paper proposes the Style Transformer. According to the authors, a Transformer is used because it represents sentences in the latent space far better than an RNN, which should naturally improve both style transfer performance and content preservation.

Abstract (summary) 🕵🏻‍♂️

Disentangling the content and style in the latent space is prevalent in unpaired text style transfer. However, two major issues exist in most of the current neural models. 1) It is difficult to completely strip the style information from the semantics for a sentence. 2) The recurrent neural network (RNN) based encoder and decoder, mediated by the latent representation, cannot well deal with the issue of the long-term dependency, resulting in poor preservation of non-stylistic semantic content. In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation.

What can we learn from reading this paper? 🤔

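We need to check how the Transformer architecture is used to resolve the issues mentioned above.
Since this was published at a venue of ACL's level, simply applying a Transformer is unlikely to be the novelty. We should look carefully at which methods were actually used.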

Reference URL 🔗

https://www.aclweb.org/anthology/P19-1601/

codertimo commented 4 years ago

Motivation

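Transformer architectures have been working well recently.
Disentangling content and style in the latent space is very hard, so the work starts from the question of whether a GAN-like approach could solve the problem without doing so.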

Method

The model is a Transformer architecture with CycleGAN added.

Training uses three losses in total.

original style reconstruction loss: x -> f(x, s) -> x
manipulated style reconstruction loss: x -> f(x, s') -> y' -> f(y', s) -> x
style adversarial loss: x -> f(x, s') and x -> f(x, s) -> Discriminator (style classification)
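
The three terms above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the callables `transfer`, `log_prob`, and `style_log_prob` are hypothetical interfaces standing in for the Transformer f and the discriminator, and the toy word-flipping functions exist only so the example runs. The first term is written as a single pass f(x, s) that should return x, which is how the paper defines self-reconstruction.

```python
import math
from typing import Callable, Dict, List

Sentence = List[str]


def style_transformer_losses(
    x: Sentence,
    s: str,
    s_prime: str,
    transfer: Callable[[Sentence, str], Sentence],
    log_prob: Callable[[Sentence, Sentence, str], float],
    style_log_prob: Callable[[Sentence, str], float],
) -> Dict[str, float]:
    """Return the three loss terms for one sentence x whose own style is s.

    Hypothetical interfaces (assumptions, not from the paper's code):
      transfer(x, style)             -> sentence rewritten into `style`
      log_prob(target, src, style)   -> log p(target | src, style) under f
      style_log_prob(y, style)       -> log p(style | y) under the discriminator
    """
    # 1) Self-reconstruction: f(x, s) should reproduce x.
    self_loss = -log_prob(x, x, s)
    # 2) Cycle reconstruction: x -> f(x, s') = y' -> f(y', s) should reproduce x.
    y_prime = transfer(x, s_prime)
    cycle_loss = -log_prob(x, y_prime, s)
    # 3) Style term: the discriminator should judge y' to have style s'.
    style_loss = -style_log_prob(y_prime, s_prime)
    return {"self": self_loss, "cycle": cycle_loss, "style": style_loss}


# Toy stand-ins, for illustration only; a real model would be a Transformer.
FLIP = {"good": "bad", "bad": "good"}
POLARITY = {"good": "pos", "bad": "neg"}


def toy_transfer(x: Sentence, style: str) -> Sentence:
    # Flip sentiment words whose polarity disagrees with the requested style.
    return [FLIP[w] if w in POLARITY and POLARITY[w] != style else w for w in x]


def toy_log_prob(target: Sentence, source: Sentence, style: str) -> float:
    # Log of the smoothed fraction of target tokens the toy transfer reproduces.
    out = toy_transfer(source, style)
    hits = sum(a == b for a, b in zip(out, target))
    return math.log((hits + 1) / (len(target) + 1))


def toy_style_log_prob(y: Sentence, style: str) -> float:
    # Log of the smoothed fraction of sentiment words in y that carry `style`.
    marked = [w for w in y if w in POLARITY]
    hits = sum(POLARITY[w] == style for w in marked)
    return math.log((hits + 1) / (len(marked) + 2))


losses = style_transformer_losses(
    ["the", "food", "was", "good"], "pos", "neg",
    toy_transfer, toy_log_prob, toy_style_log_prob,
)
```

In actual training the generator and discriminator would be updated in alternation, and the discrete output y' needs some differentiable relaxation to pass gradients through; the sketch leaves both out and only shows which inputs feed each term.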

Experiment

The experiments use the Yelp and IMDB datasets, and the model achieves a state-of-the-art 93% on Yelp.

Novelty

The paper simply adds CycleGAN to a Transformer architecture for style transfer, and it is questionable how much novelty that represents.

codertimo commented 4 years ago

Word-level changes (changing only the sentiment-bearing part rather than copying the sentence as-is) will probably work, but I doubt whether it can really rewrite a whole sentence.
