Unsupervised Text Attribute Transfer via Iterative Matching and Translation

What is this paper about? 👋

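Text style transfer suffers from a lack of parallel data.
To work around this, prior approaches try to disentangle content and (style) attribute information within a sentence. However, they still struggle to preserve content and often produce ungrammatical sentences.
This paper proposes a new method, Iterative Matching and Translation (IMaT), which 1) builds a pseudo-parallel corpus (source -> target) by pairing sentences from the source and target corpora based on sentence similarity, 2) learns the attribute transfer with a standard seq2seq model, and 3) iteratively refines the pseudo-parallel corpus so that the transfer function keeps improving where it falls short.
With this approach, it outperforms the previous state-of-the-art systems by a large margin on sentiment modification and formality transfer tasks.
As an additional contribution, the authors release a public test set with human-written attribute transfer references.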
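The matching step (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes bag-of-words cosine similarity as a stand-in for the paper's sentence representation, and the function names (`bow_cosine`, `build_pseudo_parallel`) and the `threshold` value are hypothetical. Each source sentence is paired with its most similar target sentence, and the pair is kept only if the similarity clears the threshold, so only a subset of the corpus ends up aligned.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Hypothetical stand-in for a real sentence-embedding similarity.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def build_pseudo_parallel(source, target, threshold=0.5):
    """Pair each source sentence with its nearest target sentence.

    Only pairs whose similarity reaches `threshold` (an assumed value)
    are kept, so unmatched source sentences are simply dropped.
    """
    pairs = []
    for s in source:
        best = max(target, key=lambda t: bow_cosine(s, t))
        if bow_cosine(s, best) >= threshold:
            pairs.append((s, best))
    return pairs


negative = ["the food was cold and bland", "the staff was rude"]
positive = ["the food was warm and tasty", "great view from the patio"]
print(build_pseudo_parallel(negative, positive))
# [('the food was cold and bland', 'the food was warm and tasty')]
```

Here the first negative sentence is paired with its closest positive counterpart, while "the staff was rude" has no sufficiently similar positive sentence and is left out of the pseudo-parallel corpus.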

Abstract 🕵🏻‍♂️

Text attribute transfer aims to automatically rewrite sentences such that they possess certain linguistic attributes, while simultaneously preserving their semantic content. This task remains challenging due to a lack of supervised parallel data. Existing approaches try to explicitly disentangle content and attribute information, but this is difficult and often results in poor content-preservation and ungrammaticality. In contrast, we propose a simpler approach, Iterative Matching and Translation (IMaT), which: (1) constructs a pseudo-parallel corpus by aligning a subset of semantically similar sentences from the source and the target corpora; (2) applies a standard sequence-to-sequence model to learn the attribute transfer; (3) iteratively improves the learned transfer function by refining imperfections in the alignment. In sentiment modification and formality transfer tasks, our method outperforms complex state-of-the-art systems by a large margin. As an auxiliary contribution, we produce a publicly-available test set with human-generated transfer references.
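The iterative part (steps 2 and 3) can be sketched as a loop that alternates between training a transfer model on the current pairs and using that model to refine them. This is a sketch under stated assumptions: `train` is a hypothetical callable standing in for seq2seq training, the toy word-substitution "model" replaces a real network, and the refinement rule (replace a pair's target with the model's output when that output is more similar to the source) is an assumed reading of "refining imperfections in the alignment". The paper's exact criterion and similarity measure may differ.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (hypothetical stand-in for the paper's measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def refine_alignment(pairs: List[Pair], translate: Callable[[str], str]) -> List[Pair]:
    """Assumed refinement rule: swap in the model output if it is closer to the source."""
    refined = []
    for src, tgt in pairs:
        candidate = translate(src)
        if bow_cosine(src, candidate) > bow_cosine(src, tgt):
            refined.append((src, candidate))  # output preserves more source content
        else:
            refined.append((src, tgt))  # keep the current aligned target
    return refined


def imat_loop(pairs: List[Pair], train, iterations: int = 3) -> List[Pair]:
    """Alternate between training on the current pairs and refining them."""
    for _ in range(iterations):
        model = train(pairs)  # hypothetical: returns a str -> str translator
        pairs = refine_alignment(pairs, model)
    return pairs


def toy_train(_pairs: List[Pair]) -> Callable[[str], str]:
    """Toy stand-in for seq2seq training: a fixed word-substitution 'model'."""
    lexicon = {"cold": "warm", "bland": "tasty", "rude": "friendly"}
    return lambda s: " ".join(lexicon.get(w, w) for w in s.split())


pairs = [("the staff was rude", "the food was warm and tasty")]
print(imat_loop(pairs, toy_train))
# [('the staff was rude', 'the staff was friendly')]
```

After the first pass, the loosely matched target is replaced by a translation that keeps the source's content; later passes leave it unchanged because the candidate is no longer strictly closer to the source.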

Tell me what I can learn from reading this paper! 🤔

How to find similar sentences across corpora and turn them into pseudo-parallel training data.
How to improve performance through data augmentation rather than through a more complex model architecture.

Tell me the reference URL! 🔗

https://arxiv.org/abs/1901.11333

codertimo / paper-log