furukawa-ai / deeplearning_papers

AI paper reading notes

UNSUPERVISED NEURAL MACHINE TRANSLATION #86

Closed msrks closed 1 year ago

msrks commented 6 years ago

Unsupervised NMT model, part 1 (posted to arXiv on 2017-10-30; aimed at ICLR 2018)

https://arxiv.org/abs/1710.11041

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need of parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and back-translation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French → English and German → English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our approach is a breakthrough in unsupervised NMT, and opens exciting opportunities for future research.

msrks commented 6 years ago

As usual, a Seq2Seq-based NMT model made up of an encoder and a decoder.

Unsupervised = no need to prepare sentence pairs in which the translation input and output correspond (i.e., no parallel corpus).

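The encoder is shared between both languages and is built on cross-lingual embeddings learned from monolingual data (the "unsupervised embedding mappings" mentioned in the abstract).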

The interesting part is how the decoder is trained: it alternates between the following two processes.

  1. denoising
  2. back-translation
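The alternation can be pictured as a fixed cycle over mini-batches. This is a minimal sketch assuming one plausible ordering (denoising in each language, then back-translation in each direction); the `STEPS` labels and the `training_schedule` helper are hypothetical, not the authors' code.

```python
from itertools import cycle, islice
from typing import Iterator, Tuple

# Hypothetical labels for the four kinds of mini-batch updates:
# denoising in each language, and back-translation in each direction.
STEPS: Tuple[str, ...] = (
    "denoise L1",
    "denoise L2",
    "backtranslate L1->L2->L1",
    "backtranslate L2->L1->L2",
)


def training_schedule(num_updates: int) -> Iterator[str]:
    """Yield the update type for each of the first num_updates mini-batches,
    cycling through denoising and back-translation objectives."""
    return islice(cycle(STEPS), num_updates)
```

For example, `list(training_schedule(5))` wraps around to `"denoise L1"` on the fifth update. Cycling per batch (rather than training one objective to convergence first) keeps both objectives pulling on the shared encoder at the same time.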

1. Denoising

Feed a noised sentence (one with errors introduced, or with its word order shuffled) into the encoder, then train the decoder to reconstruct the original sentence from the encoded vector.
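A minimal sketch of the word-order noise, assuming (per our reading of the paper) that a sentence of N tokens gets N // 2 random swaps of adjacent words. The `add_noise` name is hypothetical, and other corruptions (e.g. dropping words) would also fit the description above but are not shown.

```python
import random
from typing import List, Optional


def add_noise(tokens: List[str], rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence by swapping randomly chosen adjacent tokens.

    Assumption: N // 2 swaps of contiguous words for an N-token sentence.
    """
    if rng is None:
        rng = random.Random()
    noisy = list(tokens)  # copy so the original sentence stays intact
    n = len(noisy)
    if n < 2:
        return noisy  # nothing to swap
    for _ in range(n // 2):
        i = rng.randrange(n - 1)  # position i always has a right neighbour i + 1
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


# Denoising training pair: the noisy sentence is the input,
# the original sentence is the reconstruction target.
sentence = "the cat sat on the mat".split()
noisy_input = add_noise(sentence, random.Random(0))
denoising_pair = (noisy_input, sentence)
```

Because only local swaps are applied, the model cannot simply copy its input, yet the sentence stays close enough to the original that reconstruction remains learnable.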

2. Back-translation

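Language1 --(Shared Enc)--> (Lang2 Dec) --> Language2 --(Shared Enc)--> (Lang1 Dec) --> Language1

That is, translate a sentence into the other language and back again, and train the model to minimize the reconstruction loss against the original sentence.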
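The round trip above can be sketched as building pseudo-parallel pairs, assuming `translate_l1_to_l2` stands in for decoding with the current shared encoder and Lang2 decoder. Per our reading of the paper, only the second half of the trip is trained: the L2 → L1 direction learns to reconstruct the original sentence from the pseudo-translation. The function names and the toy lexicon are hypothetical.

```python
from typing import Callable, List, Tuple

Sentence = List[str]


def backtranslation_pairs(
    monolingual_l1: List[Sentence],
    translate_l1_to_l2: Callable[[Sentence], Sentence],
) -> List[Tuple[Sentence, Sentence]]:
    """Build (source, target) pairs for training the L2 -> L1 direction.

    Each monolingual L1 sentence is translated into L2 with the current model,
    which is treated as fixed here (no training happens in this step). The
    pseudo-L2 sentence becomes the input and the original L1 sentence is the
    reconstruction target.
    """
    pairs = []
    for original in monolingual_l1:
        pseudo_l2 = translate_l1_to_l2(original)
        pairs.append((pseudo_l2, original))
    return pairs


# Toy stand-in for the current L1 -> L2 model: a hypothetical word-by-word lexicon.
toy_lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}


def toy_translate(sentence: Sentence) -> Sentence:
    return [toy_lexicon.get(word, word) for word in sentence]


pairs = backtranslation_pairs([["the", "cat", "sleeps"]], toy_translate)
```

Even if the pseudo-translation is poor early on, the target side is always a real L1 sentence, so the decoder is trained to produce fluent output.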
