-
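## In a nutshell
An attempt at translation without supervision. For the source and the target language respectively, an Encoder-Decoder is built that reconstructs sentences to which noise has been added. The model is then trained so that a translation result (plus noise), once mapped to a latent representation by the target Encoder, can be reconstructed by the source Decoder. An adversarial loss is added so that the source and target latent spaces end up close to each other.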
![image](https://user-ima…
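As a rough illustration of the "noise" step in the summary above, here is a minimal sketch of a sentence-corruption function. It assumes the noise is word dropout followed by a slight local shuffle, which is my reading of the paper. The name `add_noise`, the parameters `p_drop` and `k`, and their default values are assumptions for illustration, not taken from the authors' code.

```python
import random


def add_noise(tokens, p_drop=0.1, k=3, rng=None):
    """Corrupt a token list with word dropout and a bounded local shuffle.

    p_drop and k are assumed hyperparameters, not values confirmed by this note.
    """
    rng = rng or random.Random()

    # Word dropout: remove each token independently with probability p_drop.
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        # Avoid returning an empty sentence: keep one token at random.
        kept = [rng.choice(tokens)]

    # Local shuffle: sort by (index + uniform noise in [0, k + 1)).
    # With this key, no token moves more than k positions from its index.
    keys = [i + rng.random() * (k + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=keys.__getitem__)
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "the cat sat on the mat and looked outside".split()
    print(add_noise(sentence, rng=random.Random(0)))
```

The denoising objective would then train each language's Encoder-Decoder to map `add_noise(x)` back to `x`. The model and the loss are outside the scope of this sketch.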
-
### Metadata
- Authors: Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato
- Organization: Facebook AI Research
- Conference: ICLR 2018
- Link: https://openreview.net/forum?id=rkYTTf-AZ
-
A [quick search](https://github.com/google/corpuscrawler/search?q=wikipedia) shows you that CorpusCrawler does not crawl or use Wikipedia. I don't know Python but it seems feasible, either from scratc…
-
Hi :) I read your paper with great interest and would like to try running `create_training_corpora_monolingual.sh`, but it seems that `number_of_lines_in_corpus` or `path_to_corpus` is missing. Co…
-
I have no idea how to fix this; any help, or at least some guidance, would be appreciated.
And here is my current log for a new job.
It seems to be trying to `Collecting translated mono src dataset` before tra…
-
Work through the code of one of the implementations of the paper.
1. https://github.com/IlyaGusev/UNMT
2. https://github.com/sobamchan/unsupervised-machine-translation-using-monolingual-corpora-only-pytorch
3. htt…
-
In the short term, we are focusing on building up our language list by training easy-to-segment LTR languages, as they don't require changes to the training pipeline and are immediately supported in F…
-
Back-translation uses monolingual data to generate additional training data for the MT task. A backward intermediate model is trained on the available corpora and then used to generate synthetic …
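The data flow described above can be sketched as follows. This is a minimal illustration that assumes the usual setup, in which the backward (target-to-source) model translates target-side monolingual sentences and each output is paired with its real target sentence. The names `back_translate` and `toy_backward_model`, and the tiny lexicon, are hypothetical placeholders. A real setup would call a trained MT model instead.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_mono: List[str],
    backward_model: Callable[[str], str],
) -> List[Pair]:
    """Pair each real target sentence with a synthetic source sentence."""
    return [(backward_model(t), t) for t in target_mono]


def toy_backward_model(sentence: str) -> str:
    # Hypothetical stand-in for a trained target->source model:
    # a word-by-word dictionary lookup, for illustration only.
    lexicon = {"hallo": "hello", "welt": "world"}
    return " ".join(lexicon.get(w, w) for w in sentence.split())


# Real parallel data (assumed toy example) plus synthetic pairs.
parallel: List[Pair] = [("good morning", "guten morgen")]
synthetic = back_translate(["hallo welt"], toy_backward_model)
augmented = parallel + synthetic
```

The forward model would then be trained on `augmented`. The target side of every synthetic pair is genuine text, so only the source side carries translation noise.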
-
### Research
* J. Tiedemann, 2016, Finding Alternative Translations in a Large Corpus of Movie Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LRE…
-
Hi,
I am trying to mine some parallel sentences from two large monolingual corpora (over 40M sentences each). In the first step I encoded the two sides and then called `mine_bitexts.py` to do the mag…