Use character augmentation to augment real test data

djaszak commented 2 years ago

The augmenter should only be a framework that makes it easy to augment real test data and to compare different approaches by their respective accuracy. Y. Belinkov and Y. Bisk, “Synthetic and natural noise both break neural machine translation,” in ICLR 2018, describe and use several such approaches. In #2 I implemented these approaches; now they should be applied in a manner similar to the paper. I do not want to replicate the paper exactly. Instead, I want to demonstrate the flexibility of my framework and use it to find new results.
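For illustration, here is a minimal sketch of one character-level augmentation of the kind the paper describes: swapping two adjacent inner characters of a word while keeping its first and last letters. The names `swap_noise` and `augment_sentence` are hypothetical and are not taken from the NLPAug code. Whitespace tokenization and skipping words shorter than four characters are simplifying assumptions.

```python
import random


def swap_noise(word, rng):
    """Swap two adjacent inner characters; first and last letters stay fixed.

    Words shorter than 4 characters are returned unchanged (assumption).
    """
    if len(word) < 4:
        return word
    # Pick the left index of the swapped pair among the inner characters.
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment_sentence(sentence, noise_fn, seed=0):
    """Apply a word-level noise function to every whitespace-separated token."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return " ".join(noise_fn(word, rng) for word in sentence.split())


if __name__ == "__main__":
    print(augment_sentence("synthetic noise breaks translation", swap_noise, seed=1))
```

Passing the noise function as an argument is one way to keep the framework flexible: other augmentations can be plugged in without changing `augment_sentence`.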

djaszak commented 2 years ago

What was done / what are my tasks:
In general, the paper tested translation on noisy data. The noise always broke the NMT (Neural Machine Translation) models. To counter this, the models were trained on noisy data and then tested again.
Experiments were run with three different systems:

* The fully character-level model of Jason Lee, Kyunghyun Cho, and Thomas Hofmann. Fully Character-Level Neural Machine Translation without Explicit Segmentation. Transactions of the Association for Computational Linguistics (TACL), 2017.
* Nematus sequence-to-sequence
* CharCNN

As a dataset, M. Cettolo, C. Girardi, and M. Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proc. of EAMT, pp. 261-268, Trento, Italy, will be used, as it delivers a complete corpus of relevant data from TED talks with very good translations.
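The clean-versus-noisy comparison described above could be organized roughly as follows. This is only a sketch under stated assumptions: `predict` stands in for any translation system, exact-match accuracy is a simple substitute for a real translation metric, and all names (`exact_match_accuracy`, `compare_on_noise`, the toy lookup model) are hypothetical.

```python
def exact_match_accuracy(predict, pairs):
    """Fraction of (source, reference) pairs the model reproduces exactly."""
    hits = sum(1 for source, reference in pairs if predict(source) == reference)
    return hits / len(pairs)


def compare_on_noise(predict, pairs, noisers):
    """Score the same model on clean input and on each noisy variant of it."""
    results = {"clean": exact_match_accuracy(predict, pairs)}
    for name, noise in noisers.items():
        # Only the source side is perturbed; the references stay clean.
        noisy_pairs = [(noise(source), reference) for source, reference in pairs]
        results[name] = exact_match_accuracy(predict, noisy_pairs)
    return results


if __name__ == "__main__":
    # Toy stand-in for a translation model: a plain lookup table.
    table = {"hallo": "hello", "welt": "world"}

    def predict(source):
        return table.get(source, "<unk>")

    noisers = {"drop_last_char": lambda s: s[:-1]}
    print(compare_on_noise(predict, list(table.items()), noisers))
```

The same harness can be reused after retraining on noisy data, so that the scores before and after can be set side by side.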

djaszak commented 2 years ago

After applying different character-level augmentation methods, I got some interesting data that should be investigated further in a follow-up issue.

[Image: trainings_over_time_v1]

djaszak / NLPAug

Use character augmentation to augment real test data #4