djaszak / NLPAug

A framework to simplify the usage of common Data Augmentation methods in the NLP context.
MIT License

Use character augmentation to augment real test data #4

Closed djaszak closed 2 years ago

djaszak commented 2 years ago

The augmenter should only be a framework that makes it easy to augment real test data and to compare different approaches by their respective accuracy. Y. Belinkov and Y. Bisk, “Synthetic and natural noise both break neural machine translation,” in ICLR 2018, describe and use different approaches. I implemented these approaches in #2, and they should now be used in a similar manner to the paper. I do not want to replicate exactly what was done in the paper; rather, I want to demonstrate and use the flexibility of my framework to find new results.
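For illustration, here is a minimal sketch of two character-level noise types in the spirit of the paper: swapping two adjacent inner characters, and shuffling all inner characters while the first and last stay fixed. The names `swap_noise`, `middle_shuffle_noise`, and `augment` are hypothetical and are not the NLPAug API; the restriction to words of four or more characters and the per-word probability `p` are assumptions of this sketch.

```python
import random


def swap_noise(word: str, rng: random.Random) -> str:
    """Swap one pair of adjacent inner characters; first and last stay fixed."""
    if len(word) < 4:
        return word  # fewer than two inner characters, nothing to swap
    i = rng.randrange(1, len(word) - 2)  # left index of the pair, 1 .. len-3
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def middle_shuffle_noise(word: str, rng: random.Random) -> str:
    """Shuffle all inner characters; first and last stay fixed."""
    if len(word) < 4:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def augment(text: str, noise, p: float = 1.0, seed: int = 0) -> str:
    """Apply `noise` to each whitespace-separated token with probability p.

    Hypothetical helper: tokens are split on whitespace and rejoined with
    single spaces, which is a simplification for this sketch.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    tokens = text.split()
    return " ".join(noise(t, rng) if rng.random() < p else t for t in tokens)


if __name__ == "__main__":
    sentence = "synthetic and natural noise both break translation"
    print(augment(sentence, swap_noise, p=0.5, seed=1))
    print(augment(sentence, middle_shuffle_noise, p=1.0, seed=1))
```

Passing the noise function as an argument keeps the comparison loop generic: the same test set can be augmented with each method in turn and the resulting accuracies compared.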

djaszak commented 2 years ago

What was done / what my tasks are:
In general, the paper tested translation on noisy data. The noise consistently broke the NMT (Neural Machine Translation) models; to avoid this, the models were trained on noisy data and then tested again.
Experiments with three different systems:

djaszak commented 2 years ago

After using different augmentation methods at the character level, I got some interesting data that should be investigated further in a follow-up issue.

[Figure: trainings_over_time_v1]