GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish
http://sadedegel.ai
MIT License

Data Augmentation for Text Data #279

Open ertugrul-dmr opened 3 years ago

ertugrul-dmr commented 3 years ago

Labelled text data is hard to obtain and costly to label manually, but generally the more data we have, the better performance we can achieve. While working on text normalization, we could also consider text augmentation.

Adding augmented text data might boost our models' performance by increasing the number of instances to train on. We can try several approaches, from simple to more complex:

Random Removal:

Synonym Replacement:

Embedding Replacement:

Character Replacement:

Back Translation:

Text Generation:
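
To make the two simplest items in the list concrete, here is a minimal sketch of what random removal and character replacement could look like. It uses only the Python standard library and is not part of sadedegel. The function names, the probability parameter `p`, the `seed` argument and whitespace tokenization are all assumptions made for illustration. The other approaches (synonym, embedding, back translation, generation) need external resources and are left out.

```python
import random

# Lowercase Turkish alphabet, used as the replacement pool (an assumption;
# any character set could be passed in instead).
TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"


def random_removal(text, p=0.1, seed=None):
    """Drop each whitespace-separated token independently with probability p."""
    rng = random.Random(seed)
    tokens = text.split()
    kept = [tok for tok in tokens if rng.random() >= p]
    # Avoid returning an empty sentence: fall back to one random token.
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def character_replacement(text, p=0.05, alphabet=TURKISH_LOWER, seed=None):
    """Replace each alphabetic character with a random letter with probability p."""
    rng = random.Random(seed)
    chars = []
    for ch in text:
        if ch.isalpha() and rng.random() < p:
            chars.append(rng.choice(alphabet))
        else:
            # Spaces, digits and punctuation are left untouched.
            chars.append(ch)
    return "".join(chars)


if __name__ == "__main__":
    sentence = "Bugün hava çok güzel ve parkta yürüyüş yaptık"
    print(random_removal(sentence, p=0.2, seed=0))
    print(character_replacement(sentence, p=0.1, seed=0))
```

Both functions keep the label of the original instance, so each augmented sentence can be added to the training set with the same label. Small values of `p` are probably safer, since heavy removal or replacement may change the meaning of short texts.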