djaszak / NLPAug

A framework to simplify the usage of common Data Augmentation methods in the NLP context.
MIT License

Further investigate promising character augmentation #7

Open djaszak opened 2 years ago

djaszak commented 2 years ago

In #4, general character augmentation techniques were used to investigate the broader range of implemented methods. Three techniques showed promising results when used on their own, but no technique showed good results when used together with the base dataset. Now the dataset should only be augmented in batches of 20/40/60/80%. I also think it would be interesting to apply these techniques to a second dataset to get some validation. For this I want to use the emotion dataset, which has six labels rather than two like the IMDB set. The "batching" will be done with a probabilistic approach that randomizes which words are augmented and which are not. Summary: