In #4, a broad range of the implemented character-level augmentation techniques was tested. Three techniques showed promising results when used on their own, but none of them performed well when combined with the base dataset. Now only a portion of the dataset will be augmented: 20%, 40%, 60%, or 80%. I also think it would be worthwhile to apply these techniques to a second dataset for validation. For this I want to use the emotion dataset, which has six labels instead of the two in the IMDB dataset, so it should make an interesting comparison. The "batching" will use a probabilistic approach that randomly decides which words are augmented and which are left unchanged.
Summary:
Use middle_randomizer, inserter and misspeller to augment 20/40/60/80% of the IMDB dataset
Use middle_randomizer, inserter and misspeller to augment 20/40/60/80% of the emotion dataset
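A minimal sketch of the probabilistic batching, assuming the augmentation rate is applied per word: each word is augmented independently with probability p (0.2/0.4/0.6/0.8), so on average that share of words changes. The three augmenter bodies below are hypothetical stand-ins; the actual middle_randomizer, inserter and misspeller from #4 may behave differently. The helpers augment_text and augment_dataset are also hypothetical names introduced for illustration.

```python
import random
from typing import Callable, List

# Hypothetical stand-ins for the augmenters from #4; the real ones may differ.
Augmenter = Callable[[str, random.Random], str]


def middle_randomizer(word: str, rng: random.Random) -> str:
    """Shuffle the inner letters of a word, keeping the first and last letter."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def inserter(word: str, rng: random.Random) -> str:
    """Insert one random lowercase letter at a random position."""
    pos = rng.randint(0, len(word))
    return word[:pos] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[pos:]


def misspeller(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters as a simple typo model."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def augment_text(text: str, augmenter: Augmenter, p: float, rng: random.Random) -> str:
    """Augment each word independently with probability p.

    Note: str.split() collapses runs of whitespace into single spaces.
    """
    return " ".join(
        augmenter(word, rng) if rng.random() < p else word for word in text.split()
    )


def augment_dataset(texts: List[str], augmenter: Augmenter, p: float, seed: int = 0) -> List[str]:
    """Apply augment_text to every sample; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [augment_text(t, augmenter, p, rng) for t in texts]


if __name__ == "__main__":
    samples = ["this movie was surprisingly good", "i feel really happy today"]
    for p in (0.2, 0.4, 0.6, 0.8):
        for aug in (middle_randomizer, inserter, misspeller):
            print(p, aug.__name__, augment_dataset(samples, aug, p))
```

Seeding one random.Random per run means the same words are selected for augmentation when an experiment is repeated, which makes the 20/40/60/80% runs comparable across the IMDB and emotion datasets. If the rate is meant per sample rather than per word, the coin flip would simply move from augment_text to augment_dataset.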