In previous issues, six augmentation methods were implemented and used to augment the imdb dataset available on HuggingFace.co. The next step is to take a standard model, also available on HuggingFace, and fine-tune it on each of the augmented datasets to check which approach works best. Different combinations should be implemented and tuned. The six methods are:
complete_randomizer
keyboard_replacer
mid_randomizer
replacer
remover
misspeller
Each model will be evaluated on the augmented eval datasplit of its own method as well as on the original (non-augmented) eval datasplit. Each method should also be evaluated in two training setups: trained on a purely augmented train datasplit, and trained on the augmented train datasplit concatenated with the original imdb train datasplit.
[x] only augmented
[x] augmented with non-augmented eval
[x] augmented + non-augmented test & non-augmented eval
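To make the experiment grid concrete, here is a minimal sketch that enumerates one run per method and checklist item. The `Run` class, the setup labels, and both helper functions are hypothetical names made up for illustration, and plain Python lists stand in for the dataset splits. It also assumes the third checklist item refers to the concatenated train datasplit described above.

```python
from dataclasses import dataclass
from itertools import product

# Method names taken from the list in this issue.
METHODS = [
    "complete_randomizer",
    "keyboard_replacer",
    "mid_randomizer",
    "replacer",
    "remover",
    "misspeller",
]

# (train source, eval source) pairs, one per checklist item.
# The string labels are hypothetical.
SETUPS = [
    ("augmented", "augmented"),
    ("augmented", "original"),
    ("augmented+original", "original"),
]


@dataclass(frozen=True)
class Run:
    method: str
    train: str
    eval: str


def build_runs(methods=METHODS, setups=SETUPS):
    """Return one Run per (method, setup) combination."""
    return [Run(m, t, e) for m, (t, e) in product(methods, setups)]


def build_train_split(original, augmented, train):
    """Pick the training examples for a setup (lists stand in for splits)."""
    if train == "augmented":
        return list(augmented)
    if train == "augmented+original":
        # Original examples first, augmented examples appended.
        return list(original) + list(augmented)
    raise ValueError(f"unknown train setup: {train}")


if __name__ == "__main__":
    runs = build_runs()
    print(len(runs))  # 6 methods x 3 setups = 18
    for run in runs[:3]:
        print(run)
```

With the HuggingFace `datasets` library, the concatenation step would probably use `concatenate_datasets` on the two train splits instead of list addition.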