GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish
http://sadedegel.ai
MIT License

Random Removal Augmentation Test Results #282

Open ertugrul-dmr opened 3 years ago

ertugrul-dmr commented 3 years ago

As proposed in #279, this issue is for the results of this specific augmentation technique. Results were inspected under several conditions and can be found below:


| Prebuilt Model | Original Result | Augmented (not optimized) | Augmented (optimized) | Optimized Hyper-Parameters | Augmentation Parameters |
| -- | -- | -- | -- | -- | -- |
| Tweet Sentiment Classification | F-1: 0.8537 | F-1: 0.8586 | F-1: 0.8588 | icu, HashVectorizer(n_features=1080000), LogisticRegression(penalty=l2, C=1.226401, intercept=True) | K = 5, α = 0.1 |
| Telco Tweet Sentiment Classification | F-1: 0.6871, Accuracy: 0.6924 | F-1: 0.6888, Accuracy: 0.6944 | F-1: 0.6917, Accuracy: 0.6970 | icu, HashVectorizer(n_features=1070000), LogisticRegression(penalty=l2, C=0.07474, intercept=False) | K = 5, α = 0.1 |
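
For reference, here is a minimal sketch of what the random removal augmentation does with the K and α parameters above (K augmented copies per sample; each token dropped independently with probability α). The function names and the plain whitespace tokenization are illustrative only; the actual runs used sadedegel's icu word tokenizer:

```python
import random

def random_removal(tokens, alpha=0.1):
    """Drop each token independently with probability alpha,
    always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [tok for tok in tokens if random.random() > alpha]
    # If every token was dropped, fall back to a single random token.
    return kept or [random.choice(tokens)]

def augment(text, k=5, alpha=0.1):
    """Produce k augmented copies of a text via random removal.

    Whitespace split is a stand-in here; the experiments above
    tokenized with sadedegel's icu word tokenizer."""
    tokens = text.split()
    return [" ".join(random_removal(tokens, alpha)) for _ in range(k)]
```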

I observed a small increase on both datasets, but it wasn't large enough to move further with, so I decided to test on a smaller dataset, since the original paper suggests the gains are larger on small datasets. For this purpose I took a very small sample of the data, trained with and without augmentation, and evaluated on the original test set to see how the augmented data generalizes to unseen data (a sketch of this setup follows the table). Here are the results:

| | Tweet Original (400 Samples) | Tweet Augmented (400 + Augmented) | Augmentation Parameters |
| -- | -- | -- | -- |
| F-1 Score | 0.4894 | 0.5255 | K = 16, α = 0.5 |
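
For completeness, the with/without comparison above boils down to something like the sketch below. I'm assuming scikit-learn's HashingVectorizer and LogisticRegression behind the HashVectorizer/LogisticRegression entries in the hyper-parameter column, and macro-averaged F-1; both are assumptions on my part, not the exact prebuilt-model code:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def train_and_score(train_texts, train_labels, test_texts, test_labels,
                    augmented=()):
    """Fit hashed bag-of-words + logistic regression on the training set
    (optionally extended with (text, label) augmented pairs) and score
    F-1 on the untouched test set."""
    texts = list(train_texts) + [t for t, _ in augmented]
    labels = list(train_labels) + [y for _, y in augmented]
    pipe = make_pipeline(
        HashingVectorizer(n_features=1_080_000),
        LogisticRegression(penalty="l2", C=1.226401, fit_intercept=True),
    )
    pipe.fit(texts, labels)
    # F-1 averaging isn't stated above; macro is an assumption.
    return f1_score(test_labels, pipe.predict(test_texts), average="macro")
```

The key point is that the augmented pairs only ever extend the training split; the test set stays fixed, so the two F-1 numbers are directly comparable.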

In my trials I can confirm the paper's claim that this works better on smaller datasets (it increased the F-1 score by almost 0.04), so in my opinion it's worth investigating further...