Word Level Synonyms

Synonym replacement should be the core of my framework, as IMO it is the most interesting and easiest approach. We want to address three different layers:

Synonym Database
Replacement Method
Synonym Selection

For each of these layers I am choosing 3-4 methods that have proven effective in previous work, as listed in Table 2 of [1].

Synonym Database

WordNet and its Python API
Thesaurus
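
Both databases could sit behind one lookup interface, so the replacement and selection layers never need to know which source is in use. This is a minimal sketch, assuming a hypothetical `SynonymDatabase` protocol with a single `synonyms(word)` method. `DictThesaurus` stands in for a thesaurus already loaded into memory, and `WordNetDatabase` uses NLTK's WordNet corpus reader (it needs `nltk` plus the downloaded `wordnet` data, so the import is deferred until the first lookup).

```python
from typing import Dict, List, Protocol


class SynonymDatabase(Protocol):
    """Hypothetical interface: any source that maps a word to its synonyms."""

    def synonyms(self, word: str) -> List[str]:
        ...


class DictThesaurus:
    """Thesaurus backed by an in-memory dict (e.g. parsed from a thesaurus file)."""

    def __init__(self, entries: Dict[str, List[str]]) -> None:
        # Lower-case the keys so lookups are case-insensitive.
        self._entries = {k.lower(): list(v) for k, v in entries.items()}

    def synonyms(self, word: str) -> List[str]:
        # Drop the word itself if the thesaurus lists it as its own synonym.
        return [s for s in self._entries.get(word.lower(), []) if s.lower() != word.lower()]


class WordNetDatabase:
    """Synonyms from WordNet via NLTK (requires `nltk.download("wordnet")`)."""

    def synonyms(self, word: str) -> List[str]:
        from nltk.corpus import wordnet as wn  # deferred: optional dependency

        found: List[str] = []
        for synset in wn.synsets(word):
            for lemma in synset.lemmas():
                # WordNet joins multi-word lemmas with underscores.
                name = lemma.name().replace("_", " ")
                if name.lower() != word.lower() and name not in found:
                    found.append(name)
        return found
```

For example, `DictThesaurus({"quick": ["fast", "speedy"]}).synonyms("Quick")` returns `["fast", "speedy"]`. Keeping the interface this small should make it easy to add further sources later without touching the replacement code.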

Replacement Method

Substitutable words are nouns, verbs, adjectives, or adverbs that are not part of a named entity. Each word is replaced with a certain probability.
Only adverbs/adjectives, sometimes nouns, more rarely verbs
No time words, prepositions, or mimetic words
No stop words; n randomly chosen words are replaced with synonyms (SR), or synonyms are inserted at random positions (RI)

The last item refers to two of the four operations that define EDA (Easy Data Augmentation). All four EDA operations should eventually be implemented and usable as separate behaviours, but this issue only covers the two synonym-based ones, SR and RI.
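
The two synonym-based EDA operations could look roughly like this. It is a minimal sketch, assuming whitespace-tokenised input, a small hypothetical `STOP_WORDS` set, and a plain dict standing in for the synonym database; it does not yet apply the POS or named-entity filters listed above.

```python
import random
from typing import Dict, List, Optional

# Hypothetical, deliberately small stop-word list; a real one would be much larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to"}


def _candidates(words: List[str], synonyms: Dict[str, List[str]]) -> List[int]:
    """Indices of non-stop-words that have at least one known synonym."""
    return [
        i for i, w in enumerate(words)
        if w.lower() not in STOP_WORDS and synonyms.get(w.lower())
    ]


def synonym_replacement(words: List[str], synonyms: Dict[str, List[str]], n: int,
                        rng: Optional[random.Random] = None) -> List[str]:
    """SR: replace up to n distinct random non-stop-words with a random synonym."""
    rng = rng or random.Random()
    out = list(words)  # never mutate the caller's list
    idx = _candidates(out, synonyms)
    rng.shuffle(idx)
    for i in idx[:n]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out


def random_insertion(words: List[str], synonyms: Dict[str, List[str]], n: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """RI: n times, insert a synonym of a random non-stop-word at a random position."""
    rng = rng or random.Random()
    out = list(words)
    for _ in range(n):
        idx = _candidates(out, synonyms)
        if not idx:
            break  # nothing left that has a synonym
        source = out[rng.choice(idx)]
        # randint is inclusive, so the synonym may also be appended at the end.
        out.insert(rng.randint(0, len(out)), rng.choice(synonyms[source.lower()]))
    return out
```

Passing a seeded `random.Random` makes both operations reproducible, which should help when comparing augmentation behaviours across runs.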

Synonym Selection
Remaining probability shared among the synonyms based on a language-model score
Uniform random
Chi-square statistics method (TBD)
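
The first two selection strategies differ mainly in how the sampling weights are built. A minimal sketch, assuming the language-model scores are already available as log-probabilities per candidate synonym (how they are computed is left open), and reading "remaining probability" as the mass left after the original word keeps an assumed share `keep_prob`. The function names are hypothetical; the LM variant turns scores into weights with a softmax. The chi-square method is still TBD and is not covered.

```python
import math
import random
from typing import Dict, List, Optional


def select_uniform(synonyms: List[str], rng: Optional[random.Random] = None) -> str:
    """Uniform random: every candidate synonym is equally likely."""
    return (rng or random.Random()).choice(synonyms)


def lm_selection_distribution(original: str, scores: Dict[str, float],
                              keep_prob: float) -> Dict[str, float]:
    """Keep `original` with probability keep_prob (assumed parameter) and share the
    remaining mass among synonyms by softmax over their (assumed) LM log-probs."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    if not scores:
        return {original: 1.0}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exp.values())
    dist = {w: (1.0 - keep_prob) * e / total for w, e in exp.items()}
    dist[original] = dist.get(original, 0.0) + keep_prob
    return dist


def select_by_lm_score(original: str, scores: Dict[str, float], keep_prob: float = 0.5,
                       rng: Optional[random.Random] = None) -> str:
    """Sample the output word (possibly the original) from the distribution above."""
    dist = lm_selection_distribution(original, scores, keep_prob)
    words = list(dist)
    return (rng or random.Random()).choices(words, weights=[dist[w] for w in words], k=1)[0]
```

With `keep_prob=0` the LM variant reduces to plain softmax sampling over the synonyms, so both strategies can share the same calling code.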

[1] Markus Bayer, Marc-André Kaufhold, Christian Reuter (2021) A Survey on Data Augmentation for Text Classification

djaszak / NLPAug

Word Level Synonyms #9
