Description
Add a sub-package (textacy.augmentation) for basic text data augmentation. It implements several transformations suitable for use in text classification tasks, with a higher-level function textacy.augmentation.apply() to call them:
random synonym replacement
random synonym insertion
random item swapping
random item deletion
random sentence shuffling
Note: This code is provisional, and the API will almost certainly change.
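To give reviewers a feel for the kind of transformation listed above, here is a minimal standalone sketch in plain Python. It is not the code in this PR and does not use the textacy.augmentation API (which is provisional anyway). The names swap_items, delete_items, shuffle_sentences and apply_transforms are hypothetical and exist only for illustration, and the inputs are assumed to be plain lists of strings rather than spaCy objects. Synonym replacement and insertion are left out because they need a lexical resource.

```python
import random


def swap_items(tokens, num, rng):
    """Swap `num` randomly chosen pairs of adjacent tokens (hypothetical helper)."""
    tokens = list(tokens)  # copy so the caller's list is not modified
    for _ in range(num):
        if len(tokens) < 2:
            break
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens


def delete_items(tokens, prob, rng):
    """Drop each token independently with probability `prob` (hypothetical helper).

    At least one token is kept so that a non-empty input never becomes empty.
    """
    if not tokens:
        return []
    kept = [tok for tok in tokens if rng.random() >= prob]
    return kept if kept else [rng.choice(tokens)]


def shuffle_sentences(sentences, rng):
    """Return the sentences in a random order (hypothetical helper)."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def apply_transforms(items, transforms, seed=None):
    """Apply each transform in order, sharing one seeded random generator."""
    rng = random.Random(seed)
    for transform in transforms:
        items = transform(items, rng)
    return items


tokens = "the quick brown fox jumps over the lazy dog".split()
augmented = apply_transforms(
    tokens,
    [
        lambda toks, rng: swap_items(toks, 2, rng),
        lambda toks, rng: delete_items(toks, 0.1, rng),
    ],
    seed=42,
)
print(" ".join(augmented))
```

Passing a seed makes the augmented output reproducible, which helps when comparing classifier runs on the same augmented data.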
Motivation and Context
I've been training spaCy TextCategorizers on datasets that are too small, and data augmentation is a great way to improve model performance.
How Has This Been Tested?
Lots of manual validation and trial and error. Wrote some tests, and they pass (mostly...).
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[TODO] My change requires a change to the documentation, and I have updated it accordingly.