Research language/linguistic techniques that could help generate more data

Data augmentation techniques are important for this task. Right now we have:

1) variation through variable names
2) variation through names (first, last, cities, streets, etc.)
3) perg
4) choiceg
5) variation through random choice of numerical values
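To make the list concrete, here is a minimal sketch of items 1, 2 and 5 using only the Python standard library. It is not the framework's actual API: the template, the value pools and the function names (`generate_example`, `generate_examples`) are all made up for illustration, and `perg` / `choiceg` are left out because their signatures are not shown here.

```python
import random

# Hypothetical value pools, for illustration only.
VARIABLES = ["x", "y", "z", "a", "b"]
FIRST_NAMES = ["Alice", "Bob", "Carla"]
CITIES = ["Boston", "Lima", "Oslo"]

# Hypothetical template: one slot per kind of variation.
TEMPLATE = "{name} from {city} solves {c1}*{var} + {c2} = {c3} for {var}."


def generate_example(rng):
    """Fill the template with one random choice per slot."""
    return TEMPLATE.format(
        name=rng.choice(FIRST_NAMES),   # variation through names
        city=rng.choice(CITIES),        # variation through cities
        var=rng.choice(VARIABLES),      # variation through variable names
        c1=rng.randint(2, 9),           # variation through numerical values
        c2=rng.randint(1, 20),
        c3=rng.randint(1, 50),
    )


def generate_examples(n, seed=0):
    """Generate n examples; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [generate_example(rng) for _ in range(n)]
```

Calling `generate_examples(5)` returns five filled-in sentences. Duplicates are possible, so a real generator would probably want to deduplicate.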

To get an intuition for the number of examples per class in ImageNet: http://image-net.org/about-stats

It seems they are all around or above 10K per class. So maybe it would be nice if the framework could somehow help the user get at least that many examples per class?
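One way the framework could help here, sketched under assumptions: if a template has independent slots, the number of distinct examples it can produce is at most the product of the number of options per slot, and that can be checked against the target. The names `count_variants`, `check_class` and the slot sizes below are hypothetical, and the 10K target is just the rough figure from the ImageNet comparison.

```python
from math import prod

# Rough target taken from the ImageNet comparison; an assumption, not a hard rule.
TARGET_PER_CLASS = 10_000


def count_variants(slot_sizes):
    """Upper bound on distinct examples: product of the option counts per slot."""
    return prod(slot_sizes.values())


def check_class(slot_sizes, target=TARGET_PER_CLASS):
    """Return (upper bound, whether the bound reaches the target)."""
    total = count_variants(slot_sizes)
    return total, total >= target


# Hypothetical slot sizes for one template.
slots = {"name": 3, "city": 3, "var": 5, "c1": 8, "c2": 20, "c3": 50}
total, ok = check_class(slots)
```

With these slot sizes the bound is 3 * 3 * 5 * 8 * 20 * 50 = 360,000, so the check passes. Note this is only an upper bound: it counts surface forms, and many of them may be near-duplicates from a learning point of view.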

Are there other linguistic ways of changing sentences (maybe through syntax) that keep the same meaning but alter how the sentence looks?
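As one possible starting point (a sketch, not a researched answer to the question): the same request can be wrapped in several hand-written phrasings, and an equation can have its two sides swapped without changing its meaning. The paraphrase list and the names `swap_sides` and `paraphrase` are hypothetical.

```python
import random

# Hypothetical paraphrase templates: each one expresses the same request.
PARAPHRASES = [
    "Solve {eq} for {var}.",
    "Find {var} such that {eq}.",
    "What value of {var} satisfies {eq}?",
    "Given {eq}, determine {var}.",
]


def swap_sides(eq):
    """Rewrite 'lhs = rhs' as 'rhs = lhs'; equality is symmetric, so meaning is kept."""
    lhs, rhs = (side.strip() for side in eq.split("=", 1))
    return f"{rhs} = {lhs}"


def paraphrase(eq, var, rng):
    """Pick a random phrasing and, half of the time, swap the equation's sides."""
    if rng.random() < 0.5:
        eq = swap_sides(eq)
    return rng.choice(PARAPHRASES).format(eq=eq, var=var)
```

For example, `swap_sides("2*x + 1 = 7")` gives `"7 = 2*x + 1"`. With 4 phrasings and 2 side orders, each underlying problem gets up to 8 surface forms, which multiplies with the other variations.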

For example, augmenting images is easy: rotations already provide a useful way to augment data sets.

brando90 / MathNet-large-scale-Mathematics-Dataset-for-Machine-Learning, issue #30