Add a way to randomly shuffle the corpus / data file

A corpus may contain its training instances grouped by class, which is very bad for training. In such cases there should be a way either to shuffle the corpus before running the pipeline with the training PR on it, or to shuffle the generated data file before using it (and before the validation instances are split off).

Doing it inside GATE by providing a menu entry for shuffling a corpus:

- this should be easy to implement using Collections.shuffle(list, random), given that a corpus is a list
- supporting "unshuffling" (restoring the original order) is more involved
- it cannot be used in other scenarios (e.g. runPipeline, GCP), where we may instead need to shuffle the data file that was created
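The in-GATE option, including "unshuffling", could look roughly like the sketch below. It is written in Python only to keep one language for the examples here; inside GATE a plain shuffle would be Java's `Collections.shuffle(corpus, new Random(seed))`. Supporting unshuffling needs the permutation that was applied, so instead of shuffling the documents directly the sketch shuffles a list of indices with a seeded generator: the same seed always gives the same order, and the returned permutation is all that is needed to restore the original order. The function names are hypothetical, and where a GATE menu action would keep the permutation is not addressed.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_with_permutation(items: List[T], seed: int) -> List[int]:
    """Shuffle items in place, repeatably, and return the permutation used.

    After the call, items[i] holds the element that was originally at perm[i].
    """
    perm = list(range(len(items)))
    random.Random(seed).shuffle(perm)  # same seed -> same permutation
    items[:] = [items[j] for j in perm]
    return perm


def unshuffle(items: List[T], perm: Sequence[int]) -> None:
    """Undo shuffle_with_permutation by moving each element back to index perm[i]."""
    restored = list(items)
    for pos, orig in enumerate(perm):
        restored[orig] = items[pos]
    items[:] = restored


if __name__ == "__main__":
    docs = ["doc0", "doc1", "doc2", "doc3", "doc4"]
    perm = shuffle_with_permutation(docs, seed=42)
    print(docs)   # shuffled order, identical on every run with seed 42
    unshuffle(docs, perm)
    print(docs)   # back to the original order
```

Storing only the permutation (one integer per document) is enough to undo the shuffle; the documents themselves never need to be copied elsewhere.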

Shuffling the data file:

- shuf on Linux works well, but there is no easy way to get repeatable randomness from a seed (GNU shuf only accepts a file of random bytes via --random-source), and it is unknown how well it scales beyond available memory
- not sure what other scalable, portable ways to shuffle exist; maybe:
  - https://github.com/trufanov-nok/shuf-t
  - probably better: implement our own Python-based approach with two passes: first index the starting offsets and lengths of all lines in memory, shuffle that list, then seek to each line and write the lines in the shuffled order. If the file is not too big, just shuffle the lines in memory directly.
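A minimal Python sketch of that two-pass approach, assuming a line-oriented data file (one instance per line) and a separate output path: the output is opened for writing first, so passing the input path as the output would truncate the data. `shuffle_file`, `index_lines` and the `max_in_memory_bytes` threshold are hypothetical names and values, not part of the plugin. Lines are handled as raw bytes, so no decoding is needed, and a final line without a trailing newline gets one so it is not glued to whatever line follows it after shuffling.

```python
import os
import random
from typing import BinaryIO, List, Tuple


def _write_line(dst: BinaryIO, line: bytes) -> None:
    # A last line without "\n" would otherwise be joined to the next one written.
    dst.write(line if line.endswith(b"\n") else line + b"\n")


def index_lines(path: str) -> List[Tuple[int, int]]:
    """Pass 1: record (start offset, length in bytes) of every line."""
    index: List[Tuple[int, int]] = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            index.append((offset, len(line)))
            offset += len(line)
    return index


def shuffle_file(in_path: str, out_path: str, seed: int,
                 max_in_memory_bytes: int = 256 * 1024 * 1024) -> None:
    """Write the lines of in_path to out_path in a repeatable random order."""
    rng = random.Random(seed)  # fixed seed -> same order on every run
    with open(out_path, "wb") as dst:
        if os.path.getsize(in_path) <= max_in_memory_bytes:
            # Small file: shuffle the lines themselves in memory.
            with open(in_path, "rb") as src:
                lines = src.readlines()
            rng.shuffle(lines)
            for line in lines:
                _write_line(dst, line)
        else:
            # Large file: only the (offset, length) index is kept in memory.
            index = index_lines(in_path)
            rng.shuffle(index)
            with open(in_path, "rb") as src:
                for start, length in index:
                    src.seek(start)  # pass 2: random-access read of one line
                    _write_line(dst, src.read(length))
```

For example, `shuffle_file("data.txt", "data.shuffled.txt", seed=1234)` produces the same output every time it is run with that seed. Since `random.Random.shuffle` derives its permutation only from the list length and the generator state, both branches yield the same line order for the same seed, so the in-memory shortcut does not change results. The index still costs memory per line (one Python tuple each), so for very large files a more compact structure such as the standard `array` module might be worth using instead.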

GateNLP / gateplugin-LearningFramework, issue #82