Add sorting of chunks to evaluation

Description

This change sorts chunks by length before running evaluation, which keeps padding to a minimum and thereby improves performance.

Because every input feature has a unique qas_id, that id can be used for sorting. With sorting, the evaluation function works as follows:

Step 1 is performed so that, in step 5, the chunks and their inference results can easily be put back in the proper order for evaluation in step 6.
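The sort-then-restore idea can be sketched roughly as below. This is a minimal illustration, not the code from this change: `run_sorted_inference`, `infer_batch`, and the padding value 0 are hypothetical names and choices, and each chunk's original position stands in for its unique id.

```python
from typing import Any, Callable, List, Sequence


def run_sorted_inference(
    chunks: Sequence[Sequence[int]],
    infer_batch: Callable[[List[List[int]]], List[Any]],
    batch_size: int,
    pad_value: int = 0,  # hypothetical padding token id
) -> List[Any]:
    """Run inference on length-sorted batches, return results in input order."""
    # Remember each chunk's original position, then order positions by length.
    order = sorted(range(len(chunks)), key=lambda i: len(chunks[i]))

    results: List[Any] = [None] * len(chunks)
    for start in range(0, len(order), batch_size):
        positions = order[start:start + batch_size]
        batch = [chunks[i] for i in positions]

        # Pad only up to the longest chunk in this batch.
        max_len = max(len(c) for c in batch)
        padded = [list(c) + [pad_value] * (max_len - len(c)) for c in batch]

        # Write each output back to the slot of the chunk it belongs to,
        # which restores the original order for the later evaluation step.
        for pos, out in zip(positions, infer_batch(padded)):
            results[pos] = out
    return results
```

Because results are written back by original position, the code that consumes them afterwards does not need to know that sorting happened.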

Results for max_seq_length=128, doc_stride=32:

no sort:

sorted:

Performance did not improve much because most chunks already have the same length of 128, a consequence of the relatively small values of max_seq_length and doc_stride.
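To illustrate why sorting helps little when chunk lengths are nearly uniform, the following sketch counts padded positions per batch. The helper `padded_tokens` and the length lists are made up for illustration and are not measurements from this change.

```python
from typing import List


def padded_tokens(lengths: List[int], batch_size: int) -> int:
    """Count padding positions when each batch is padded to its longest chunk."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch) - sum(batch)
    return total


# Hypothetical chunk lengths: nearly uniform vs. widely varying.
mostly_full = [128, 128, 128, 120, 128, 128, 128, 128]
varied = [512, 130, 480, 150, 500, 140, 490, 160]

for name, lengths in [("mostly_full", mostly_full), ("varied", varied)]:
    unsorted_pad = padded_tokens(lengths, batch_size=2)
    sorted_pad = padded_tokens(sorted(lengths), batch_size=2)
    print(name, "unsorted:", unsorted_pad, "sorted:", sorted_pad)
```

With nearly uniform lengths there is almost no padding to remove, so sorting changes little; with widely varying lengths, sorting places chunks of similar length in the same batch and removes most of the padding.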

Results for max_seq_length=512, doc_stride=128 (the default values in the run_squad.py script):

no sort:

sorted:

As you can see, performance improved significantly (~20%) without any loss of accuracy.

cc @dmlc/gluon-nlp-team