Add support for paired statistical tests

The current Python code for statistical testing uses the randtest library for approximate randomization, which assumes that the groups being compared are independent samples.

However, our experimental units (topics) are shared across runs: every run is evaluated on every topic, so the per-topic scores of different runs are paired rather than independent.
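To show what pairing changes, here is a minimal Python sketch of a paired approximate randomization test. It is an illustration only, not the coin-based implementation this commit adds, and the function name `paired_randomization_test` is hypothetical. Instead of shuffling all scores between two pooled groups (the independent-samples assumption), it only swaps the two runs' scores *within* each topic, which amounts to randomly flipping the sign of each per-topic difference.

```python
import random
from typing import Sequence


def paired_randomization_test(
    a: Sequence[float],
    b: Sequence[float],
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided paired approximate randomization test (hypothetical helper).

    a[i] and b[i] are the scores of two runs on the same topic i.
    Under the null hypothesis the runs are exchangeable within each topic,
    so each resample swaps the pair (a[i], b[i]) with probability 1/2,
    i.e. flips the sign of the per-topic difference.
    """
    if len(a) != len(b) or not a:
        raise ValueError("runs must be scored on the same non-empty set of topics")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n  # |mean per-topic difference|
    at_least_as_extreme = 0
    for _ in range(trials):
        # Randomly swap the two runs' scores on each topic.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Because the resampling never moves a score to a different topic, per-topic difficulty (which affects both runs alike) cancels out of the test statistic; an unpaired permutation test would instead treat that variation as noise.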

This commit reimplements statistical testing with the R coin package, reusing the existing code that parses trec_eval and sampleval files.