Some benchmarking datasets are too easy.

Among the datasets we use for benchmarking, some do not represent a hard problem. When the tuners are run on them, they exhibit one or more of the following properties:

all tuners reach the same result;
the result is too perfect;
or tuning yields no significant improvement.
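
The three properties above could be checked automatically along these lines. This is only a sketch: the function name, the `best_scores` layout (tuner name mapped to its best score), the assumption that higher scores are better with a perfect score of 1.0, and the tolerance are all hypothetical and not part of BTB.

```python
def easy_dataset_flags(best_scores, default_score, perfect_score=1.0, tol=1e-3):
    """Flag the 'too easy' properties for a single dataset.

    best_scores: dict mapping a tuner name to the best score it reached
                 (hypothetical layout, higher is assumed to be better).
    default_score: score obtained without any tuning.
    tol: arbitrary tolerance, chosen here only for illustration.
    """
    values = list(best_scores.values())
    best = max(values)
    return {
        # all tuners reach the same result
        "all_tuners_tie": best - min(values) <= tol,
        # the result is too perfect
        "near_perfect": perfect_score - best <= tol,
        # no significant improvement as a result of tuning
        "no_improvement": best - default_score <= tol,
    }
```

A dataset for which any of these flags is set would be a candidate for the "too easy" group.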

We still keep these datasets in our benchmarking process and run them on every release, because they give us a way to test the tuners. In the future we will add a test that looks for changes or deviations from the expected results on these datasets; if we see a significant difference, we will have to examine what changed in the release. Such deviations can originate either in our own tuners or in the tuners of an external library, so they help us detect changes in our code or in an external library used by the tuners.
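
A minimal sketch of what such a deviation test might look like, assuming the expected and current results are available as dictionaries keyed by `(dataset, tuner)`. The function name, the data layout, and the tolerances are hypothetical; this is not the test that will eventually be implemented.

```python
import math


def find_deviations(expected, current, rel_tol=1e-6, abs_tol=1e-9):
    """Return the entries whose current score differs from the expected one.

    expected, current: dicts mapping (dataset, tuner) to a score
                       (hypothetical layout).
    Returns a dict mapping each deviating key to (expected_score, current_score);
    current_score is None when the entry is missing from the current results.
    """
    deviations = {}
    for key, expected_score in expected.items():
        score = current.get(key)
        if score is None or not math.isclose(
            score, expected_score, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            deviations[key] = (expected_score, score)
    return deviations
```

An empty result would mean the release reproduces the expected scores; any entry in it points at a dataset and tuner pair worth examining.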

We also use them to check that we have integrated another library correctly, since we expect its tuners to produce the same number of ties with the other tuners over 100 iterations.
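
One way to count such ties is sketched below. It assumes that a "tie" means the newly integrated tuner reaches the same best score as the best of the other tuners on a dataset after the 100 iterations; that interpretation, the function name, and the data layout are assumptions made for illustration.

```python
def count_ties(scores_by_dataset, tuner, tol=1e-9):
    """Count the datasets on which `tuner` ties with the best other tuner.

    scores_by_dataset: dict mapping a dataset name to a dict of
                       {tuner name: best score after the run}
                       (hypothetical layout).
    """
    ties = 0
    for scores in scores_by_dataset.values():
        others = [s for name, s in scores.items() if name != tuner]
        # A tie: the tuner's score matches the best score among the others.
        if others and abs(scores[tuner] - max(others)) <= tol:
            ties += 1
    return ties
```

Comparing this count for the integrated tuner against the count for an existing tuner would give a quick signal on whether the integration behaves as expected.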

We will formalize this process in the near future.
