I have another idea that could make the evaluation project very attractive and visible. You have probably seen that some ideas from differential privacy have been suggested as a way to make better use of test data, e.g., https://papers.nips.cc/paper/5993-generalization-in-adaptive-data-analysis-and-holdout-reuse.pdf ; I think it is possible to implement a simple version of this in the eval framework, which could be very interesting.
Quote from Dan
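
To make the suggestion concrete, here is a rough sketch of what a "simple version" might look like, loosely modeled on the Thresholdout mechanism described in the linked paper. The class name `Thresholdout`, the `query` method, the default parameter values, and the specific noise scales are all assumptions made for illustration; none of this is the eval framework's actual API, and it should not be read as a faithful reimplementation of the paper's algorithm.

```python
import random


class Thresholdout:
    """Simplified holdout-reuse sketch (hypothetical names and defaults).

    The holdout score is revealed, with noise, only when it disagrees with
    the training score by more than a noisy threshold. Otherwise the
    training score is returned, so little is leaked about the holdout set.
    """

    def __init__(self, threshold=0.04, sigma=0.01, budget=100, seed=None):
        self.threshold = threshold  # allowed train/holdout gap
        self.sigma = sigma          # base noise scale (illustrative choice)
        self.budget = budget        # number of holdout reveals allowed
        self._rng = random.Random(seed)
        self._noisy_threshold = threshold + self._laplace(2 * sigma)

    def _laplace(self, scale):
        # The difference of two Exp(1) draws is a standard Laplace sample.
        if scale == 0:
            return 0.0
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def query(self, train_value, holdout_value):
        """Return the score the analyst is allowed to see, or None if the
        budget is exhausted."""
        if self.budget < 1:
            return None
        gap = abs(holdout_value - train_value)
        if gap > self._noisy_threshold + self._laplace(4 * self.sigma):
            # Overfitting detected: spend budget and reveal a noisy holdout score.
            self.budget -= 1
            self._noisy_threshold = self.threshold + self._laplace(2 * self.sigma)
            return holdout_value + self._laplace(self.sigma)
        # Scores agree closely enough: answer from the training set only.
        return train_value


# Example: an evaluation loop would call query() once per reported metric.
mechanism = Thresholdout(seed=0)
reported = mechanism.query(train_value=0.91, holdout_value=0.90)
```

In an eval framework, something like this would sit between the user and the raw test-set metric, so that repeated evaluations against the same test data are answered through `query` rather than directly.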