Open · freeman-lab opened this issue 8 years ago
Currently the `evaluate` method compares two local results to each other, which is useful. But as suggested by @marius10p, sometimes we want the evaluation to incorporate metadata from the "standard" ground truth datasets. So one idea is to add an extra method, maybe called `benchmark` or `evaluate-remote`, that takes as input ONE set of results and the name of a ground truth dataset, then fetches both the remote regions and the metadata, and returns the scores. In other words, we'll have both the existing local `evaluate` and the new `benchmark` / `evaluate-remote` (see the sketch below).

Thoughts?

cc @syncrostone
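For concreteness, a hypothetical sketch of the two command-line forms. The `evaluate` invocation is the one already in use; the `benchmark` syntax and the dataset name shown are only illustrations of the proposal, not an implemented interface:

```
# existing: compare two local results files to each other
neurofinder evaluate a.json b.json

# proposed: score one local results file against the remote ground truth
# and metadata for a named dataset (dataset name here is just an example)
neurofinder benchmark a.json neurofinder.00.00
```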
Sounds good, but maybe do not make it possible to obtain results on the test datasets, otherwise people can easily overfit them (and we won't know).

I have been using `neurofinder evaluate a.json b.json` on the training datasets, just to get an overall idea of how many ROIs to output.

Yes, oops, I definitely meant only having this for the training data 😄