IBM / mi-prometheus

Enabling reproducible Machine Learning research
http://mi-prometheus.rtfd.io/
Apache License 2.0

[Feature request] Support for several test configs in tester #91

Closed vmarois closed 5 years ago

vmarois commented 5 years ago

I am encountering a case where I would like to test a single trained model on two different sets (Condition A & Condition B of CLEVR-CoGenT). Currently, there is no easy way to do it. We can either indicate one test configuration in the initial training configuration, which will then be used by default when we run the tester with its --m flag (as it looks for a config file in the same directory), or we can specify a config file with --c, which overrides the default one.

Idea:

The yaml library we are using to load the config file preserves Python types, i.e. if you specify a value as a list, e.g. `set: ['valA', 'valB']`, then it will be loaded as a Python list.
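For reference, a minimal illustration of that behaviour (the section layout below is just an example, not the exact mi-prometheus config schema):

```python
import yaml

# YAML preserves Python types: a bracketed value is loaded as a list,
# a bare integer as an int, etc.
cfg = yaml.safe_load("""
testing:
  set: ['valA', 'valB']
  batch_size: 64
""")

print(type(cfg['testing']['set']))         # <class 'list'>
print(type(cfg['testing']['batch_size']))  # <class 'int'>
```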

We can use that to split the testing section into several ones, triggered whenever any key's value is a list of length >= 2. I say any key because this would allow specifying a list of values only for the relevant key(s), without having to duplicate all the others. e.g. for CoGenT, we would only have to specify `set: ['valA', 'valB']` to create 2 testing configs, one for valA and one for valB, where only `set` differs.

The Tester would then deal with 2 or more configs and iterate as many times as needed to run the multiple tests. I don't know yet what the cleanest way to implement this iteration is (i.e. whether it should live in a separate tester or not).
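A rough sketch of that expansion, assuming a flat testing section; the helper name and the loop at the bottom are hypothetical, not the current Tester API:

```python
from copy import deepcopy
from itertools import product

def expand_test_configs(testing_cfg):
    """Turn one testing section into a list of testing sections, one per
    combination of values for keys whose value is a list of length >= 2."""
    list_keys = [k for k, v in testing_cfg.items()
                 if isinstance(v, list) and len(v) >= 2]
    if not list_keys:
        return [testing_cfg]
    configs = []
    for combo in product(*(testing_cfg[k] for k in list_keys)):
        cfg = deepcopy(testing_cfg)
        cfg.update(dict(zip(list_keys, combo)))
        configs.append(cfg)
    return configs

# {'set': ['valA', 'valB'], 'batch_size': 64} expands to two configs
# differing only in the value of 'set'.
for cfg in expand_test_configs({'set': ['valA', 'valB'], 'batch_size': 64}):
    print(cfg)
    # tester.run(cfg)  # hypothetical: one test run per expanded config
```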

One problem is that this overloads the meaning of the list type in the testing section: what if one key's value is already a list?

One solution could be to control this with an additional key in the testing section, e.g. if `multi_tests: True` is present, then the Tester will look for lists of length >= 2 to use. The length of such a list would represent how many tests to run.
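That opt-in variant could look roughly like this, building on the expansion sketch above (again, `multi_tests` and the helper name are assumptions, not an existing option):

```python
def expand_if_multi(testing_cfg):
    """Only expand list values when the (hypothetical) 'multi_tests' key
    is explicitly set to True in the testing section."""
    if not testing_cfg.get('multi_tests', False):
        # Flag absent or False: keep the section as a single test config,
        # so existing list-valued keys are left untouched.
        return [testing_cfg]
    # Flag present: strip it and expand the remaining keys as above.
    cfg = {k: v for k, v in testing_cfg.items() if k != 'multi_tests'}
    return expand_test_configs(cfg)
```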

But this solution only partially solves the problem.

cc @tkornut, @sesevgen Any ideas?