Closed vendetta1987 closed 5 years ago
Great, thanks :) I will take some time to look into this most likely tomorrow.
I've had a look at it and spent some time with it. Basically, the current way of doing it:
...is causing some strange circumstances. I think a better way to do this is to immediately convert everything into binary classes, run the correlation just once on that, and then return the label and value with the strongest correlation in the undesired direction. In other words: "when this particular value of this particular hyperparameter is present, there tends to be a correlation towards lower values (for accuracy and the like) or higher values (for loss, mae, etc.)". That makes more sense.
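A rough sketch of that idea with pandas one-hot encoding (the function name and data shapes below are my own illustration, not Talos code): every (hyperparameter, value) pair becomes a binary column, each column is correlated with the metric once, and the strongest correlation in the undesired direction is returned.

```python
import pandas as pd

def strongest_undesired(log, metric, minimize):
    """Illustrative only: find the (label, value) binary column whose
    correlation with `metric` points most strongly the wrong way."""
    params = log.drop(columns=[metric])
    # one binary column per (label, value) pair, e.g. 'lr_a', 'lr_b'
    binary = pd.get_dummies(params.astype(str)).astype(float)
    corr = binary.corrwith(log[metric])
    # when minimizing (loss), positive correlation is undesired;
    # when maximizing (accuracy), negative correlation is undesired
    undesired = corr if minimize else -corr
    worst = undesired.idxmax()
    return worst, corr[worst]
```

With a log where `lr == 'a'` coincides with high loss, this would flag the binary column `lr_a` as the undesired configuration.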
I should be able to get back to this tomorrow.
This is basically now fixed, and done in a way that now supports easily adding any custom optimization/reduction strategies. reducers/correlation.py
will be the example, and there are more details in the new docs, which will be available in the same commit (later today, I think). Finally, there will be several new reduction strategies building on the same principles as correlation.
The idea has been, and yesterday and today I have worked to make it a reality, that you can create drop-in strategies that take the 2d experiment log as input and then apply some method to decide what should be done. The parameter space accepts several methods for manipulating the remaining space:
remove_is_not(label, value)  # remove anything that does not match exactly
remove_is(label, value)      # remove anything that matches exactly
remove_ge(label, value)      # remove anything that is greater than or equal
remove_le(label, value)      # remove anything that is less than or equal
remove_lambda(function)      # remove based on a lambda function
This means that as long as your strategy accepts 2d data as input, you are free to do whatever you want between the input and the output, using one of the above utilities to manipulate the parameter space. Moreover, there will be handy utilities in reducers/reduce_utils.py
to manipulate the experiment log into meaningful formats with single-line commands, for example:
from .reduce_utils import cols_to_multilabel
...will give access to cols_to_multilabel
for converting the experiment log into multi-label (i.e. binary) columns, and it provides a column format that contains the value, label and dtype in a way that is easy to pass as input to, for example, remove_is.
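To make the drop-in contract concrete, here is a mock sketch. Only the manipulator method names (remove_is, remove_is_not, remove_ge) come from the comment above; the ParamSpace class and the strategy body are stand-ins I made up, not the Talos implementation.

```python
class ParamSpace:
    """Stand-in for illustration: a list of parameter permutations plus
    the manipulator methods named in the comment."""

    def __init__(self, grid):
        self.grid = grid  # list of dicts, one per permutation

    def remove_is(self, label, value):
        # remove anything that matches exactly
        self.grid = [p for p in self.grid if p[label] != value]

    def remove_is_not(self, label, value):
        # remove anything that does not match exactly
        self.grid = [p for p in self.grid if p[label] == value]

    def remove_ge(self, label, value):
        # remove anything that is greater than or equal
        self.grid = [p for p in self.grid if p[label] < value]

def example_strategy(experiment_log_2d, space):
    """A drop-in strategy: inspect the 2d log, then call a manipulator.
    The chosen (label, value) is hard-coded here for illustration."""
    label, value = 'batch_size', 64  # pretend the analysis picked this
    space.remove_is(label, value)
```

The point is that any strategy reduces to: accept the 2d log, decide, and express the decision through one of the manipulator calls.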
I don't think this level of flexibility has been provided to the researcher in any similar tool in the past :)
This is now available on all branches. I spent roughly half a day on hands-on testing of just the reduction part and it looks good. Closing here.
After updating to the daily-dev branch recently (see #297) I noticed my parameter space doesn't get reduced during the experiment. Debugging the code I found some problems with the new version of
reducers/correlation.py
. Even though the development is not finished, I wanted to point them out in the hope you'll revisit these parts.
L29 is meant to remove the chosen reduction metric from the correlation. It works OK, but the same functionality breaks later on in L54, removing the first actual result instead of the metric entry. I'd propose to use
corr = corr.drop(self.reduction_metric).apply(abs)
instead of
corr = corr.apply(abs)[1:]
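A small pandas example of why the label-based drop is safer than the positional slice (the Series contents here are made up for illustration): `[1:]` removes whatever happens to come first, while `.drop(label)` removes exactly the metric entry regardless of position.

```python
import pandas as pd

# illustrative correlation Series; the metric is NOT the first entry
corr = pd.Series({'lr': 0.4, 'val_acc': 1.0, 'batch_size': -0.2})

by_position = corr.apply(abs)[1:]           # silently drops 'lr', keeps the metric
by_label = corr.drop('val_acc').apply(abs)  # removes exactly the metric entry
```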
My model tends to run into OOM problems, so I provide a mocked-up history object to Talos in that case. While doing so lets the optimization continue, it also creates some NaN values in the correlations. Those could be checked in L33 like:
if pd.isna(corr.values[0]) or ((corr.values[0] <= self.reduction_threshold) is self.minimize_loss):
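For reference, a minimal pandas sketch of the NaN behaviour that motivates the explicit check (the Series below is made up): any comparison against NaN evaluates to False, so a plain threshold test never notices the missing value, while pd.isna does.

```python
import numpy as np
import pandas as pd

# illustrative correlation Series containing a NaN from a mocked history
corr = pd.Series([np.nan, 0.3], index=['dropout', 'lr'])

first_is_nan = pd.isna(corr.values[0])       # True: explicit NaN detection
nan_below_threshold = corr.values[0] <= 0.1  # False: NaN comparisons are always False
```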
I tried wrapping my head around the second correlation run in the method but can't seem to fix it easily. L61 should probably not overwrite the found label. Also, the return statement in L68 seems to provide the values in reverse order compared to L64 in
reduce_run.py.