autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Experiments, Limits and Reducers (taxonomy / architecture discussion) #256

Closed mikkokotila closed 3 years ago

mikkokotila commented 5 years ago

Related and in part initiated by #223.

At the highest level, we have:

First, let's define experiment as what we set up in Scan(), limits as various procedures that are used for limiting the extent of the experiment by some input, and reducers as various procedures

LIMITERS:

Am I missing something here?

REDUCERS:

A reducer would be anything that can be applied within a preset window into the experiment, where the intention is to identify poorly performing parameters, which then are used to reduce the remaining permutation grid.

The three levels - experiment (level-1), limits (level-2), and reducers (level-3) - require a basis on which they can communicate i.e. the parameter grid. The experiment sets the boundary, limit is used to limit the experiment to a subset of the original boundary, and reducer is used to keep further limiting the boundary in an arbitrary manner throughout the experiment.

Thoughts/questions?

JohanMollevik commented 5 years ago

My initial thought on what could go into each class would be as follows:

experiment: defines all possible permutations that could be tested
limit: constraints on what actually will be tested, can be computed before any iteration is run
reducers: further reduction in parameter space performed as the experiment is running

Is my view compatible with your thoughts?

mikkokotila commented 5 years ago

Roughly yes.

To further build on this and clarify, the experiment gives us a codified version of the permutation space, for example:

    p = {'batch_size': [20, 30, 40, 50],
         'first_neuron': [10, 20, 30]}

...where p is the experiment (level-1) representation of permutation space. As I see it, from here we use the most efficient possible means to create all the possible permutations thereof. That might be the approach you had created.
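As a rough illustration of that expansion step, here is a minimal sketch using `itertools.product` from the standard library. The helper name `expand` is hypothetical and is not part of Talos; it only shows one straightforward way to turn the dictionary into labelled permutations.

```python
from itertools import product

p = {'batch_size': [20, 30, 40, 50],
     'first_neuron': [10, 20, 30]}


def expand(params):
    """Return every permutation of a params dictionary as a list of dicts.

    Hypothetical helper for illustration only, not a Talos function.
    """
    keys = list(params)
    value_lists = [params[key] for key in keys]
    # product() yields one tuple per combination, in the order of `keys`
    return [dict(zip(keys, values)) for values in product(*value_lists)]


permutations = expand(p)
print(len(permutations))  # 4 batch sizes * 3 neuron counts = 12
print(permutations[0])    # {'batch_size': 20, 'first_neuron': 10}
```

For large spaces the list could be replaced with a lazy generator, at the cost of losing random access by index.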

Then we might immediately limit it by using one of the limiters (repetition from above):

1) limit by sample (e.g. no more than 1% of all)
2) limit by round (e.g. no more than 100 permutations)
3) limit by time (e.g. start the last permutation at 8 am)
4) limit by metric (e.g. stop when f1_score > 0.9)

Here we can see that we require three kinds of access through a simple API to the "parameter space" (I prefer using this term over "permutation grid", as the latter will make some think "ah ok, Talos is just grid search"). One more language-related point here is that inside "parameter space" we have "parameter permutations" or simply "permutations".

So in other words, we have ParamsSpace class which is basically a self-contained solution that does all this completely stand-alone from everything else in Talos, and provides an end-point for:

a1) accept as input level-1 object (params dictionary)
r1) return as output level-2 object (parameter space)

The object that is returned in r1 contains the end-points related to level-2 operations:

a2) accept as input a 1-dimensional array/list
r2) return as output parameter space with parameter permutations corresponding to index values

a3) accepts as input an integer
r3) returns as output parameter space limited to a corresponding number of parameter permutations

a4) accepts as input a timestamp
r4) returns as output a parameter space where the object contains attribute "stop_time"

a5) accepts as input a boolean statement / lambda function
r5) returns as output a parameter space limited to a corresponding set of parameter permutations

a6) accepts as input a metric and float value
r6) returns as output a parameter space where the object contains attributes "metric_name" and "metric_value"

Note that the parameter space object always has these attributes (e.g. stop_time), but they are None until the corresponding limit is set. Making the whole process self-contained in such a way will notably streamline the way in which the currently most confusing, and arguably the most important, aspect of Talos is handled. This way it will be straightforward to think about the process, to make changes, and to add new related capabilities.
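To make the end-points above concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption made for illustration: the method names (`select`, `limit_rounds`, `limit_time`, `filter`, `limit_metric`) are invented here, and the choice to mutate and return the same object is only one possible reading of "returns as output a parameter space". It is not the Talos implementation.

```python
from itertools import product


class ParamSpace:
    """Hypothetical sketch of the proposed self-contained parameter space."""

    def __init__(self, params):
        # a1/r1: params dictionary in, parameter space out
        self.keys = list(params)
        self.permutations = list(product(*(params[k] for k in self.keys)))
        # limit attributes always exist and stay None until set
        self.stop_time = None
        self.metric_name = None
        self.metric_value = None

    def select(self, index):
        # a2/r2: keep only the permutations at the given index values
        self.permutations = [self.permutations[i] for i in index]
        return self

    def limit_rounds(self, n):
        # a3/r3: keep at most n permutations
        self.permutations = self.permutations[:n]
        return self

    def limit_time(self, stop_time):
        # a4/r4: record when the last permutation may start
        self.stop_time = stop_time
        return self

    def filter(self, condition):
        # a5/r5: condition receives one labelled permutation (a dict)
        self.permutations = [
            values for values in self.permutations
            if condition(dict(zip(self.keys, values)))
        ]
        return self

    def limit_metric(self, name, value):
        # a6/r6: record the metric threshold that ends the experiment
        self.metric_name = name
        self.metric_value = value
        return self


p = {'batch_size': [20, 30, 40, 50],
     'first_neuron': [10, 20, 30]}

space = (ParamSpace(p)
         .filter(lambda d: d['batch_size'] > 20)
         .limit_rounds(5)
         .limit_metric('f1_score', 0.9))
```

Returning `self` from every method lets the limits be chained, which fits the goal of needing very little code inside Scan(). Returning a fresh copy each time would be the safer variant if the original space has to be preserved.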

Finally, level-3 changes will rely 100% on the a5/r5 end-point. So as long as the above holds, nothing else needs further consideration.
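One way to picture a reducer built on a boolean filter of this kind is sketched below. The function names, the "drop the value with the lowest mean score" rule, and the scores themselves are all made up for illustration; a real reducer could use any criterion for what counts as poorly performing.

```python
from collections import defaultdict


def worst_value(results, label):
    """Return the value of `label` with the lowest mean score so far.

    `results` is a list of (params_dict, score) pairs. Hypothetical helper.
    """
    scores = defaultdict(list)
    for params, score in results:
        scores[params[label]].append(score)
    return min(scores, key=lambda value: sum(scores[value]) / len(scores[value]))


def reduce_remaining(remaining, results, label):
    """Drop remaining permutations that use the worst value of `label`."""
    bad = worst_value(results, label)
    # this predicate is what would be handed to a filter-style end-point
    keep = lambda params: params[label] != bad
    return [params for params in remaining if keep(params)]


# made-up scores, for illustration only
results = [({'batch_size': 20, 'first_neuron': 10}, 0.61),
           ({'batch_size': 30, 'first_neuron': 10}, 0.74),
           ({'batch_size': 20, 'first_neuron': 20}, 0.58)]

remaining = [{'batch_size': 20, 'first_neuron': 30},
             {'batch_size': 40, 'first_neuron': 20},
             {'batch_size': 50, 'first_neuron': 30}]

reduced = reduce_remaining(remaining, results, 'batch_size')
```

Here `batch_size` 20 has the lowest mean score, so the one remaining permutation that uses it is removed and the other two are kept.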

What do you think?

JohanMollevik commented 5 years ago

I must admit that I did not really follow that explanation. Is this looking at it from the conceptual model that users have to concern themselves with, or from the inner workings of the code that implementors have to concern themselves with?

mikkokotila commented 5 years ago

It is concerned with the abstraction of the parameter handling, which is more or less the heart of the procedure Talos is concerned with (i.e. finding a suitable model hyperparameter configuration). The above outline explains this in the terms that are required for understanding the end-to-end function, with an effort to avoid any ambiguity while still staying at a very high level.

> I must admit that I did not really follow that explanation.

Which part of it is unclear?

mikkokotila commented 5 years ago

Some possibly useful clarifications...

Right now, parameter handling is not a discrete, self-contained class/module, but in the above proposal it becomes one.

Right now, the various rules related to the parameters (e.g. check if number of parameters is checked) are handled outside of the parameters themselves, but in the above proposal everything related to the handling of the parameter space is contained within the same object. This object could then be stored in self.paramater_space_object or similar.

Right now it's not architecturally clear what is being abstracted and where. The above provides a narrative that can be used to make sure that a person contributing code, for example, has a meaningful entry-point to do that. I understood from the implementation you had made that at the moment there is no precise language / taxonomy which can be used to discuss these things.

JohanMollevik commented 5 years ago

Just to be clear, how do these

limit by time (e.g. start the last permutation at 8 am)
limit by metric (e.g. stop when f1_score > 0.9)

fit into the parameter space? To me it feels that these are not operations on parameter space but rather parameters to the scan that is exploring parameter space.

mikkokotila commented 5 years ago

> Just to be clear, how do these
>
> limit by time (e.g. start the last permutation at 8 am)
> limit by metric (e.g. stop when f1_score > 0.9)
>
> fit into the parameter space? To me it feels that these are not operations on parameter space but rather parameters to the scan that is exploring parameter space.

I thought so as well. I struggled a bit when I thought about it, but ended up with the conclusion that it would still be meaningful to have everything in one self-contained object. In the Talos context it seems that both "limit by time" and "limit by metric" are associated with the parameter space object more than anything else. Also, having all limits in one object will allow just a single line of code in Scan(), which is ideal.

JohanMollevik commented 5 years ago

The disadvantage of that, and why I think it makes the conceptual model less clear, is that this is not something that can be resolved before the scan starts. This makes the ParameterSpace object serve more than one purpose (a warning sign in design, in my view): one of representing the parameter space and one of passing some unrelated parameters to the scan.

If you want to pack the parameters to scan, have you considered making a scan parameters object?
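A separate object of that kind could look roughly like the sketch below. The class name `ScanLimits`, its fields, and the `reached` method are assumptions made for this example only; nothing like this is confirmed to exist in Talos.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanLimits:
    """Hypothetical container for limits that can only be checked at run time."""
    stop_time: Optional[float] = None      # epoch seconds; None means no time limit
    metric_name: Optional[str] = None      # e.g. 'f1_score'
    metric_value: Optional[float] = None   # stop once the metric exceeds this

    def reached(self, now, latest_metrics):
        """Return True if the scan should stop before the next permutation."""
        if self.stop_time is not None and now >= self.stop_time:
            return True
        if self.metric_name is not None and self.metric_value is not None:
            score = latest_metrics.get(self.metric_name)
            return score is not None and score > self.metric_value
        return False


limits = ScanLimits(metric_name='f1_score', metric_value=0.9)
print(limits.reached(now=0.0, latest_metrics={'f1_score': 0.95}))  # True
print(limits.reached(now=0.0, latest_metrics={'f1_score': 0.50}))  # False
```

With this split, the parameter space stays a pure description of what can be tested, and the scan loop passes the current time and latest metrics into the limits object explicitly.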

mikkokotila commented 5 years ago

I agree with what you say, and hence the ParamSpace object has to be able to give only two kinds of output: the parameters for a single permutation in their actual format (dictionary with labels), or False. As long as no limit has been met, it will not return False. Once any of the limits is met, or the index is exhausted, False will be returned. Basically it is a kind of pseudo generator.
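The "dictionary or False" behaviour can be sketched as follows. The constructor arguments and the method name `next_permutation` are hypothetical, and only the round and time limits are shown, since a metric limit would additionally need access to results from the running experiment.

```python
import time
from itertools import product


class ParamSpace:
    """Hypothetical pseudo generator: yields labelled dicts, then False."""

    def __init__(self, params, round_limit=None, stop_time=None):
        self.keys = list(params)
        self._values = product(*(params[k] for k in self.keys))
        self.round_limit = round_limit  # maximum number of permutations
        self.stop_time = stop_time      # epoch seconds, or None
        self.rounds = 0

    def next_permutation(self):
        # any limit that has been met ends the experiment
        if self.round_limit is not None and self.rounds >= self.round_limit:
            return False
        if self.stop_time is not None and time.time() >= self.stop_time:
            return False
        values = next(self._values, None)
        if values is None:
            # the permutations are exhausted
            return False
        self.rounds += 1
        return dict(zip(self.keys, values))


p = {'batch_size': [20, 30, 40, 50],
     'first_neuron': [10, 20, 30]}

space = ParamSpace(p, round_limit=5)
seen = []
params = space.next_permutation()
while params is not False:
    seen.append(params)
    params = space.next_permutation()
```

The loop compares against `False` with `is not` rather than relying on truthiness, so that the end-of-experiment signal is never confused with a permutation that happens to be falsy.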

JohanMollevik commented 5 years ago

Ahh, ok, if it is designed as a generator I get where you are coming from. In that case it makes a bit more sense, but I'm still not convinced. Will this not require that the ParamSpace object has access to the current state of the search to be able to evaluate metrics etc.? Isn't that a bit of an awkward dependency?