Open ctrlnomad opened 4 years ago
Looking back on the idea of having many different tasks as settings that the environment can implement, let's denote a task `T` as being composed of `(classes, pct_examples, data_order)`. Some tasks would then be more difficult than others, e.g. distinguishing between 'small mammals' and 'aquatic mammals' is harder than between `train` and `plane`. The trouble is that determining a difficulty metric for each of these tasks is not trivial; some attempts have been made using the SVM margin as a metric. We also know from the same paper that training on easy examples first and then increasing the difficulty as training progresses improves the convergence rate.
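To make the task definition concrete, here is a minimal sketch of `T` as a record type. The `Task` dataclass, its field types, and the two example instances are my assumptions, not anything fixed in the discussion above:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Task:
    """One task T = (classes, pct_examples, data_order) -- hypothetical layout."""
    classes: Tuple[str, ...]      # labels the task discriminates between
    pct_examples: float           # fraction of the available examples to use
    data_order: Tuple[int, ...]   # presentation order of example indices

# an "easy" task vs. a "hard" fine-grained one, per the examples above
easy = Task(classes=("train", "plane"), pct_examples=0.5, data_order=(0, 1, 2))
hard = Task(classes=("small mammals", "aquatic mammals"),
            pct_examples=0.5, data_order=(0, 1, 2))
```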
We can then say that a curriculum `C` is composed of `[T_0, ..., T_k]` tasks. We can define a task distribution `p(T)` (or let it be random) from which we sample `(classes, pct_examples, data_order) ~ p(T)` `k` times to get a curriculum on which we can train a neural network. However, a sensible curriculum `C` also needs the ability to schedule tasks at different time steps `t`. At every timestep we will have `K` gradient updates, at which point the AutoTrain agent will have to either select one gradient update (discrete) or a linear combination of all `K` updates (continuous; not sure how much sense this makes). At the end we will have a curriculum `C` whose schedule is determined by the AutoTrain agent.
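The sampling and update-combination steps above can be sketched as follows. Everything here is an assumption for illustration: `p(T)` is just a uniform/random draw, `Task` is a hypothetical record, and gradients are plain lists of floats rather than real parameter tensors:

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Task:
    classes: Tuple[str, ...]
    pct_examples: float
    data_order: Tuple[int, ...]

def sample_task(all_classes: Sequence[str], n_examples: int,
                rng: random.Random) -> Task:
    # (classes, pct_examples, data_order) ~ p(T); here p(T) is purely random
    classes = tuple(rng.sample(list(all_classes), 2))
    pct = rng.uniform(0.1, 1.0)
    order = tuple(rng.sample(range(n_examples), n_examples))
    return Task(classes, pct, order)

def sample_curriculum(all_classes: Sequence[str], n_examples: int,
                      k: int, seed: int = 0) -> List[Task]:
    # C = [T_0, ..., T_k]
    rng = random.Random(seed)
    return [sample_task(all_classes, n_examples, rng) for _ in range(k + 1)]

def combine_updates(updates: List[List[float]],
                    weights: List[float]) -> List[float]:
    # continuous variant: the agent emits weights over the K candidate updates;
    # the discrete variant is the special case of a one-hot weight vector
    return [sum(w * u[i] for w, u in zip(weights, updates))
            for i in range(len(updates[0]))]

C = sample_curriculum(["train", "plane", "cat", "dog"], n_examples=8, k=3)
step = combine_updates([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])  # [2.0, 3.0]
```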
In classification problems, we have some reasonable heuristics to follow when thinking about how hard a task is. For example, CIFAR-100 is a hierarchical dataset with 100 classes and 20 superclasses, each containing 5 member classes. Distinguishing between superclasses is thus easier than distinguishing between member classes (of the same superclass). So the tasks could progress from distinguishing between 2 superclasses all the way to classifying the entire dataset.
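That coarse-to-fine progression can be sketched directly from the class hierarchy. The two superclasses below are real CIFAR-100 groupings, but the helper names and the flat dict encoding are my own choices for illustration:

```python
from typing import Dict, List, Tuple

# a slice of the CIFAR-100 hierarchy: superclass -> member classes
HIERARCHY: Dict[str, List[str]] = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
}

def coarse_task() -> Tuple[str, ...]:
    # easy: distinguish between whole superclasses
    return tuple(HIERARCHY.keys())

def fine_task(superclass: str) -> Tuple[str, ...]:
    # hard: distinguish member classes within a single superclass
    return tuple(HIERARCHY[superclass])

# progress from the coarse task to the fine-grained ones
progression = [coarse_task()] + [fine_task(s) for s in HIERARCHY]
```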
One point to mention here is that, with this definition, the dynamics of producing a learning-rate schedule (and other hyperparameter schedules) is not so clear; perhaps we can utilize a second model or a different neural network for that. Once we figure out how to include hyperparameter scheduling, we can fold that into the curriculum as well.
An interesting motivation behind the final reward would be reproducibility --> train different model architectures; the tighter the distribution of final scores `phi`, the higher the reward. Alternatively, just use `max(scores)`. The reward could also include the number of timesteps taken to produce `C`, the `phi` value itself, and a comparison against a 'vanilla' baseline training run.
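One hypothetical way to combine those terms: reward the mean of the `phi` scores relative to a vanilla baseline, and penalize their spread across architectures. The functional form, the `lam` penalty weight, and the baseline handling are all assumptions, not anything decided above:

```python
from statistics import mean, pstdev
from typing import Sequence

def reward(phi_scores: Sequence[float],
           baseline: float = 0.0, lam: float = 1.0) -> float:
    """Hypothetical reward: high mean phi across architectures, minus the
    vanilla-baseline score, minus a penalty for a loose phi distribution."""
    return mean(phi_scores) - baseline - lam * pstdev(phi_scores)

# identical scores across architectures -> no spread penalty
r_tight = reward([0.8, 0.8, 0.8], baseline=0.7)
# same mean but spread out -> lower reward
r_loose = reward([0.6, 0.8, 1.0], baseline=0.7)
```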
- reward
- better competition environment
Interesting Resources:
Things to think about: