broadinstitute / AutoTrain

Using RL to solve overfitting in neural networks

Advanced Prototype - AutoTrain #6

Open ctrlnomad opened 3 years ago

ctrlnomad commented 3 years ago

Interesting Resources:

Things to think about:

ctrlnomad commented 3 years ago

Design Proposal

Discovering a Curriculum Through a Reinforcement Learning Agent

Looking back on the idea of having many different tasks as settings that the environment can implement, let's denote a task T as being composed of (classes, pct_examples, data_order). Some tasks will then be more difficult than others, e.g. distinguishing between 'small mammals' and 'aquatic mammals' is harder than distinguishing between 'train' and 'plane'. The trouble is that determining a difficulty metric for each of these tasks is not trivial; some attempts have been made using the SVM margin as a metric (one way to compute such a metric is sketched below). We also know from the same paper that training on easy examples first and then increasing the difficulty as training progresses improves the convergence rate.
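
A minimal sketch of how the SVM-margin idea could be realised; treating the mean distance to a linear SVM's decision boundary as an (inverse) difficulty score is an assumption here, as are the feature inputs and the restriction to binary tasks:

```python
import numpy as np
from sklearn.svm import LinearSVC


def svm_margin_difficulty(features: np.ndarray, labels: np.ndarray) -> float:
    """Fit a linear SVM on a binary task's examples and use the mean
    absolute distance to the decision boundary as a difficulty proxy:
    small margins -> hard task, so we return the inverse."""
    clf = LinearSVC(C=1.0, max_iter=10_000).fit(features, labels)
    margins = np.abs(clf.decision_function(features))
    return 1.0 / (margins.mean() + 1e-8)
```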

We can then say that a curriculum C is composed of tasks [T_0, ..., T_k]. We can define a task distribution p(T) (or let it be uniform) from which we sample (classes, pct_examples, data_order) ~ p(T) k times to get a curriculum on which to train a neural network. However, a sensible curriculum C also needs the ability to schedule tasks at different time steps t. At every timestep we will have K gradient updates, at which point the AutoTrain agent will have to either select one gradient update (discrete) or take a linear combination of all K updates (continuous; it is not yet clear how much sense this makes). Thus at the end we will have a curriculum C that also comes with a schedule determined by the AutoTrain agent; a sketch of these objects follows below.
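
A minimal sketch of the objects above, assuming the (classes, pct_examples, data_order) triple maps onto a small dataclass; the uniform p(T), the field types, and the softmax weighting of the K candidate updates are all illustrative assumptions, not a fixed design:

```python
import random
from dataclasses import dataclass

import torch


@dataclass
class Task:
    classes: tuple[str, ...]   # which labels this task distinguishes
    pct_examples: float        # fraction of each class's examples to use
    data_order: int            # seed / permutation id for example ordering


def sample_task(all_classes: list[str]) -> Task:
    """Placeholder for (classes, pct_examples, data_order) ~ p(T); uniform here."""
    return Task(
        classes=tuple(random.sample(all_classes, 2)),
        pct_examples=random.uniform(0.1, 1.0),
        data_order=random.randrange(2**32),
    )


def sample_curriculum(all_classes: list[str], k: int) -> list[Task]:
    """Draw i.i.d. tasks to form a curriculum C = [T_0, ..., T_k]."""
    return [sample_task(all_classes) for _ in range(k + 1)]


def combine_updates(grads: list[torch.Tensor], logits: torch.Tensor) -> torch.Tensor:
    """Continuous action: a convex combination of the K candidate updates."""
    weights = torch.softmax(logits, dim=0)
    return sum(w * g for w, g in zip(weights, grads))
```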

In classification problems, we have some reasonable heuristics to follow when thinking about how hard a task is. For example, the CIFAR-100 dataset is hierarchical, with 100 classes grouped into 20 superclasses of 5 member classes each. Distinguishing between superclasses is thus easier than distinguishing between member classes of the same superclass, so the tasks could progress from distinguishing between 2 superclasses up to classifying the entire dataset (a sketch follows below).
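
A sketch of how the CIFAR-100 hierarchy could be turned into a difficulty-ordered task list; the two superclasses shown are real CIFAR-100 groups, but the easy-to-hard progression itself is an illustrative assumption:

```python
# Two of CIFAR-100's 20 superclasses, each with its 5 member classes.
CIFAR100_HIERARCHY = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    # ... remaining 18 superclasses elided
}


def hierarchy_tasks(hierarchy: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Easy -> hard: a superclass pair first, then fine-grained classes
    within each superclass, then the full label set."""
    supers = list(hierarchy)
    tasks = [(supers[0], supers[1])]                 # coarse: 2 superclasses
    tasks += [tuple(hierarchy[s]) for s in supers]   # fine: within-superclass
    tasks.append(tuple(c for s in supers for c in hierarchy[s]))  # everything
    return tasks
```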

One point to mention here is that, under this definition, the dynamics of producing a learning-rate schedule (and schedules for other hyperparameters) are not so clear; perhaps we can utilize a second model or a different neural network for this. Once we figure out how to include hyperparameter scheduling, we can fold it into the curriculum as well.

An interesting motivation behind the final reward would be reproducibility: train several different model architectures on the discovered curriculum, and the tighter the resulting distribution of final scores phi, the higher the reward. Alternatively, the reward could simply be max(scores). It could also include the number of timesteps taken to produce C, the phi value itself, and a comparison against a 'vanilla' baseline training run (see the sketch below).
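
A minimal sketch of such a reward, assuming phi is the list of final scores from the different architectures; the particular combination of terms and the step penalty are illustrative assumptions:

```python
import statistics


def curriculum_reward(phi: list[float], baseline: float,
                      steps_used: int, step_penalty: float = 1e-4) -> float:
    """Reward tight, high phi distributions relative to a vanilla baseline,
    with a small penalty for the timesteps spent producing C."""
    spread = statistics.pstdev(phi)   # reproducibility: tighter -> smaller
    best = max(phi)                   # or reward the peak score alone
    return best - spread + (best - baseline) - step_penalty * steps_used
```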

ctrlnomad commented 3 years ago

Current State Of The Project