Adversarial auxiliary signals

Create two separate networks that compete to explore the environment; together they form one agent. The idea is to have a reinforcement learning setup where:

The prediction network learns an unsupervised representation of the environment, and predicts what will happen next
- We could use adversarial techniques for unsupervised learning, or we could use something less fancy like denoising autoencoders
The exploration network controls the actions of the agent, and gets a reward proportional to the MSE between the prediction network's prediction and what actually happened
- This is an artificial reward signal, not tied to the true environment reward
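The artificial reward described above can be sketched in a few lines. This is only an illustration: the function names and the `scale` constant (the "proportional to" factor) are assumptions, and states are represented as plain lists of floats rather than network outputs.

```python
from typing import Sequence


def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between a predicted and an observed next state."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def exploration_reward(predicted: Sequence[float],
                       observed: Sequence[float],
                       scale: float = 1.0) -> float:
    """Artificial reward for the exploration network.

    Proportional to the prediction network's error, independent of any
    reward the environment itself hands out. `scale` is a hypothetical
    proportionality constant.
    """
    return scale * mse(predicted, observed)
```

A perfect prediction yields zero reward, so the exploration network only profits from situations the prediction network has not yet learned.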

No gradients flow from the exploration network into the weights of the prediction network, so it can't push the predictor toward degenerate representations (e.g. learning to output random noise to maximize surprise).

The exploration network influences the prediction network solely through its actions, which cause mispredictions; i.e. reality always sits in between the exploration network and the prediction network.
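A toy version of this loop, to make the separation concrete. Everything here is a simplifying assumption: both "networks" are replaced by lookup tables, the environment is a fixed list of per-action outcomes, and epsilon-greedy stands in for a real RL policy. The point is only that the two learners share no parameters and no gradients; each sees what the environment returns and nothing else.

```python
import random


class Predictor:
    """Tabular stand-in for the prediction network: one predicted outcome per action."""

    def __init__(self, n_actions, lr=0.5):
        self.pred = [0.0] * n_actions
        self.lr = lr

    def predict(self, action):
        return self.pred[action]

    def update(self, action, observed):
        # Gradient step on 0.5 * (pred - observed)^2, taken by the predictor alone.
        self.pred[action] -= self.lr * (self.pred[action] - observed)


class Explorer:
    """Stand-in for the exploration network: epsilon-greedy over estimated surprise."""

    def __init__(self, n_actions, rng, step=0.5, epsilon=0.1):
        self.value = [10.0] * n_actions  # optimistic start so every action gets tried
        self.rng = rng
        self.step = step
        self.epsilon = epsilon

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.value))
        return max(range(len(self.value)), key=lambda a: self.value[a])

    def update(self, action, reward):
        # Only the scalar reward arrives here; the predictor's weights are never touched.
        self.value[action] += self.step * (reward - self.value[action])


def run(outcomes, steps=200, seed=0):
    rng = random.Random(seed)
    predictor = Predictor(len(outcomes))
    explorer = Explorer(len(outcomes), rng)
    rewards = []
    for _ in range(steps):
        action = explorer.act()
        observed = outcomes[action]  # "reality", sitting between the two learners
        surprise = (predictor.predict(action) - observed) ** 2
        explorer.update(action, surprise)
        predictor.update(action, observed)
        rewards.append(surprise)
    return predictor, explorer, rewards
```

Calling `run([1.0, -2.0, 3.0])` shows the intended dynamic in miniature: the explorer chases whichever action is currently most surprising, and its reward dries up as the predictor catches up.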

Considerations:

The exploration network needs to adapt quickly to changing dynamics, because its reward shrinks as the prediction network learns (model this like a multi-armed bandit that periodically changes the payout probabilities of the arms). Things like RL^2 are probably a good idea here.
The exploration network might need to take the raw input as its input, possibly with some memory such as an LSTM
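To illustrate why the changing-bandit framing matters, here is a small comparison of two value estimators on an arm whose payout switches partway through. This is not RL^2, just the simplest baseline for non-stationary rewards (a constant step size, i.e. an exponential recency-weighted average); the function names and the step size of 0.3 are assumptions.

```python
def sample_average(rewards):
    """Plain running mean: weights every past reward equally."""
    estimate = 0.0
    for n, r in enumerate(rewards, start=1):
        estimate += (r - estimate) / n
    return estimate


def recency_weighted(rewards, step=0.3):
    """Constant step size: old rewards decay geometrically, so the estimate tracks change."""
    estimate = 0.0
    for r in rewards:
        estimate += step * (r - estimate)
    return estimate


# An arm that pays 1.0 for 50 pulls, then silently switches to paying 0.0.
history = [1.0] * 50 + [0.0] * 50
stale = sample_average(history)    # still remembers the old payout
fresh = recency_weighted(history)  # has almost entirely forgotten it
```

The running mean ends up halfway between the old and new payout, while the recency-weighted estimate is close to the arm's current value. An exploration network faces the same problem: the "payout" of a region of the environment decays as the prediction network learns it.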
