Using Autoencoders in btgym architecture (Question)

JaCoderX commented 6 years ago

I'm new to the project and trying to familiarize myself with the framework. I followed the examples given, which have been a great help so far.

Looking at the Stacked LSTM architecture, my focus is now understanding how to train and use the autoencoders part of the system.

If my understanding is correct we need to pre-train the autoencoder and then use only the encoder part, but how is it done?

Are we using the btgym framework to pre-train the autoencoder, or is it done externally?

It would be great if an example could be provided to understand the process, or a guideline on how it should be done.

thanks, Jacob

Kismuz commented 6 years ago

@JacobHanouna,

> my focus is now understanding how to train and use the autoencoders part of the system.

Since the family of autoencoders basically stands for an encode_to_hidden_state-compress_hidden_state-decode_to_original routine, to get a meaningful answer it is essential to understand the following:

1. What do you want to encode, and how (i.e. impose some structure, as with convolution layers; capture temporal causal structure via an RNN encoder; or just reduce dimensionality)?
2. How do you aim to compress the hidden encoded state, and why (i.e. what do you want to achieve by doing so)?
3. By what metric do you want to measure reconstruction (this depends on p.1)?
4. What aspects of a policy estimator do you, generally, intend to obtain or improve by employing an autoencoder architecture?

JaCoderX commented 6 years ago

I have a rough idea of what type of RL solution I want to use, but for now I'm just learning how the framework works. Going over the docs and looking at the BTGym stacked LSTM agent diagram, I see that your external input goes through a conv encoder before it is passed to the LSTM layers, which seems like a great way to extract useful features from the raw data before passing it on.

I am just trying to understand how you pre-train this layer. (I assume you train it separately and then take the encoder part and 'freeze' it in the main network.)

In general, I'm really interested in your RL solution "Stacked LSTM with auxillary b-Variational AutoEncoder", but I'm trying to take one step at a time :)

Kismuz commented 6 years ago

@JacobHanouna ,

> I am just trying to understand how you pre-train this layer. (I assume you train it separately and then take the encoder part and 'freeze' it in the main network.)

There is no need to pretrain and freeze it; it is trained jointly as part of the entire network. Think of it as the state going through the encoder to produce a hidden state, which is passed to the LSTM and further on to get the policy output, so you shape the 'encoder hidden state' to conform to the objective you set. If you replace the LSTM part with a convolution decoder and set a reconstruction objective, you'll end up with a hidden state that differs from the first one in that it conforms exactly to the state reconstruction objective, not to optimal policy output.

Sometimes it is helpful to mix both objectives: say, feed hidden Z into the LSTM (policy optimisation objective) AND into a convolution decoder to get a reconstruction (if for some reason we believe it can help us). You then end up with a combined loss function L = a * L_policy_opt + b * L_reconstruct. Since our primary objective is policy optimisation, the latter is often called an 'auxiliary task'. It can be reconstruction, or depth perception (in a 2d/3d env), or something else, depending on the nature of the domain. This is the idea behind algorithms like 'UNREAL' or 'RAINBOW'. Btw, the vanilla A3C loss is itself a combination of two objectives.
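To make the combined loss concrete, here is a minimal sketch in plain Python. It is not btgym or TensorFlow code, and everything in it is an assumption made for illustration: the tiny linear encoder and decoder stand in for the convolution layers, a single linear head with a squared-error loss stands in for the LSTM plus the real policy loss, and the names (`split`, `losses`, `combined_loss`, `numeric_grad`, `train`), sizes and weights `a`, `b` are all hypothetical. The only point it shows is that one shared hidden state Z receives gradients from both objectives when they are summed as L = a * L_policy_opt + b * L_reconstruct and trained jointly.

```python
import random

N_IN, N_Z = 4, 2  # hypothetical sizes: observation vector and hidden state Z


def split(params):
    """Unpack the flat parameter list into encoder, decoder and head weights."""
    i = N_Z * N_IN
    w_enc = [params[r * N_IN:(r + 1) * N_IN] for r in range(N_Z)]
    w_dec = [params[i + r * N_Z:i + (r + 1) * N_Z] for r in range(N_IN)]
    w_pol = params[i + N_IN * N_Z:]
    return w_enc, w_dec, w_pol


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def losses(params, x, target):
    """Return (L_policy_opt stand-in, L_reconstruct) for one observation x."""
    w_enc, w_dec, w_pol = split(params)
    z = matvec(w_enc, x)        # shared hidden state Z
    x_hat = matvec(w_dec, z)    # auxiliary branch: reconstruct the input
    l_rec = sum((p - q) ** 2 for p, q in zip(x_hat, x)) / N_IN
    y = sum(w * zi for w, zi in zip(w_pol, z))
    l_pol = (y - target) ** 2   # ASSUMPTION: squared error replaces the real policy loss
    return l_pol, l_rec


def combined_loss(params, x, target, a=1.0, b=0.5):
    l_pol, l_rec = losses(params, x, target)
    return a * l_pol + b * l_rec


def numeric_grad(f, params, eps=1e-5):
    """Central finite differences; enough for a toy, far too slow for real nets."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def train(steps=200, lr=0.05, a=1.0, b=0.5, seed=0):
    """Jointly train encoder, decoder and head on the combined loss."""
    rng = random.Random(seed)
    n_params = N_Z * N_IN + N_IN * N_Z + N_Z
    params = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]
    x, target = [1.0, -0.5, 0.25, 0.75], 1.0  # made-up observation and target

    def f(p):
        return combined_loss(p, x, target, a, b)

    history = [f(params)]
    for _ in range(steps):
        g = numeric_grad(f, params)
        params = [p - lr * gi for p, gi in zip(params, g)]
        history.append(f(params))
    return params, history


if __name__ == "__main__":
    _, history = train()
    print("combined loss: first", history[0], "last", history[-1])
```

In this sketch, setting `b=0` leaves only the stand-in policy objective (the decoder then gets no gradient at all), and setting `a=0` turns the same network into a plain autoencoder. Nothing is pretrained or frozen in either case; the encoder is simply updated by whatever terms are present in the loss.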

JaCoderX commented 6 years ago

@Kismuz thank you so much for your detailed reply, it was a great help. I was missing a good understanding of the 'auxiliary task' concept.

I also found this article, which gives a good literature overview of the subject.

Kismuz / btgym

Using Autoencoders in btgym architecture (Question) #74