JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

Recurrent Models #144

Closed lorrp1 closed 1 year ago

lorrp1 commented 3 years ago

Does this repo support recurrent models (LSTM, for example)?

findmyway commented 3 years ago

No, not yet.

@RajGhugare19 may have started working on something related: https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/issues/109

It's not very difficult to support recurrent models, but currently I'm busy working on https://github.com/JuliaReinforcementLearning/DistributedReinforcementLearning.jl, so contributions are welcome!

RajGhugare19 commented 3 years ago

@findmyway Sorry, I was a bit busy the last two weeks. I am trying to add support for recurrent models, but I need some guidance on how to start. Can you point me towards which files I should change and give me some idea of how you want it done? Thank you :)

findmyway commented 3 years ago

No hurry 😃

We can split the implementation into the following steps.

1. Implement a recurrent trajectory

All the trajectories we have are transition-based. We need to create a new sequence-based trajectory.

Quoted from the paper Deep Recurrent Q-Learning for Partially Observable MDPs:

Bootstrapped Sequential Updates: Episodes are selected randomly from the replay memory and updates begin at the beginning of the episode and proceed forward through time to the conclusion of the episode. The targets at each timestep are generated from the target Q-network, Q̂. The RNN's hidden state is carried forward throughout the episode.

Bootstrapped Random Updates: Episodes are selected randomly from the replay memory and updates begin at random points in the episode and proceed for only unroll iterations timesteps (e.g. one backward call). The targets at each timestep are generated from the target Q-network, Q̂. The RNN's initial state is zeroed at the start of the update.

To support these two kinds of updates, the main change is to enable selecting episodes randomly. This means we may need to store the start of each episode in an extra field.

Note that for R2D2, there are two other kinds of variants.
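To make the idea concrete, here is a minimal sketch of what a sequence-based trajectory could look like. The names `EpisodeTrajectory`, `sample_episode`, and `sample_subsequence` are hypothetical and not part of the package:

```julia
# Hypothetical sketch: transitions are grouped per episode, so that whole
# episodes (Bootstrapped Sequential Updates) or fixed-length sub-sequences
# (Bootstrapped Random Updates) can be sampled uniformly.
struct EpisodeTrajectory{T}
    episodes::Vector{Vector{T}}   # each inner vector is one episode of transitions
end

EpisodeTrajectory{T}() where {T} = EpisodeTrajectory(Vector{Vector{T}}())

start_episode!(t::EpisodeTrajectory{T}) where {T} = push!(t.episodes, T[])
Base.push!(t::EpisodeTrajectory, transition) = push!(t.episodes[end], transition)

# Bootstrapped Sequential Updates: sample one full episode.
sample_episode(t::EpisodeTrajectory) = rand(t.episodes)

# Bootstrapped Random Updates: sample `unroll` consecutive transitions
# starting at a random point inside a random (long enough) episode.
function sample_subsequence(t::EpisodeTrajectory, unroll::Int)
    ep = rand(filter(e -> length(e) >= unroll, t.episodes))
    i = rand(1:(length(ep) - unroll + 1))
    ep[i:i+unroll-1]
end
```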

2. Implement a recurrent version of DQN

First, the model needs to be changed. In the update! function, we have to select an episode from the trajectory, send it to the policy, and apply the updates above. You may need to familiarize yourself with the implementation of LSTM in Flux and understand the concepts of initial state, hidden state, observations, etc.
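A rough, illustrative sketch of what a sequence-based loss could look like with Flux's stateful recurrent API. Everything here is an assumption for illustration: `q_net`, `target_net`, and the sampled `episode` are placeholders, the loss is written for a single unbatched sequence, and it follows the Bootstrapped Random Updates scheme by zeroing the hidden state at the start of the update:

```julia
using Flux

# Illustrative recurrent Q-network: an LSTM followed by a dense head.
q_net      = Chain(LSTM(4 => 32), Dense(32 => 2))
target_net = deepcopy(q_net)
γ = 0.99f0

# `episode` is assumed to be a vector of named tuples
# (state, action, reward, next_state, terminal) sampled from the trajectory.
function drqn_loss(q_net, target_net, episode)
    Flux.reset!(q_net)          # zero the initial hidden state
    Flux.reset!(target_net)
    loss = 0f0
    for step in episode
        q  = q_net(step.state)              # hidden state is carried across steps
        q′ = target_net(step.next_state)
        target = step.reward + (1 - step.terminal) * γ * maximum(q′)
        loss += (q[step.action] - target)^2
    end
    loss / length(episode)
end

# Gradients are then taken over the whole unrolled sequence, e.g.
# grads = Flux.gradient(m -> drqn_loss(m, target_net, episode), q_net)
```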

3. Implement a generalized recurrent wrapper

I'm not sure how far we'll get in this direction. But ideally, we need to have a general solution.

4. Implement the R2D2 or even R2D3

Once we have the recurrent version, it's pretty easy to apply it to the distributed version. The main differences are the experience weights, sequence sampling, and the sampling/inserting ratio.

RajGhugare19 commented 3 years ago

@findmyway Thank you! I'll soon update with my progress :)

RajGhugare19 commented 3 years ago

If t is a CircularCompactSARTSATrajectory, then t[:terminal] will be a 1-d CircularArrayBuffer of type Bool with length equal to the capacity. Let's say we have added a field named episode_start, an ElasticArray of length 0 initially. Then at every "post episode stage", before updating, if we store the indices (indices + 1, actually) of the true values in t[:terminal] in the episode_start field, we'd be able to use it to randomly select episodes from the trajectory in update!(). Is this approach correct, or am I missing something?
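For reference, a tiny plain-Julia illustration of that idea (not using the actual trajectory types):

```julia
# Given the Bool `terminal` column of a trajectory, the episode start
# indices are 1 (the very first episode) plus every index right after a
# terminal flag.
terminal = Bool[0, 0, 1, 0, 0, 0, 1, 0]
episode_starts = [1; findall(terminal) .+ 1]   # -> [1, 4, 8]

# A random episode start can then be drawn in update!():
start = rand(episode_starts)
```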

findmyway commented 3 years ago

That's a good idea. However, when the buffer is fully filled, we have to maintain the starting index of each episode: decreasing each one and ejecting the ones with a negative value.

A better solution is to treat each episode as an element in the circular buffer. At the start of each episode, we insert an empty element (it can be a VectCompactSARTSATrajectory or an ElasticCompactSARTSATrajectory) into the buffer.
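A minimal sketch of that idea, using DataStructures.CircularBuffer as a stand-in for the package's own circular buffer (the element type and names are illustrative):

```julia
using DataStructures: CircularBuffer

# Each element of the buffer is one episode (here, a vector of transitions).
# When the buffer is full, pushing a new episode evicts the oldest one, so
# episode start indices never have to be shifted or ejected.
episodes = CircularBuffer{Vector{NamedTuple}}(1_000)   # capacity in episodes

# At the start of each episode, insert an empty element:
push!(episodes, NamedTuple[])

# During the episode, append transitions to the latest element:
push!(episodes[end], (state = rand(4), action = 1, reward = 0.0, terminal = false))

# In update!(), a whole episode can be sampled uniformly:
episode = rand(episodes)
```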

bileamScheuvens commented 2 years ago

Has there been progress on this issue or is there a workaround to hackily use recurrence with NeuralNetworkApproximator?

findmyway commented 2 years ago

Still no progress yet. We also need to change the trajectory part.

OliEfr commented 1 year ago

Hey all!

I am also looking to use an RNN within my RL learning pipeline in Julia. Are there any updates regarding this yet?

If not, are there any known workarounds? (E.g. python integration,...)

OliEfr