google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0

Can I train with my own dataset? #9

Closed Marchen0 closed 3 years ago

Marchen0 commented 4 years ago

Thanks for the great work first! I have a bunch of data in (state, action, reward, next state) format. I tried to understand how you parse the $store$_action_ckpt files in the code, but I failed. It would be great if you could provide a way to train this model with my own dataset.

Thanks again

agarwl commented 4 years ago

Right, this is done via Dopamine replay buffers, which store the dataset in a specific format. You can try adding your data to a Dopamine buffer. A simpler way would be to create a TensorFlow dataset out of your tuples, or to sample data directly:

minibatch ~ Sample(Dataset, batch_size)
model.train(minibatch)
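
For concreteness, here is a minimal sketch of that idea using the tf.data API; the array names, shapes, and dtypes are illustrative placeholders for your own data, not anything from this repo:

import numpy as np
import tensorflow as tf

# Illustrative in-memory arrays; replace shapes/dtypes with those of your data.
num_transitions = 1000
states = np.random.rand(num_transitions, 8).astype(np.float32)
actions = np.random.randint(0, 4, size=num_transitions).astype(np.int32)
rewards = np.random.rand(num_transitions).astype(np.float32)
next_states = np.random.rand(num_transitions, 8).astype(np.float32)

batch_size = 32
dataset = (tf.data.Dataset
           .from_tensor_slices((states, actions, rewards, next_states))
           .shuffle(num_transitions)
           .repeat()
           .batch(batch_size))

for s, a, r, s_next in dataset.take(5):
  # Each iteration yields one minibatch; pass it to your model's update,
  # e.g. model.train((s, a, r, s_next)).
  pass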

If you have DQN implemented in your codebase, implementing methods from this repository such as REM only takes a couple more lines, so you can simply do that!
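
To give a sense of what "a couple more lines" means: REM keeps the DQN update but uses K Q-heads and trains a random convex combination of them. The sketch below is a paraphrase of the paper's loss, not the exact code in this repo; q_values and target_q_values are assumed to have shape [batch, num_heads, num_actions], and actions, rewards, terminals are assumed to be 1-D batch tensors:

import tensorflow as tf

def rem_loss(q_values, target_q_values, actions, rewards, terminals, gamma=0.99):
  # q_values, target_q_values: [batch, num_heads, num_actions].
  num_heads = q_values.shape[1]
  # One random convex combination over the heads, shared across the minibatch.
  alphas = tf.random.uniform([1, num_heads, 1])
  alphas = alphas / tf.reduce_sum(alphas, axis=1, keepdims=True)
  q_mix = tf.reduce_sum(alphas * q_values, axis=1)              # [batch, num_actions]
  target_mix = tf.reduce_sum(alphas * target_q_values, axis=1)  # [batch, num_actions]

  chosen_q = tf.gather(q_mix, actions, batch_dims=1)            # [batch]
  target = rewards + gamma * (1.0 - terminals) * tf.reduce_max(target_mix, axis=1)
  td_error = tf.stop_gradient(target) - chosen_q
  # Huber loss on the TD error, as in DQN.
  return tf.reduce_mean(tf.where(tf.abs(td_error) <= 1.0,
                                 0.5 * tf.square(td_error),
                                 tf.abs(td_error) - 0.5))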

agarwl commented 4 years ago

@Marchen0 Is your issue resolved? If so, can you please close this issue? If not, can you describe the problem you are facing?

fc1315 commented 4 years ago

@agarwl I have the same question about using my own offline data, which is not generated from Atari games. It looks like Dopamine replay buffers have a very specific format, and I would need to change my data structures a lot. Could you give more details on how to implement

model.train(minibatch)

directly? Thanks a lot!

agarwl commented 4 years ago

I think if you have data in the form of (s, a, r, s') tuples, you can create a tf/pytorch dataset (see RL Unplugged, or D4RL) containing these tuples. An even simpler (but less scalable) option is to read the dataset directly into memory as numpy arrays, sample minibatches of (s, a, r, s') tuples directly, and pass them to the model for training.
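
For example, a minimal sketch of the in-memory option; the file names, array shapes, and the model.train call are all illustrative assumptions about your setup:

import numpy as np

# Illustrative: load the full offline dataset into memory.
states = np.load("states.npy")            # [N, ...state_shape]
actions = np.load("actions.npy")          # [N]
rewards = np.load("rewards.npy")          # [N]
next_states = np.load("next_states.npy")  # [N, ...state_shape]

def sample_minibatch(batch_size=32):
  idx = np.random.randint(len(states), size=batch_size)
  return states[idx], actions[idx], rewards[idx], next_states[idx]

num_training_steps = 100000
for _ in range(num_training_steps):
  minibatch = sample_minibatch()
  # model.train(minibatch)  # your agent's update on one (s, a, r, s') minibatch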

That said, saving your data to a Dopamine replay buffer is also easy: you simply call the _store_transition function in a Dopamine agent. So, if you have sequential data, you can read it step by step, add it to the LoggedReplayBuffer, and save it to disk with the log_final_buffer function.
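
As a rough sketch of the replay-buffer route: the snippet below uses Dopamine's OutOfGraphReplayBuffer directly rather than the LoggedReplayBuffer wrapper from this repo, and the shapes, paths, and toy trajectory are all assumptions you would replace with your own data. Its save call should write the gzipped $store$_*_ckpt files that the fixed-replay agents read back.

import os

import numpy as np
from dopamine.replay_memory import circular_replay_buffer

# Illustrative settings; match observation_shape, dtype, and capacity to your data.
buffer = circular_replay_buffer.OutOfGraphReplayBuffer(
    observation_shape=(84, 84),
    stack_size=4,
    replay_capacity=100000,
    batch_size=32)

# Toy sequential data; replace with your own (observation, action, reward, terminal) steps.
trajectory = [(np.zeros((84, 84), dtype=np.uint8), 0, 0.0, False) for _ in range(100)]

for observation, action, reward, terminal in trajectory:
  buffer.add(observation, action, reward, terminal)

# Checkpoint the buffer to disk (the directory must already exist).
os.makedirs("/tmp/my_replay_logs", exist_ok=True)
buffer.save("/tmp/my_replay_logs", 0)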