astooke / rlpyt

Reinforcement Learning in PyTorch
MIT License
2.22k stars 323 forks

Support for weird graph-based observation data type #88

Open tarungog opened 4 years ago

tarungog commented 4 years ago

Hello, I would like to use your framework in my research for its multithreading features, but I have a somewhat unusual MDP. The state is a pytorch_geometric graph-structured data object plus a variable-length array. The action space is also a variable-length array. Is there an easy way to make this work with your framework?

tarungog commented 4 years ago

tl;dr: will this library work with arbitrary observation data types?

astooke commented 4 years ago

Hi, interesting question! One challenge is that memory for observations and actions is pre-allocated according to the sampler batch size, so it can't hold variable-sized observations or actions directly. But if you can specify a maximum length ahead of time and deal with trailing zeros (or some other non-value, maybe even NaN), then that could work.
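For illustration, here is a minimal sketch of the padding idea using plain NumPy. It is not part of rlpyt: `MAX_LEN`, `pad_to_max`, and `unpad` are hypothetical names, and the boolean mask is an assumed extra that lets the agent tell real entries from padding. A graph observation could presumably be handled the same way by padding its node-feature and edge arrays to fixed maximum sizes.

```python
import numpy as np

MAX_LEN = 8  # assumed upper bound on the array length, chosen ahead of time


def pad_to_max(values, max_len=MAX_LEN, fill=0.0):
    """Pad a 1-D variable-length array to a fixed length.

    Returns the padded array and a boolean mask marking the valid entries.
    Hypothetical helper, not an rlpyt function.
    """
    values = np.asarray(values, dtype=np.float32)
    if values.ndim != 1:
        raise ValueError("expected a 1-D array")
    n = values.shape[0]
    if n > max_len:
        raise ValueError(f"length {n} exceeds max_len {max_len}")
    padded = np.full(max_len, fill, dtype=np.float32)  # trailing fill values
    padded[:n] = values
    mask = np.zeros(max_len, dtype=bool)
    mask[:n] = True
    return padded, mask


def unpad(padded, mask):
    """Recover the original variable-length array from its padded form."""
    return padded[mask]


padded, mask = pad_to_max([1.0, 2.0, 3.0])
original = unpad(padded, mask)  # back to length 3
```

Carrying an explicit mask avoids relying on zeros or NaN as a sentinel, which matters if zero is itself a legitimate value in the observation or action.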

Hmm, yes, this is an interesting case: supporting graph neural networks, which take in variable-length observations...

Let us know what you try?

tarungog commented 4 years ago

@astooke Where is this memory pre-allocated, and where could I modify it? Could you give me some code pointers?

tarungog commented 4 years ago

Also, are there better visualization tools than viskit that will work with this data format? Unfortunately, viskit is quite nascent and underwhelming.

astooke commented 4 years ago

where is this memory pre-allocated and where could I modify it?

Sure! Here it is in the serial sampler: https://github.com/astooke/rlpyt/blob/75e96cda433626868fd2a30058be67b99bbad810/rlpyt/samplers/serial/sampler.py#L36
Otherwise look for build_samples_buffer() and see inside that.
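To make the constraint concrete, here is a rough NumPy sketch of what pre-allocating from an example observation looks like. It is an illustration only, not the code inside build_samples_buffer(): `preallocate_like` is a hypothetical helper, and the leading dimensions are assumed to be time steps and parallel environments.

```python
import numpy as np


def preallocate_like(example, batch_T, batch_B):
    """Allocate a zeroed array of shape [batch_T, batch_B, *example.shape].

    Hypothetical helper: the example fixes the per-step shape and dtype.
    """
    example = np.asarray(example)
    return np.zeros((batch_T, batch_B) + example.shape, dtype=example.dtype)


# One example observation, already padded to a fixed length of 8.
example_obs = np.zeros(8, dtype=np.float32)
obs_buffer = preallocate_like(example_obs, batch_T=5, batch_B=4)

# Every step writes into a slot whose shape was fixed at allocation time.
obs_buffer[0, 2] = np.arange(8, dtype=np.float32)

# An observation of a different length does not fit the slot.
try:
    obs_buffer[1, 2] = np.arange(3, dtype=np.float32)
except ValueError as err:
    print("cannot store variable-length observation:", err)
```

Because the slot shape is decided once from the example, changing the observation size means changing the example that the buffer is built from, which is why a fixed maximum length is the simplest route.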

Also are there better visualization tools that will work with this data format than viskit?

Good question! Would you like to move it to its own issue thread so others might see it and comment? I use viskit and find it pretty good for separating hyperparameters, but there is a pending pull request for tensorboard (minor changes only, should be merged soon). The log format is: each experiment's data is recorded in a CSV that sits in its own folder, alongside a configuration JSON, which includes a run_ID to distinguish multiple runs launched with the same hyperparameters.
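Since the logs are plain CSV and JSON, they can be read into any plotting tool without viskit. The sketch below uses only the standard library; the file names `progress.csv` and `params.json` are assumptions passed as parameters, and `load_runs` is a hypothetical helper, so adjust both to match what your run folders actually contain.

```python
import csv
import json
from pathlib import Path


def load_runs(root, csv_name="progress.csv", json_name="params.json"):
    """Collect every run folder under `root` that contains a CSV log.

    The default file names are assumptions; pass the real ones if they differ.
    Returns a list of dicts with the folder, the parsed config, and the CSV rows.
    """
    runs = []
    for csv_path in sorted(Path(root).rglob(csv_name)):
        config_path = csv_path.parent / json_name
        config = {}
        if config_path.exists():
            config = json.loads(config_path.read_text())
        with csv_path.open(newline="") as f:
            rows = list(csv.DictReader(f))  # values are read as strings
        runs.append({"dir": str(csv_path.parent), "config": config, "rows": rows})
    return runs
```

Each entry keeps the configuration next to its rows, so runs can be grouped by hyperparameters and told apart by the run_ID field before plotting.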

astooke commented 4 years ago

@tarungog Hi! Curious if you pursued anything for variable-sized observations and actions?