aravindr93 / hand_dapg

Repository to accompany RSS 2018 paper on dexterous hand manipulation
Apache License 2.0

demonstration file #9

Open rasoolfa opened 4 years ago

rasoolfa commented 4 years ago

Hi @aravindr93,

Thanks for releasing the code and demonstration files for this work.

I have two questions about the demonstration files (e.g. hammer-v0_demos.pickle). I might be missing something here, but how does DDPGfD use the demonstrations, given that these files only contain (s, a, r) and not (s, a, r, s'), where s' is the next state? Also, could you please provide more details about the structure of those files, so that it is easier to compare against and reproduce the results in your paper?

Thanks.

rasoolfa commented 4 years ago

I guess one way to use the demonstrations is to collect samples by replaying the actions in those files.

bennevans commented 4 years ago

Since the observations/states are stored in a time-ordered list, you can get s' by taking the state at the next index in the list.
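As a rough illustration of the indexing trick, here is a minimal sketch that turns one trajectory into (s, a, r, s') arrays. It assumes the observations, actions, and rewards all have the same length and are aligned by time step; the helper name `trajectory_to_transitions` and the toy data are made up for this example and are not part of the repository. The final step is dropped because no successor state is recorded for it.

```python
import numpy as np


def trajectory_to_transitions(traj):
    """Build (s, a, r, s') arrays from one trajectory dictionary.

    Hypothetical helper: assumes 'observations', 'actions' and 'rewards'
    have the same length T and share a time index.
    """
    obs = np.asarray(traj["observations"])
    act = np.asarray(traj["actions"])
    rew = np.asarray(traj["rewards"])
    # s' at step t is the observation at step t + 1, so the last step
    # (which has no recorded successor) is left out.
    return {
        "s": obs[:-1],
        "a": act[:-1],
        "r": rew[:-1],
        "s_next": obs[1:],
    }


# Toy trajectory with made-up shapes: 4 steps, 2-dim states, 3-dim actions.
toy = {
    "observations": np.arange(8, dtype=float).reshape(4, 2),
    "actions": np.ones((4, 3)),
    "rewards": np.array([0.0, 0.1, 0.2, 0.3]),
}
batch = trajectory_to_transitions(toy)
print(batch["s"].shape, batch["s_next"].shape)  # (3, 2) (3, 2)
```

Whether the last step should instead be kept with a terminal flag depends on how your replay buffer handles episode ends; this sketch simply discards it.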

bennevans commented 4 years ago

As for the structure of a file, it is a list of dictionaries, each representing one trajectory. Each dictionary has the keys

['actions', 'observations', 'rewards', 'init_state_dict']

These correspond to the actions, observations (states), and rewards at each time step of the trajectory, plus the initial state information.
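To check this layout on your own copy, something like the following sketch may help. It only relies on the four keys listed above; the helper name `summarize_demos`, the array shapes, and the empty `init_state_dict` are placeholders for illustration, and the script writes a small synthetic file so that it runs without the real demonstrations. As usual with pickle, only load files you trust.

```python
import os
import pickle
import tempfile

import numpy as np


def summarize_demos(path):
    """Load a demo pickle and report keys, length and return per trajectory.

    Hypothetical helper: assumes the file holds a list of dictionaries
    with at least 'observations' and 'rewards' entries.
    """
    with open(path, "rb") as f:
        demos = pickle.load(f)
    summary = []
    for traj in demos:
        summary.append({
            "keys": sorted(traj.keys()),
            "steps": len(traj["observations"]),
            "return": float(np.sum(traj["rewards"])),
        })
    return summary


# Synthetic stand-in for a demo file; shapes and contents are made up.
fake_demos = [{
    "actions": np.zeros((3, 2)),
    "observations": np.zeros((3, 4)),
    "rewards": np.array([1.0, 2.0, 3.0]),
    "init_state_dict": {},  # real contents are not described in this thread
}]
tmp_path = os.path.join(tempfile.mkdtemp(), "fake_demos.pickle")
with open(tmp_path, "wb") as f:
    pickle.dump(fake_demos, f)

report = summarize_demos(tmp_path)
print(report[0]["keys"], report[0]["steps"], report[0]["return"])
```

Pointing `summarize_demos` at a file such as hammer-v0_demos.pickle should print one entry per demonstration, provided the file matches the structure described here.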