Open rasoolfa opened 4 years ago
I guess one way to use the demonstrations is to collect samples by replaying the actions stored in those files.
Since the observations/states are stored in a time-ordered list, you can get s' by taking the state at the next index in the list.
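The next-index idea can be sketched as follows. This is only an illustration, not the code used for the paper: the helper name `to_transitions` and the toy values are made up, and it assumes the last recorded step has no successor in the list, so that step is dropped.

```python
def to_transitions(observations, actions, rewards):
    """Pair each step with the observation stored at the next index."""
    # Only steps that have a recorded successor can form a full (s, a, r, s') tuple.
    n = min(len(observations) - 1, len(actions), len(rewards))
    return [
        (observations[t], actions[t], rewards[t], observations[t + 1])
        for t in range(n)
    ]


# Toy trajectory with 3 recorded steps (placeholder values, not real demo data).
obs = [[0.0], [0.1], [0.2]]
acts = [[1.0], [0.5], [0.0]]
rews = [0.0, 0.0, 1.0]

transitions = to_transitions(obs, acts, rews)
for s, a, r, s_next in transitions:
    print(s, a, r, s_next)
```

Whether to drop the final step or treat it as terminal depends on how the trajectories were recorded, which the thread does not state.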
As for the file structure, each file is a list of dictionaries, each representing one trajectory. Each dictionary has the keys
['actions', 'observations', 'rewards', 'init_state_dict']
which correspond to the actions, states, and rewards at each time step of the trajectory, plus initial state information.
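A minimal sketch for inspecting a file with the structure described above. The helper names `load_demos` and `describe` are hypothetical, and the code assumes only that the pickle holds a list of dictionaries with the four listed keys and that the per-step entries support `len()`.

```python
import pickle


def load_demos(path):
    """Load a demonstration file; assumed to contain a list of trajectory dicts."""
    # Note: only unpickle files from a source you trust.
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(demos):
    """Report the number of stored steps per key for each trajectory."""
    return [
        {key: len(traj[key]) for key in ("actions", "observations", "rewards")}
        for traj in demos
    ]


# Example usage (file name taken from the question in this thread):
# demos = load_demos("hammer-v0_demos.pickle")
# print(len(demos), "trajectories")
# print(describe(demos)[0])
# print(demos[0]["init_state_dict"].keys())
```

Comparing the reported lengths for 'observations' and 'actions' shows whether each trajectory stores one observation per action or one extra final observation.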
Hi @aravindr93 ,
Thanks for releasing the code and demonstration files for this work.
I have two questions about the demonstration files (e.g. hammer-v0_demos.pickle). I might be missing something here, but how does DDPGfD use the demonstrations, given that these files only contain (s, a, r) and not (s, s', a, r), where s' indicates the next state? Also, could you please provide more details about the structure of those files, so that it is easier to compare against and reproduce the results in your paper?
Thanks.