cheryyunl / Make-An-Agent


Contents of the training dataset file data.pt #3

Closed AmoghJuloori closed 2 months ago

AmoghJuloori commented 2 months ago

Hi, thanks for this excellent work! I am trying to understand and use this framework on another dataset, but I have a few doubts about the training data, i.e. the data.pt file used to train the autoencoder and the behavior embedding. There seem to be three tensors in the file, with keys 'params', 'traj', and 'task', each of length 17286 along dim 0; 'traj' and 'task' have dim 1 of length 1020 and 117 respectively. I understood the 'traj' tensor to be related to the long trajectory recorded for each checkpoint, i.e. s_0, a_0, s_1, a_1, ..., s_n, a_n, but I am not sure. I am also unsure about the data stored in the 'task' tensor: am I right that this is where the post-success states s_K to s_K+m are stored, or is there another tensor storing them? Could you please help me understand the contents of the training data file data.pt, especially the 'traj' tensor, since I cannot work out the shapes of each state and action or the format in which they are stored? This will help me implement the framework on my own dataset. TIA!
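For context, this is how I am inspecting the file (a minimal sketch, assuming data.pt is a plain dictionary of tensors loadable with torch.load; the key names and shapes are the ones I listed above):

```python
import torch

# Load the training data on CPU and print the shape of each tensor.
data = torch.load("data.pt", map_location="cpu")
for key in ("params", "traj", "task"):
    print(key, tuple(data[key].shape))

# What I see (roughly):
#   params -> (17286, P)      flattened policy parameters per checkpoint
#   traj   -> (17286, 1020)   one trajectory vector per checkpoint
#   task   -> (17286, 117)    one task/conditioning vector per checkpoint
```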

cheryyunl commented 2 months ago

Thank you for your questions!! We plan to update our dataset card, but we have run into some issues on Hugging Face, so let me describe the dataset here.

- 'params': all the parameters of the policy network, flattened into a single vector.
- 'traj': prior trajectories, stored as 's_0, a_0, a_1, a_2, s_3, a_3, a_4, a_5, ...'. Because consecutive actions are similar in most control tasks, we keep three actions after each stored state. So 1020 = 20 × (39 + 3 × 4), with state dim 39 and action dim 4.
- 'task': the three post-success states (117 = 39 × 3).

However, if you want to train on your own dataset or task, you can design the trajectory dimensions yourself and encode them to the same latent dimension (for example, we used 128). You can also fine-tune our pretrained model on your dataset as long as the behavior embedding dimensions match. Hope our reply helps! Thanks!
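As a rough sketch (not our exact preprocessing code; the segment layout and dimensions are just the ones described above), the 'traj' and 'task' vectors can be assembled like this:

```python
import torch

STATE_DIM, ACTION_DIM = 39, 4            # MetaWorld-style dims from the reply above
ACTIONS_PER_STATE, NUM_SEGMENTS = 3, 20  # 20 * (39 + 3 * 4) = 1020

def build_traj_vector(states, actions):
    """Flatten a rollout into the 1020-dim 'traj' layout described above.

    states:  (NUM_SEGMENTS, STATE_DIM)                      e.g. s_0, s_3, s_6, ...
    actions: (NUM_SEGMENTS, ACTIONS_PER_STATE, ACTION_DIM)  the 3 actions after each state
    """
    segments = [torch.cat([s, a.reshape(-1)]) for s, a in zip(states, actions)]
    return torch.cat(segments)           # shape: (1020,)

def build_task_vector(success_states):
    """Concatenate the three post-success states into the 117-dim 'task' vector."""
    assert success_states.shape == (3, STATE_DIM)
    return success_states.reshape(-1)    # shape: (117,)
```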

AmoghJuloori commented 2 months ago

Thank you for providing the details of the dataset! They are very helpful for understanding the structure of the data and make it easier to implement the framework on my own dataset. May I ask how you collected the trajectory data and the success-state data? I assume the policy network parameters are first stored as checkpoints during training, and then each checkpoint's policy is rolled out to collect the 'traj' and 'task' data. Please let me know whether that is correct. Thanks a lot for your help!

cheryyunl commented 2 months ago

Yes, you are right! We roll out the policy to collect trajectories for training.
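A loose sketch of such a rollout loop (the env/policy API here is hypothetical and Gymnasium-style; success detection and the exact bookkeeping will differ in practice):

```python
import torch

@torch.no_grad()
def collect_rollout(env, policy, actions_per_state=3, num_segments=20, post_success_states=3):
    """Roll out one policy checkpoint and return its 'traj' and 'task' vectors.

    Keeps one state followed by the next few actions per segment, and records a
    few extra states after the episode (assumed successful) as the task vector.
    """
    traj_parts, success_parts = [], []
    obs, _ = env.reset()
    for _ in range(num_segments):
        traj_parts.append(torch.as_tensor(obs, dtype=torch.float32))
        for _ in range(actions_per_state):
            action = policy(torch.as_tensor(obs, dtype=torch.float32))
            traj_parts.append(action)
            obs, reward, terminated, truncated, info = env.step(action.numpy())
    # Assume the rollout has succeeded by now; record the next few states as conditioning.
    for _ in range(post_success_states):
        action = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, *_ = env.step(action.numpy())
        success_parts.append(torch.as_tensor(obs, dtype=torch.float32))
    return torch.cat(traj_parts), torch.cat(success_parts)
```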

AmoghJuloori commented 2 months ago

Oh okay. Thanks a lot for your support!

AmoghJuloori commented 2 months ago

Hey @cheryyunl, I would also like to know about the 'params' tensor, which stores the parameters of the policy network for every checkpoint. I am trying to adapt this framework to my own dataset, which I collected by training with PPO instead of SAC. In my training dataset, I tried storing each checkpoint's parameters in a list, as returned by agent.state_dict(), which gives a dictionary of the agent's parameters. This format doesn't let me train on my dataset, so could you tell me the format in which the parameters are stored in the tensor for each checkpoint? Thanks!
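If it is simply a flattened vector, I guess something like this would produce it (my assumption, using torch.nn.utils.parameters_to_vector; please correct me if the actual format differs):

```python
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

# Stand-in policy network; replace with the actual PPO actor.
policy = nn.Sequential(nn.Linear(39, 256), nn.ReLU(), nn.Linear(256, 4))

# Flatten all parameters into one 1-D tensor (one row of the 'params' tensor?).
flat = parameters_to_vector(policy.parameters())
print(flat.shape)

# And presumably the generated vector can be copied back into an
# identically shaped network for evaluation.
vector_to_parameters(flat, policy.parameters())
```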