Open waterhorse1 opened 2 years ago
Hey, thanks! Yeah, I'm doing it this way mainly because I couldn't get the program to train and save the replay buffer at the same time. My thought was that you get better coverage by sampling over the prior distribution after pre-training: you sample z ~ Z and can then look more explicitly at where the different priors come from.
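For anyone following along, here is a minimal sketch of what that post-pre-training collection loop might look like. Everything here is a stand-in: `sample_prior`, `policy`, and the dummy environment step are hypothetical placeholders, not the repo's actual API, and the uniform prior on z is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(dim=8):
    """Sample z ~ Z; a uniform prior on [-1, 1]^dim (an assumption)."""
    return rng.uniform(-1.0, 1.0, size=dim)

def policy(obs, z):
    """Stand-in skill-conditioned policy: a fixed projection of [obs, z]."""
    w = np.ones((obs.size + z.size, 2))  # dummy action dimension of 2
    return np.tanh(np.concatenate([obs, z]) @ w)

def collect(num_episodes=5, horizon=10, obs_dim=4):
    """Roll out the (pretrained) policy, one fresh z per episode."""
    buffer = []
    for _ in range(num_episodes):
        z = sample_prior()                   # sample a skill from the prior
        obs = rng.standard_normal(obs_dim)   # dummy env reset
        for _ in range(horizon):
            action = policy(obs, z)
            next_obs = obs + 0.1 * rng.standard_normal(obs_dim)  # dummy env step
            buffer.append((obs, action, next_obs, z))            # store transition
            obs = next_obs
    return buffer

data = collect()
print(len(data))  # 5 episodes x 10 steps = 50 transitions
```

The point is only the shape of the loop: the skill z is drawn once per episode from the prior, so the resulting dataset is spread over the prior rather than over whatever mixture the replay buffer happened to contain during training.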
I have implemented saving the replay buffer during the training process; if you are interested, I can open a pull request to url-suite.
That would be great, happy to merge. Thanks!
Nice extension of URLB, and it is really helpful for my research. I want to know how the dataset is collected in your sample.py function. From the ExORL paper, my understanding is that the data is collected over the 10M timesteps of the training process. However, your sample.py only offers a sampling function for a single given model. So I am wondering whether this is the right way to collect the offline data, or am I missing something?