Ashminator closed this issue 2 years ago.
Yes, train_dataset uses an iterator that randomly samples from the full replay buffer (i.e. from all data collected so far). Plan2Explore tends to explore very well in the unsupervised setting, but I think it's not entirely clear how to best combine it with task rewards; it often just explores too much and thus gets worse task performance. But you can try exploring without rewards and use --expl_until to switch to a greedy task policy later. Hope that helps; unfortunately I won't be able to provide more detailed help than this :)
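If it helps, here's a minimal sketch of that schedule. The names (select_policy, expl_policy, task_policy) are made up for illustration and are not the actual DreamerV2 API; the real code drives this from the expl_until config value:

```python
# Sketch of switching from an exploration policy to a greedy task policy
# once a step budget is exhausted (mirrors the idea behind --expl_until).
# All names here are illustrative, not the actual DreamerV2 API.

def select_policy(step, expl_until, expl_policy, task_policy):
    """Use the exploration policy until `expl_until` steps, then go greedy."""
    if step < expl_until:
        return expl_policy
    return task_policy

# With expl_until=1000, step 500 explores and step 2000 exploits.
assert select_policy(500, 1000, 'plan2explore', 'greedy') == 'plan2explore'
assert select_policy(2000, 1000, 'plan2explore', 'greedy') == 'greedy'
```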
Thanks a lot Danijar! Really swift response, super helpful. I'll try implementing that, or expl_every may even be a shout. I also think it's worth switching out the prefill stage with the exploratory agent, for both the default and Plan2Explore cases, and seeing the results.
Hi Danijar, I'm currently doing a project where I'm running DreamerV2 with some of the alternative exploration agents. I have two questions:
First, regarding these lines in the training script:

print('Create agent.')
train_dataset = iter(train_replay.dataset(**config.dataset))
And this line in the for loop that iterates over the batches:
for _ in range(config.train_steps):
  mets = train_agent(next(train_dataset))
I just wanted to sanity check with you that the next(train_dataset) batch is pulled from the entire buffer in train_replay._complete_eps, and that this buffer keeps being updated, since I don't see train_dataset being re-created after its initialisation. I also wanted to confirm that if expl_behavior is set to something other than greedy, the training episodes are collected by the exploratory agent, and that the data it collects is sampled in subsequent batches of next(train_dataset). Possibly a silly question, but in case I was missing something I tried the following modification:
for _ in range(config.train_steps):
  train_dataset = iter(train_replay.dataset(**config.dataset))
  mets = train_agent(next(train_dataset))
Here train_dataset is re-initialised on every training step, and I got worse results than with the default behaviour.
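A toy version of the buffer makes the sanity check concrete: an iterator created once can still see episodes added later, because it reads the underlying buffer at each next() call. This is only an illustration with made-up names, not the real tf.data pipeline:

```python
import random

# Toy replay buffer: a generator created once still samples from
# episodes added later, because it reads the buffer at each next() call.
# Illustrative only; the real DreamerV2 pipeline uses tf.data.

def replay_dataset(buffer, rng):
    while True:
        yield rng.choice(buffer)

buffer = ['ep0']
it = iter(replay_dataset(buffer, random.Random(0)))
next(it)                  # can only return 'ep0' at this point
buffer.append('ep1')      # new episode collected later
samples = {next(it) for _ in range(50)}
# 'ep1' shows up even though the iterator was created before it existed.
assert 'ep1' in samples
```

Re-creating the iterator every step, as in the modification above, is therefore unnecessary for seeing fresh data.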
As for Plan2Explore:

a) Are there any steps needed for Plan2Explore to work properly, other than setting expl_behavior: Plan2Explore in configs.yaml (which is what I currently have)?

b) Is it expected that it takes more than a few million steps for Plan2Explore to perform as well as default Dreamer? Here's a graph of the situation:
Note: I accidentally had action_repeat set to 4 in both of these games, so divide by 4 to get the true number of steps on the x-axis.
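As a sketch of that correction (assuming, per the note above, that the logged x-axis over-counts by the action_repeat factor; the helper name is made up, not part of the DreamerV2 codebase):

```python
# Toy helper mirroring the note above: with action_repeat=4, divide the
# logged x-axis value by 4 to recover the true number of steps.
def corrected_steps(logged_steps, action_repeat=4):
    return logged_steps // action_repeat

assert corrected_steps(4_000_000) == 1_000_000
```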
Thanks in advance!