outshine-J closed this issue 1 year ago
Hi,
Thank you for asking. I don't think the task association is an issue. The code handles two things differently: randomly initializing the environments, and randomly sampling tasks (for the train/test split) from the initialized environments.
For the former, the code uses a fixed intrinsic seed regardless of the random seed you assign. The only exception is the walker environment, where you need to set config['env_params']['randomize_tasks']=False to ensure that exactly the same set of envs is created every time.
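To illustrate the idea of a fixed intrinsic seed, here is a minimal sketch. It is not the repository's code: the constant FIXED_ENV_SEED, the function make_tasks, the parameter name goal_velocity, and the way randomize_tasks is modeled are all assumptions made for illustration.

```python
import random

FIXED_ENV_SEED = 1234  # hypothetical constant, not taken from the repository


def make_tasks(n_tasks, user_seed, randomize_tasks=False):
    """Return a list of task parameter dicts (hypothetical 'goal_velocity' field).

    Assumption: with randomize_tasks=False the task generator ignores the
    user's seed and always draws from the same fixed internal seed.
    """
    seed = user_seed if randomize_tasks else FIXED_ENV_SEED
    rng = random.Random(seed)
    return [{"goal_velocity": rng.uniform(0.0, 3.0)} for _ in range(n_tasks)]


# Two runs with different user seeds still create identical tasks.
tasks_seed0 = make_tasks(5, user_seed=0)
tasks_seed1 = make_tasks(5, user_seed=1)
```

Under these assumptions, task i has the same parameters in every run, so anything stored per task index stays valid across seeds.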
For the latter, the replay buffer will indeed pre-collect samples from different sets of envs given different random seeds (e.g., {env 1, env 2} for seed 0 and {env 2, env 3} for seed 1). Think of this as a random train/test split. However, as long as the initialized environments remain the same (env 1 has the same set of parameters regardless of your random seed), I don't see any mismatch issue.
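A rough sketch of why the split does not cause a mismatch, assuming buffers are keyed by task index. All names here (ENV_PARAMS, PRECOLLECTED, split_tasks, buffers_match) are hypothetical and do not come from the repository.

```python
import random

N_TASKS = 4

# Hypothetical task parameters, fixed per index regardless of the run's seed.
ENV_PARAMS = {i: {"goal": 0.5 * i} for i in range(N_TASKS)}

# Hypothetical pre-collected data, stored per task index together with the
# parameters of the env that produced it.
PRECOLLECTED = {
    i: {"params": ENV_PARAMS[i], "transitions": []} for i in range(N_TASKS)
}


def split_tasks(seed, n_train=2):
    """Seed-dependent train/test split over task indices."""
    ids = list(range(N_TASKS))
    random.Random(seed).shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


def buffers_match(task_ids):
    """True if every selected buffer was collected in the env it is paired with."""
    return all(PRECOLLECTED[i]["params"] == ENV_PARAMS[i] for i in task_ids)


for seed in (0, 1):
    train_ids, test_ids = split_tasks(seed)
    print(seed, train_ids, test_ids, buffers_match(train_ids + test_ids))
```

The seed only changes which indices land in the train or test set. Each index still points to the same env parameters and the same pre-collected data.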
Hope this helps and please let me know if you have further questions. @outshine-J
Thanks a lot for your reply, I think I get it.
Hi, I found that the code randomly samples n_tasks tasks when creating the environment. So when using the pre-collected samples, the tasks may not match the replay_buffer: when I change the random seed, a different set of tasks is generated, but the pre-collected samples still correspond to the previous tasks. Could this cause problems when evaluating the training and testing tasks?