facebookresearch / hanabi_SAD

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

Replay buffer batch size and number of training steps #16

Closed: hnekoeiq closed this issue 3 years ago

hnekoeiq commented 3 years ago

Hi @hengyuan-hu, thanks again for your great codebase.

1- It seems that passing args.batchsize or any other value makes no difference here: https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/pyhanabi/selfplay.py#L218 When I try to sample a batch with a batch size different from the initial args.batchsize, this function still returns a batch of size args.batchsize. It seems the batch size is hardcoded somewhere, but I'm not able to find it. I would really appreciate it if you could point me to the right place to correct it.

2- How can I get the total number of training steps, or, even better, restrict the total number of training steps to a specific value?

hengyuan-hu commented 3 years ago

1) The batch size is not hardcoded anywhere. Do you mean that you want to change the batch size on the fly, per batch? Because of the prefetch mechanism (the replay buffer prepares at most the next k batches in advance, assuming the batch size stays the same), a change to the batch size takes effect with a delay rather than having no effect. By default prefetch=3, as set here: https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/pyhanabi/selfplay.py#L72. I tried increasing the batch size by 1 for each batch and got the following log (the second number in each size is the batch size):

torch.Size([80, 65, 21])
torch.Size([80, 65, 21])
torch.Size([80, 65, 21])
torch.Size([80, 65, 21])
torch.Size([80, 66, 21])
torch.Size([80, 67, 21])
torch.Size([80, 68, 21])
torch.Size([80, 69, 21])
torch.Size([80, 70, 21])
torch.Size([80, 71, 21])

Here the first 4 batches have a size of 65 due to prefetch, but after that the batch size changes. If you want a more immediate effect, try turning down the value of prefetch. If you turn it all the way down to 0, the batch size should change immediately, but a smaller prefetch value may hurt training speed since more time will be spent waiting for the next batch.
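For what it's worth, here is a minimal sketch of that experiment, assuming the (batch, priority_weight) return value of replay_buffer.sample(batchsize, device) as used in pyhanabi/selfplay.py; the helper name and the observation key "priv_s" below are illustrative, not part of the repo:

```python
def sample_with_growing_batchsize(replay_buffer, base_batchsize, device):
    """Sketch of the experiment above: request one more sample each step.

    Assumes the (batch, priority_weight) return value of
    replay_buffer.sample() as used in pyhanabi/selfplay.py; the
    observation key "priv_s" is illustrative.
    """
    for step in range(10):
        batch, weight = replay_buffer.sample(base_batchsize + step, device)
        # With prefetch=3 the first few batches were assembled before the
        # larger size was requested, so the change shows up a few steps
        # late; with prefetch=0 the requested size takes effect at once.
        print(batch.obs["priv_s"].size())
```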

2) By training step, do you mean the number of gradient steps or the number of simulation steps? You can get the total number of simulation steps here: https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/pyhanabi/utils.py#L164, which simply reads the value from https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/rela/r2d2_actor.h#L98. You can add a limit around there if you want to restrict the amount of data used.
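If you want a hard cap, a hedged sketch of wrapping the training loop could look like this (get_num_acts stands for the helper in pyhanabi/utils.py linked above, which sums rela::R2D2Actor::numAct() over all actors; the exact names and the act_group attribute are assumptions):

```python
def train_with_step_cap(act_group, run_one_epoch, get_num_acts,
                        num_epoch, max_sim_steps=100_000_000):
    """Run training epochs until a simulation-step budget is hit.

    get_num_acts is passed in to stand for the helper in
    pyhanabi/utils.py linked above; max_sim_steps is an illustrative
    default, not a value from the repo.
    """
    for _ in range(num_epoch):
        run_one_epoch()  # placeholder for one existing training epoch
        if get_num_acts(act_group.actors) >= max_sim_steps:
            break  # stop generating/consuming data past the budget
```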

hnekoeiq commented 3 years ago

Thanks a lot for your reply.

One more question I have is about the way the replay buffer is used by ActGroup for IQL agents. I am trying to keep two separate replay buffers that store the transitions of the two IQL agents independently of each other. To this end, I created two replay buffers and passed them to the ActGroup; then, when creating the R2D2 actors, I pass each actor its respective replay buffer here: https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/pyhanabi/create.py#L132 I am wondering if this ensures that their experiences are stored separately. In particular, I'm not sure whether the shared pointer here: https://github.com/facebookresearch/hanabi_SAD/blob/502d6a7a52028511704c944dffe1945194e10c3a/rela/r2d2_actor.h#L33 would affect this.

hengyuan-hu commented 3 years ago

Your solution should work. Each R2D2Actor only writes to the replay buffer passed to it at construction.
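For reference, a minimal sketch of that two-buffer setup, assuming the rela.RNNPrioritizedReplay constructor used in pyhanabi/selfplay.py at this commit (argument order may differ in other versions):

```python
import rela  # compiled rela extension built by this repo

def make_two_buffers(args):
    """Create one replay buffer per IQL agent.

    Mirrors the rela.RNNPrioritizedReplay construction in
    pyhanabi/selfplay.py at this commit; the argument order is assumed
    to match that file.
    """
    def make_buffer():
        return rela.RNNPrioritizedReplay(
            args.replay_buffer_size,
            args.seed,
            args.priority_exponent,
            args.priority_weight,
            args.prefetch,
        )

    # Each R2D2Actor writes only to the buffer passed to it at
    # construction, so giving agent A's actors one buffer and agent B's
    # actors the other keeps the two experience streams separate.
    return make_buffer(), make_buffer()
```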