Add support for population-based training

Description

Adds support for population-based training to variable-length, REINFORCE-based training. Credits to @eugene-kharitonov.

Note that the signature and methods of the existing variable-length REINFORCE-based training do not change, so current games using SenderReceiverRnnReinforce are not impacted.
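
For reviewers unfamiliar with the idea, here is a minimal, framework-free sketch of what population-based play can look like: a sampler draws one sender and one receiver per batch, and a thin wrapper hands the pair to the game logic. All names here (`UniformPairSampler`, `PopulationGameSketch`, `play_once`) are hypothetical and only illustrate the pattern; they are not the classes added in this PR, and the toy agents are plain functions rather than trained models.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a real game would use trainable sender/receiver agents.
Sender = Callable[[int], int]
Receiver = Callable[[int], int]


class UniformPairSampler:
    """Draws one (sender, receiver) pair uniformly at random on each call."""

    def __init__(self, senders: Sequence[Sender], receivers: Sequence[Receiver], seed: int = 0):
        self.senders = list(senders)
        self.receivers = list(receivers)
        self.rng = random.Random(seed)  # private RNG so sampling is reproducible

    def __call__(self) -> Tuple[Sender, Receiver]:
        return self.rng.choice(self.senders), self.rng.choice(self.receivers)


class PopulationGameSketch:
    """Wraps a single-pair game so that each batch is played by a freshly sampled pair."""

    def __init__(self, play: Callable[[Sender, Receiver, List[int]], float], sampler: UniformPairSampler):
        self.play = play
        self.sampler = sampler

    def __call__(self, batch: List[int]) -> float:
        sender, receiver = self.sampler()
        return self.play(sender, receiver, batch)


def play_once(sender: Sender, receiver: Receiver, batch: List[int]) -> float:
    """Toy game: fraction of inputs the receiver recovers from the sender's message."""
    return sum(receiver(sender(x)) == x for x in batch) / len(batch)


# Two toy "protocols": a pair succeeds only when sender and receiver use the same offset.
senders = [lambda x: x + 1, lambda x: x + 2]
receivers = [lambda m: m - 1, lambda m: m - 2]

game = PopulationGameSketch(play_once, UniformPairSampler(senders, receivers, seed=0))
scores = [game([1, 2, 3]) for _ in range(20)]
```

Because the wrapper only decides which pair plays, the single-pair game logic stays untouched, which is the same property the note above describes for SenderReceiverRnnReinforce.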

Motivation and Context

Training multiple senders and multiple receivers is useful for population-based (and possibly generation-based) games.

How Has This Been Tested?

Unit tests covering the new population file pass (note: these include games using the SenderReceiverRnnReinforce class).
