Hi Shayekh,
Thanks for your question! We used the TorchBeastEncoder for the reward model because of its more sophisticated message-encoding architecture. The current reward model in Motif relies only on messages, which are better encoded by the TorchBeastEncoder. As you've noticed, we preferred the NLEMainEncoder for the Actor-Critic, both because of better performance and because of better runtimes.
Hope this helps you understand the motivation behind our choices.
Best,
Pierluca
The TorchBeastEncoder is used as the default reward encoder in Motif. However, in the Actor-Critic, the default encoder is NLEMainEncoder. We noticed a significant performance degradation when we chose TorchBeastEncoder as the Actor-Critic's encoder. That is, NLEMainEncoder performs very well as the encoder used in the Actor-Critic. Why were these particular choices made for the reward encoder and the RL encoder in this work?