facebookresearch / motif

Intrinsic Motivation from Artificial Intelligence Feedback

NLEMainEncoder vs TorchBeastEncoder as the Actor-Critic's Encoder #8

Closed: ShayekhBinIslam closed this issue 11 months ago

ShayekhBinIslam commented 11 months ago

The TorchBeastEncoder is the default reward encoder in Motif, whereas the default encoder in the Actor-Critic is the NLEMainEncoder. We noticed a significant performance degradation when we used the TorchBeastEncoder as the Actor-Critic's encoder; in other words, the NLEMainEncoder performs much better as the Actor-Critic's encoder. What motivated these particular choices of reward encoder and RL encoder in this work?

proceduralia commented 11 months ago

Hi Shayekh,

Thanks for your question! We used the TorchBeastEncoder for the reward model because of its more sophisticated message-encoding architecture. The current reward model in Motif relies only on messages, which the TorchBeastEncoder encodes better. As you've noticed, we preferred the NLEMainEncoder for the Actor-Critic, both for its better performance and for its faster runtime.

Hope this helps you understand the motivation behind our choices.

Best,

Pierluca