facebookresearch / ReAgent

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
https://reagent.ai
BSD 3-Clause "New" or "Revised" License
3.58k stars 521 forks source link

Mask out non-present arm scores for Offline Eval #702

Open alexnikulkov opened 1 year ago

alexnikulkov commented 1 year ago

Summary: When some arms might be missing, apply a masked softmax to model scores during offline eval to avoid selecting the missing arms.

Differential Revision: D41990957