facebookresearch / ReAgent

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
https://reagent.ai
BSD 3-Clause "New" or "Revised" License
3.58k stars, 521 forks

Track average A and b instead of cumulative in LinUCB #707

Closed by alexnikulkov 1 year ago

alexnikulkov commented 1 year ago

Summary: I'm updating the LinUCB training logic to track running averages of A and b instead of their cumulative sums. Cumulative sums grow with the number of training samples, so tracking averages should improve the numerical stability of training by preventing overflow.

When the coefficients are computed, the average values are aggregated across trainers and across epochs.
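
To illustrate the idea, here is a minimal NumPy sketch of averaged LinUCB statistics. It is not the code from this PR: the class `AvgLinUCBStats`, its method names, and the way the L2 regularizer is rescaled by the sample count are all assumptions made for illustration. It assumes A is the sum of outer products `x x^T` and b is the sum of `reward * x`, as in standard LinUCB.

```python
import numpy as np


class AvgLinUCBStats:
    """Running averages of A = sum(x x^T) and b = sum(r x).

    Hypothetical helper for illustration; not ReAgent's API.
    """

    def __init__(self, dim: int):
        self.avg_A = np.zeros((dim, dim))
        self.avg_b = np.zeros(dim)
        self.n = 0.0  # number of samples folded into the averages

    def update(self, x: np.ndarray, reward: float) -> None:
        # Incremental mean: avg += (new - avg) / n, so the stored
        # values stay on the scale of a single sample.
        self.n += 1.0
        self.avg_A += (np.outer(x, x) - self.avg_A) / self.n
        self.avg_b += (reward * x - self.avg_b) / self.n

    def merge(self, other: "AvgLinUCBStats") -> None:
        # Combine averages from another trainer or epoch, weighting
        # each side by its sample count.
        total = self.n + other.n
        if total == 0:
            return
        w = other.n / total
        self.avg_A += w * (other.avg_A - self.avg_A)
        self.avg_b += w * (other.avg_b - self.avg_b)
        self.n = total

    def coefs(self, l2_reg: float = 1.0) -> np.ndarray:
        # Assumed scaling: (sum_A + l2 I)^-1 sum_b equals
        # (avg_A + (l2 / n) I)^-1 avg_b, so the regularizer is divided by n.
        dim = self.avg_b.shape[0]
        reg = l2_reg / max(self.n, 1.0)
        return np.linalg.solve(self.avg_A + reg * np.eye(dim), self.avg_b)
```

In this sketch, dividing both A and b by the same count leaves the solution of the linear system unchanged, which is why averages can replace sums as long as the regularization term is rescaled to match.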

Differential Revision: D42334470

facebook-github-bot commented 1 year ago

This pull request was exported from Phabricator. Differential Revision: D42334470

facebook-github-bot commented 1 year ago

This pull request has been merged in facebookresearch/ReAgent@517a67fe26296e4a8953428cde4e0cb005451232.