EdoardoPona / predicting-inductive-biases-RL

fork of https://openreview.net/forum?id=mNtmhaDkAr - extending for inductive bias in RL

Rewards without confidence #22

Closed EdoardoPona closed 1 year ago

EdoardoPona commented 1 year ago

Implement binary reward models for the sentiment task, as opposed to rewards based on the cross-entropy (CE) with the correct class.
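A minimal sketch of the distinction, assuming the reward model is a classifier that outputs one logit per sentiment class. The function names `confidence_reward` and `binary_reward` are hypothetical and are not taken from the RL4LMs codebase. The CE-style reward scales with the classifier's confidence in the correct class, while the binary reward only checks whether the predicted class is correct.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_reward(logits: Sequence[float], target: int) -> float:
    """Hypothetical CE-style reward: log-probability of the correct class.

    This is the negative cross-entropy with the target class, so it
    grows as the classifier becomes more confident in `target`.
    """
    return math.log(softmax(logits)[target])


def binary_reward(logits: Sequence[float], target: int) -> float:
    """Hypothetical binary reward: 1.0 if the argmax class equals `target`, else 0.0.

    The classifier's confidence is discarded; only the decision counts.
    """
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    return 1.0 if predicted == target else 0.0


# Two sets of logits that both predict class 0, with different confidence.
weak = [0.1, 0.0]
strong = [5.0, 0.0]

# The binary reward is identical for both; the CE-style reward is not.
print(binary_reward(weak, 0), binary_reward(strong, 0))
print(confidence_reward(weak, 0) < confidence_reward(strong, 0))
```

With the binary variant, two generations that the classifier labels the same way receive the same reward regardless of how confident it is, which is the "rewards without confidence" behaviour this issue asks for.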

EdoardoPona commented 1 year ago

Implemented in https://github.com/diogo-cruz/RL4LMs/commit/3f863e15c32ce0d2726287cb8e2ed01093d8c1d3