EdoardoPona closed this issue 1 year ago.
Implement binary reward models for the sentiment task, as opposed to using the cross-entropy (CE) with the correct class as the reward.
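For illustration, here is a minimal sketch of the difference between the two reward styles. It assumes a two-class sentiment classifier that returns logits, and it assumes the binary reward is "1 if the classifier's top class is the target sentiment, else 0". The function names, the label indices, and that decision rule are hypothetical and are not taken from the linked commit.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ce_reward(logits: Sequence[float], target: int) -> float:
    # Dense reward: negative cross-entropy with the correct (target) class,
    # i.e. the log-probability the classifier assigns to the target sentiment.
    return log_softmax(logits)[target]


def binary_reward(logits: Sequence[float], target: int) -> float:
    # Sparse reward (assumed rule): 1.0 if the classifier's top class is the
    # target sentiment, else 0.0. Ties resolve to the lowest class index.
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    return 1.0 if predicted == target else 0.0


if __name__ == "__main__":
    POSITIVE = 1  # hypothetical label indices: 0 = negative, 1 = positive
    for logits in ([2.0, -1.0], [-0.5, 0.5], [-3.0, 4.0]):
        print(logits, round(ce_reward(logits, POSITIVE), 4), binary_reward(logits, POSITIVE))
```

The CE-based reward varies smoothly with the classifier's confidence, while the binary reward only signals whether the target sentiment was reached, so two generations the classifier scores very differently can receive the same reward.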
Implemented in https://github.com/diogo-cruz/RL4LMs/commit/3f863e15c32ce0d2726287cb8e2ed01093d8c1d3