EdoardoPona / predicting-inductive-biases-RL

Fork of https://openreview.net/forum?id=mNtmhaDkAr, extending it to study inductive biases in RL.

Implement sentiment reward #12

Closed diogo-cruz closed 1 year ago

diogo-cruz commented 1 year ago

This consists of:

  1. Loading a fine-tuned model for sentiment analysis.
  2. Taking prompt + generated sequence as input to that model and outputting a scalar value that can act as a reward for PPO (see the sketch after this list).
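
A rough sketch of what this interface could look like, assuming a Hugging Face sequence-classification checkpoint is used; the checkpoint name, `sentiment_reward`, and the label handling are illustrative placeholders, not the repo's actual implementation.

```python
# Sketch of the reward interface: prompt + generated sequence in, one scalar reward out.
# SENTIMENT_CHECKPOINT and sentiment_reward are placeholder names for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

SENTIMENT_CHECKPOINT = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(SENTIMENT_CHECKPOINT)
classifier = AutoModelForSequenceClassification.from_pretrained(SENTIMENT_CHECKPOINT).eval()

@torch.no_grad()
def sentiment_reward(prompts, generations, target_label="POSITIVE"):
    """Return one scalar reward per (prompt, generation) pair, usable as a PPO reward."""
    texts = [p + g for p, g in zip(prompts, generations)]
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    log_probs = torch.log_softmax(classifier(**batch).logits, dim=-1)
    target_idx = classifier.config.label2id[target_label]  # label names differ per checkpoint
    return log_probs[:, target_idx]  # higher log-prob of the target class = higher reward
```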
EdoardoPona commented 1 year ago

Pretrained model for sentiment analysis

We could use https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest. tl;dr: trained on tweets, with 3 classes: negative, neutral, positive.

The reward can be the (negative) cross-entropy with the desired class.

The only thing to check for this model is whether it fits in VRAM at the same time as GPT-2; it should not occupy much more than 1 GB.
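
A small sketch of this reward under the assumption that the cardiffnlp checkpoint above is used: the reward is the negative cross-entropy against the desired class (equivalently, the log-probability of that class), and the last lines estimate parameter memory only (activations are extra). Label names are assumed from the model card.

```python
# Sketch: negative cross-entropy against the desired class as the reward,
# using the cardiffnlp checkpoint suggested above (labels assumed: negative/neutral/positive).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

@torch.no_grad()
def reward(texts, target_label="positive"):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits  # (batch, 3)
    target = torch.full((logits.size(0),), model.config.label2id[target_label], dtype=torch.long)
    # cross_entropy is a loss (lower is better), so negate it to turn it into a reward
    return -F.cross_entropy(logits, target, reduction="none")

# Rough VRAM footprint of the classifier parameters in fp32: roberta-base has ~125M
# parameters, i.e. roughly 0.5 GB, so it should fit alongside GPT-2.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"~{param_bytes / 1e9:.2f} GB of parameters")
```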

EdoardoPona commented 1 year ago

Current implementation: https://github.com/allenai/RL4LMs/commit/e147dd3dee0e539cb96d97e3ffb851ef63c85fc0

Largely untested end-to-end, but the individual pieces work.

EdoardoPona commented 1 year ago

The current reward model was trained on Twitter data. Given the meeting outcome (we agreed to fine-tune on the same dataset as the sentiment model), I will update this to use a model based on IMDB.

EdoardoPona commented 1 year ago

Implemented in https://github.com/allenai/RL4LMs/commit/d80687c8bf51e062665f7c9dc5aa5dfd76aebf46#diff-536f6459cbccfcd83858e1758cf46ab592a5393e506cc124e9c32116673bfb28
