How to measure the effectiveness of tag "controls" (e.g. sentiment)?

enzoampil / tito-joker

A humorous AI that uses state-of-the-art deep learning to tell jokes

http://35.225.94.177:8501/

GNU General Public License v3.0

How to measure the effectiveness of tag "controls" (e.g. sentiment)? #33

Open enzoampil opened 4 years ago

enzoampil commented 4 years ago

I was thinking that a simple methodology would be to generate sequences for sentiment spans and measure accuracy with some overlap measure, e.g. Jaccard or ROUGE.

While imperfect, this would be an initial approach to communicating the effectiveness of utilizing text generation controls, derived from pre-trained supervised models.
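As a rough illustration of the overlap idea, here is a minimal sketch of a word-level Jaccard score between generated text and reference sentiment spans. The function names and the whitespace tokenization are assumptions for illustration, not anything already in Tito Joker.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings (case-insensitive)."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    # Two empty strings are treated as identical (an assumption of this sketch).
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mean_jaccard(generated, references):
    """Average Jaccard score over paired generated/reference strings."""
    if len(generated) != len(references) or not generated:
        raise ValueError("need two non-empty lists of equal length")
    scores = [jaccard(g, r) for g, r in zip(generated, references)]
    return sum(scores) / len(scores)
```

A higher average would suggest that the sequences generated under a given sentiment tag share more vocabulary with the labelled spans for that sentiment. It says nothing about fluency or humor.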

Tweets would be a good start.

Potential dataset

https://www.kaggle.com/c/tweet-sentiment-extraction/data

enzoampil commented 4 years ago

We could also try comparing a pooled version of the predicted token vectors from the regular Tito Joker model against those from another model that generates only positive statements. This new model would be a language model fine-tuned on the positive examples of a sentiment analysis dataset.
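One way to read the pooled-vector idea is sketched below: mean-pool the per-token vectors from each model, then compare the two pooled vectors. Mean pooling and cosine similarity are assumptions chosen for illustration, and the token vectors are assumed to be plain lists of floats already extracted from each model.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under this reading, a positive-tagged generation from the regular model would be scored by how close its pooled vector is to the pooled vector from the positive-only model.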

enzoampil commented 4 years ago

The TRL project (lvwerra/trl) published exactly how to do the above with IMDB reviews!!

Interesting to see that the tag controls are quite similar to ours. The main difference seems to be the use of an additional reward function for the sentiment special tokens.

https://lvwerra.github.io/trl/05-gpt2-sentiment-control/
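To make the reward-function idea concrete, here is a minimal sketch of a reward that depends on the requested sentiment tag. It is not the linked notebook's code: the tag names, the function name, and the assumption that a sentiment classifier returns a single positive-class logit are all hypothetical.

```python
def control_reward(positive_logit: float, control: str) -> float:
    """Reward a generation for matching the requested sentiment tag.

    positive_logit: classifier logit for the positive class (assumed input).
    control: the sentiment tag used in the prompt (hypothetical names).
    """
    if control == "positive":
        # Higher positive logit means the text matches the tag better.
        return positive_logit
    if control == "negative":
        # Flip the sign so strongly negative text earns a high reward.
        return -positive_logit
    raise ValueError(f"unknown control tag: {control!r}")
```

The same score could double as an evaluation metric: average it over generations per tag and compare against generations produced without the tag.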