Tiiiger / bert_score

BERT score for text generation
MIT License
1.54k stars, 209 forks

Semantic similarity between essays and a theme #166

Open RaphaelSilv opened 1 year ago

RaphaelSilv commented 1 year ago

Hi guys, thanks for this fantastic project.

I intend to use it to measure the similarity between essays written by students and a given theme. The theme is a one-line sentence and each essay has a couple of paragraphs. In my dataset, essays that conform to the theme have a positive score ranging from 20 to 200, while essays that ignore the theme receive 0. From skimming the original paper and experimenting with a relevant pre-trained BERT model instead of the default one for the language, this looks quite doable, although not perfect. I still have some doubts about how to use the weighting, which I hope will improve the measurements.
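For concreteness, the setup described above could be sketched roughly as follows. This is only a minimal sketch, not the project's recommended recipe. It assumes the `bert_score` package's `score(cands, refs, ...)` function with its `lang`, `model_type` and `idf` parameters. The helper names `build_pairs` and `score_against_theme` are hypothetical and exist only for illustration.

```python
# Minimal sketch: score each essay (candidate) against one theme (reference).
# build_pairs and score_against_theme are hypothetical helper names.

def build_pairs(essays, theme):
    """Pair every essay with the same one-line theme."""
    cands = [essay.strip() for essay in essays]
    refs = [theme.strip()] * len(cands)
    return cands, refs


def score_against_theme(essays, theme, lang="en", model_type=None, use_idf=False):
    """Return per-essay precision, recall and F1 as plain Python lists."""
    # Imported lazily so build_pairs can be used without bert_score installed.
    from bert_score import score

    cands, refs = build_pairs(essays, theme)
    # model_type=None falls back to the library's default model for `lang`.
    # idf=True asks the library to apply idf importance weighting.
    P, R, F1 = score(cands, refs, lang=lang, model_type=model_type, idf=use_idf)
    return P.tolist(), R.tolist(), F1.tolist()


# Example call (downloads a model on first use; the inputs are placeholders):
# p, r, f1 = score_against_theme(["First essay text.", "Second essay text."],
#                                "A one-line theme.")
```

A caveat on the weighting: as far as I can tell, `idf=True` derives the idf weights from the reference sentences. Here every reference is the same theme, so the weights may not be very informative. The `idf` argument can also take a precomputed idf dictionary, which might be a better fit, but treat that as something to verify against the library's documentation.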

Anyhoo, any advice on how to approach this task? Any do's or don'ts are welcome 😃

RaphaelSilv commented 12 months ago

Hey guys, some updates:

After computing the scores, I averaged them and plotted the results as scatter plots.

As the images show, I didn't find any correlation between any of the BERTScore values and the essay scores. Does anyone have any input that would help improve these results? Thank you!

[Images: three scatter plots of BERTScore values against essay scores]
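To put a number on what the scatter plots suggest, the correlation can be computed directly rather than judged by eye. The sketch below is one possible way to do that in plain Python, not what was actually run for the plots above. It computes Pearson and Spearman correlations (with tie-aware ranks, since many essays share the score 0) and compares the mean BERTScore of on-theme and off-theme essays. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def group_means(bert_scores, essay_scores):
    """Mean BERTScore for off-theme (score 0) and on-theme (score > 0) essays."""
    off = [b for b, s in zip(bert_scores, essay_scores) if s == 0]
    on = [b for b, s in zip(bert_scores, essay_scores) if s > 0]
    return {
        "off_theme": sum(off) / len(off) if off else None,
        "on_theme": sum(on) / len(on) if on else None,
    }
```

Usage, with made-up toy values that are not from the dataset: `spearman([0.81, 0.79, 0.85, 0.80], [0, 0, 120, 40])` and `group_means([0.81, 0.79, 0.85, 0.80], [0, 0, 120, 40])`. Because the essay scores are either 0 or in the 20 to 200 range, the group comparison may be more telling than a single correlation over all essays.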