SamEdwardes / spacytextblob

A TextBlob sentiment analysis pipeline component for spaCy.
https://spacytextblob.netlify.app/
MIT License

about benchmark #24

Open sloev opened 1 year ago

sloev commented 1 year ago

Hi, I am the maintainer of another spaCy sentiment pipeline library, and I am trying to figure out how to benchmark spaCy sentiment models fairly.

I have written something here: https://github.com/sloev/sentimental-onix/tree/main/benchmark. It uses this dataset, https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences, as the foundation for the benchmark.
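
For reference, the dataset is three tab-separated files of `sentence<TAB>label` pairs, where the gold label is binary (0 = negative, 1 = positive, with no neutral class). Loading it looks roughly like this (a sketch, not my benchmark code; the file names are the ones in the UCI archive):

```python
from pathlib import Path


def load_sentences(data_dir: str) -> list[tuple[str, str]]:
    """Load the UCI Sentiment Labelled Sentences files.

    Each line is "sentence<TAB>label", where label 0 = negative, 1 = positive.
    """
    pairs = []
    for name in ["amazon_cells_labelled.txt", "imdb_labelled.txt", "yelp_labelled.txt"]:
        for line in Path(data_dir, name).read_text(encoding="utf-8").splitlines():
            sentence, _, label = line.rpartition("\t")
            if sentence:  # skip malformed/empty lines
                pairs.append((sentence, "pos" if label.strip() == "1" else "neg"))
    return pairs
```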

My issue is that both spacytextblob and my library output floating point scores, but in order to validate against the test dataset I have to threshold our values into discrete labels (neg, neu, pos), and whether that turns out to be a fair comparison is hard for me to evaluate.

The results as they stand (my model uses an ONNX-based sentiment model and a default threshold of neg < -0.7 < neu < 0.7 < pos; a sketch of the thresholding is below the table) are:

| library | accuracy |
| --- | --- |
| spacytextblob | 58.9% |
| sentimental_onix | 69% |
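
For concreteness, the thresholding is roughly this (a sketch, not the exact benchmark code; `doc._.blob.polarity` assumes a recent spacytextblob release):

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # noqa: F401 (registers the factory)

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")


def polarity_to_label(polarity: float, cutoff: float = 0.7) -> str:
    """Map a continuous polarity in [-1, 1] to a discrete label:
    neg < -cutoff <= neu <= cutoff < pos
    """
    if polarity < -cutoff:
        return "neg"
    if polarity > cutoff:
        return "pos"
    return "neu"


doc = nlp("The pizza was absolutely wonderful!")
print(polarity_to_label(doc._.blob.polarity))
```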

Kind regards

SamEdwardes commented 1 year ago

Thank you for sharing! It might be fairer to compare accuracy across the values of the floating point scores. For example, when the prediction is 0.9, we would hope that it is almost always correct; when it is 0.4, we would expect it to be wrong more often.

A plot like the one below could be a fairer comparison, showing how good the models are at different thresholds.

[Screenshot: example plot of model accuracy across score thresholds]
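
Roughly what I mean, with made-up numbers just to illustrate the shape of the plot; in the real benchmark, `scores` would be each model's raw polarity and `correct` whether the thresholded label matched the gold label:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: in reality these come from running each model
# over the benchmark sentences.
scores = rng.uniform(-1.0, 1.0, size=1000)                 # raw polarity per example
correct = rng.random(1000) < (0.5 + 0.4 * np.abs(scores))  # "more confident -> more accurate"

thresholds = np.linspace(0.0, 0.9, 19)
accuracy, coverage = [], []
for t in thresholds:
    kept = np.abs(scores) >= t  # keep only predictions at least this confident
    accuracy.append(correct[kept].mean())
    coverage.append(kept.mean())

fig, ax = plt.subplots()
ax.plot(thresholds, accuracy, label="accuracy on kept predictions")
ax.plot(thresholds, coverage, label="coverage (fraction kept)")
ax.set_xlabel("|polarity| threshold")
ax.legend()
plt.show()
```

Plotting coverage alongside accuracy also shows how many predictions each threshold throws away, which matters for judging fairness.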

sloev commented 1 year ago

Hi @SamEdwardes, that is an AWESOME idea :-) I will definitely try that out and report back to you! I might ask for clarification if I run into a wall ;-)

Have a great weekend!