CDU-data-science-team / zero-shot

Zero-shot classification of Patient Experience data
MIT License

Add more performance metrics #3

Open andreassot10 opened 3 years ago

andreassot10 commented 3 years ago

@asegun-cod, the accuracy score can be inflated by great model performance in predicting one or just a few classes. It would be good to add metrics that account for this. For example, Balanced Accuracy or Matthews Correlation Coefficient.

See https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics

and https://github.com/CDU-data-science-team/pxtextmining/blob/main/pxtextmining/helpers/metrics.py.
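To illustrate the point about inflated accuracy, here is a minimal sketch using the scikit-learn metrics mentioned above. The labels are hypothetical, purely for illustration: a degenerate classifier that always predicts the majority class scores high on plain accuracy, while balanced accuracy and MCC expose it.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    matthews_corrcoef,
)

# Hypothetical labels: 9 of one class, 1 of another, and a "model"
# that always predicts the majority class.
y_true = ["couldn't be improved"] * 9 + ["staff attitude"]
y_pred = ["couldn't be improved"] * 10

print(accuracy_score(y_true, y_pred))           # 0.9 -- looks good
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- mean per-class recall
print(matthews_corrcoef(y_true, y_pred))        # 0.0 -- no better than chance
```

Balanced accuracy is the unweighted mean of per-class recall, so the ignored minority class drags it down to 0.5; MCC collapses to 0 for a constant predictor.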

andreassot10 commented 3 years ago

To clarify, I don't mean drop the accuracy score- just add more metrics.

asegun-cod commented 3 years ago

Thanks, I will do that. I am also thinking: would it be useful to add the accuracy of each individual label as well? That might help us see which labels are usually well predicted and which are not. What do you think @andreassot10?

ChrisBeeley commented 3 years ago

Definitely. Also a confusion matrix would be great

ChrisBeeley commented 3 years ago

E.g. https://github.com/CDU-data-science-team/experienceAnalysis/blob/main/R/plot_confusion_matrix.R
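A Python equivalent of the above might use scikit-learn's `confusion_matrix` together with `classification_report`, whose per-label recall is essentially the per-label accuracy asked about earlier. A sketch with made-up labels:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels, for illustration only.
y_true = ["access", "access", "staff", "staff", "environment", "environment"]
y_pred = ["access", "staff", "staff", "staff", "environment", "access"]

labels = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(labels)
print(cm)

# Per-label precision/recall/F1; recall per row is the per-label accuracy.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

Rows of the matrix are true labels and columns are predictions, so off-diagonal cells show which classes get confused with which.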

asegun-cod commented 3 years ago

These metrics have now been added and committed.


andreassot10 commented 3 years ago

@asegun-cod, can you please direct us to specific commits and/or permalinks when making changes, so that we can easily track them?

ChrisBeeley commented 2 years ago

Reopening. Let's close this issue once we have a permalink to some code on the main branch.

asegun-cod commented 2 years ago

182eb46 now contains a balanced accuracy score. This issue will remain open because the current model performed very badly on this metric. This might have something to do with what was mentioned here about the current pipeline not accounting for the unbalanced nature of the training data when preparing it for training the model. This will be explored further.
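One common remedy for imbalance in scikit-learn pipelines is the `class_weight="balanced"` option on the estimator, which reweights examples inversely to class frequency. The actual pipeline isn't shown here, so this is only a sketch on synthetic data, not the project's code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 95% / 5%), purely for illustration.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

ba_plain = balanced_accuracy_score(y_te, plain.predict(X_te))
ba_weighted = balanced_accuracy_score(y_te, weighted.predict(X_te))
print(ba_plain, ba_weighted)
```

Resampling the training set (e.g. with imbalanced-learn) is an alternative if reweighting alone doesn't help.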

ChrisBeeley commented 2 years ago

Could you write up the metrics and commit to the repo so we can review?

In R we would use RMarkdown; I don't really know the options in Python. I've got some code I can dig out somewhere to help if you get stuck
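The closest Python analogues to RMarkdown are probably Jupyter notebooks (rendered with `nbconvert`) or a plain script that writes a Markdown report. A minimal sketch of the latter, with hypothetical predictions:

```python
from pathlib import Path

from sklearn.metrics import balanced_accuracy_score, classification_report

# Hypothetical predictions, for illustration only.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]

# Assemble a small Markdown report that can be committed to the repo.
report = (
    "# Model performance\n\n"
    f"Balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.3f}\n\n"
    "```\n" + classification_report(y_true, y_pred, zero_division=0) + "```\n"
)
Path("metrics_report.md").write_text(report)
```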