andreassot10 opened 3 years ago
To clarify, I don't mean drop the accuracy score; just add more metrics.
Thanks, I will do that. I am also thinking: would it be useful to have the accuracy of each individual label added as well? I feel this might help us see which labels are usually well predicted and which are not, so it could be a useful metric. What do you think @andreassot10?
Definitely. Also a confusion matrix would be great
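A minimal sketch of both ideas with scikit-learn (toy labels, not the pipeline's actual data or code): `classification_report` gives per-label precision/recall/F1, and `confusion_matrix` shows exactly which classes get confused with which.

```python
# Toy example: per-label metrics and a confusion matrix with scikit-learn.
# The label names here are made up for illustration.
from sklearn.metrics import classification_report, confusion_matrix

y_true = ["access", "access", "staff", "staff", "waiting"]
y_pred = ["access", "staff", "staff", "staff", "waiting"]

# Per-label precision/recall/F1 shows which labels are predicted well.
print(classification_report(y_true, y_pred, zero_division=0))

# Rows are true labels, columns are predicted labels.
labels = ["access", "staff", "waiting"]
print(confusion_matrix(y_true, y_pred, labels=labels))
```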
These metrics have now been added and committed.
@asegun-cod , can you please direct us to specific commits and/or permalinks when making changes so that we can easily track them.
Reopening. Let's close this issue once we have a permalink to some code on the main branch
182eb46 now contains a balanced accuracy score. This issue will remain open because the current model performed very badly on this metric. This might have something to do with what was mentioned here about the current pipeline not accounting for the unbalanced nature of the training data when preparing data for training the model. This will be explored further.
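One common way to account for class imbalance during training is to weight classes inversely to their frequency. This is a generic sketch (an assumption about a possible fix, not what the pipeline currently does), using `class_weight="balanced"` on a scikit-learn estimator with synthetic imbalanced data:

```python
# Sketch: training with class weights on deliberately imbalanced data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data, roughly 90% / 10% split.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# class_weight="balanced" reweights samples inversely to class frequency,
# so the minority class contributes as much to the loss as the majority.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
print(balanced_accuracy_score(y_test, clf.predict(X_test)))
```

Most scikit-learn classifiers accept `class_weight`, so this slots into an existing pipeline without changing the data preparation itself.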
Could you write up the metrics and commit to the repo so we can review?
In R we would use RMarkdown, don't really know the options in Python. I've got some code I can dig out somewhere to help if you get stuck
@asegun-cod, the accuracy score can be inflated by great model performance in predicting one or just a few classes. It would be good to add metrics that account for this. For example, Balanced Accuracy or Matthews Correlation Coefficient.
See https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
and https://github.com/CDU-data-science-team/pxtextmining/blob/main/pxtextmining/helpers/metrics.py.