cproctor opened this issue 10 months ago
After reading [1], I propose that we implement Kappa, J, Precision, Recall, and F statistics. They're quite straightforward to implement. I also think we should implement Shaffer's rho, which is described in Quantitative Ethnography (I have a copy in my office).
Let's start with kappa, and leave the rest as enhancements.
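For concreteness, here's a minimal sketch of Cohen's kappa for two raters coding the same units. The function name and the lists-of-labels input format are just my assumptions for illustration, not anything from the codebase:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same units."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same number of units")
    n = len(rater_a)
    # Observed agreement: fraction of units where the raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability of agreeing if each rater coded independently
    # according to their own marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (counts_a[code] / n) * (counts_b[code] / n)
        for code in set(counts_a) | set(counts_b)
    )
    if expected == 1:
        return 1.0  # degenerate case: both raters only ever used one code
    return (observed - expected) / (1 - expected)

# Example: observed agreement 0.75, chance agreement 0.5, so kappa = 0.5
print(cohens_kappa(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
```

The same per-unit pairing of the two raters' codes would also give us the confusion counts needed for precision, recall, and F later.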
[1] Eagan, B., Brohinsky, J., Wang, J., & Shaffer, D. W. (2020). Testing the reliability of inter-rater reliability. Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, 454–461. https://doi.org/10.1145/3375462.3375508
I think we should use this paper as the methodological basis for implementing agreement:
Halpin, S. N. (2024). Inter-Coder Agreement in Qualitative Coding: Considerations for its Use. American Journal of Qualitative Research, 8(3), 23–43. https://doi.org/10.29333/ajqr/14887
We should also look into which metrics we want to implement. In particular, consider Shaffer's rho (the rhoR package) and some of his critiques of inter-rater agreement practices.
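To make the rho idea concrete, here is a rough Monte Carlo sketch of how I read the Eagan et al. paper: simulate test sets drawn from a population whose true kappa sits at a baseline threshold, and ask how often such a test set would produce a kappa as high as the one we observed. The function name, the defaults (baseline kappa 0.65, base rate 0.2), and the copy-probability construction are my own illustrative choices, not the rhoR algorithm; it reuses the `cohens_kappa` sketch above:

```python
import random

def rho_p_value(observed_kappa, test_set_size, baseline_kappa=0.65,
                base_rate=0.2, trials=1000, seed=0):
    """Estimate how often a test set of `test_set_size` units drawn from a
    population whose true kappa is only `baseline_kappa` would yield a sample
    kappa >= observed_kappa. A small value suggests the observed kappa is not
    just an artifact of rating a small test set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rater_a, rater_b = [], []
        for _ in range(test_set_size):
            a = 1 if rng.random() < base_rate else 0
            # Rater B copies rater A with probability `baseline_kappa`, and
            # otherwise codes independently at the same base rate; this
            # construction has an expected kappa equal to `baseline_kappa`.
            if rng.random() < baseline_kappa:
                b = a
            else:
                b = 1 if rng.random() < base_rate else 0
            rater_a.append(a)
            rater_b.append(b)
        if cohens_kappa(rater_a, rater_b) >= observed_kappa:
            hits += 1
    return hits / trials

# Example: an observed kappa of 0.9 on a 40-unit test set, checked against
# a baseline population kappa of 0.65.
print(rho_p_value(0.9, 40))
```

If we go this route we'd want to match rhoR's actual simulation of code sets rather than my simplified construction, but the shape of the computation should be similar.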