hagenw closed this 2 months ago.
I still think that these are drop-in replacements for single-item (reliability) coefficients proper, and I am still not convinced that they are optimally located in the gold standard module.
I'm not sure about this, as we use the functions to add a confidence/agreement value to the gold standard (e.g. mean/median) that we calculate for each sample, whereas the reliability coefficients return a single value over all stimuli and raters. Or would it make sense to call them on a per-stimulus basis as well?
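To illustrate the distinction between per-sample values and a single reliability coefficient, here is a small sketch on a hypothetical ratings matrix (rows are stimuli, columns are raters). Cronbach's alpha is swapped in purely as an example of a whole-matrix coefficient; it is not claimed to be what audpsychometric computes.

```python
import numpy as np

# Hypothetical ratings matrix: rows are stimuli (items), columns are raters.
ratings = np.array([
    [0.0, 0.2, 0.4],
    [0.6, 0.6, 0.8],
    [1.0, 0.8, 0.8],
    [0.2, 0.0, 0.2],
])

# Per-stimulus values: one entry per row, computed alongside a gold standard.
gold_standard = np.nanmean(ratings, axis=1)   # shape (4,)
spread = np.nanstd(ratings, axis=1, ddof=0)   # shape (4,)


def cronbach_alpha(x: np.ndarray) -> float:
    """Cronbach's alpha treating raters (columns) as 'items'; returns one scalar."""
    k = x.shape[1]
    rater_variances = x.var(axis=0, ddof=1)     # variance of each rater across stimuli
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the per-stimulus sums
    return k / (k - 1) * (1 - rater_variances.sum() / total_variance)


alpha = cronbach_alpha(ratings)  # a single value for the whole matrix
print(gold_standard.shape, spread.shape, np.ndim(alpha))
```

The per-stimulus functions return one value per row, while the reliability coefficient collapses the whole matrix into a scalar.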
The only thing I am still wondering about for now is the calculation of the standard deviation: for the sample standard deviation we divide by n-1, one less than the number of data values. The current implementation calls np.nanstd with ddof=0, i.e. the population standard deviation. What is the motivation for this? My intuition would be that we cannot generalize to the entire population of raters here, and I would have guessed that the sample standard deviation is appropriate. Should this motivation be documented, or should np.nanstd be parametrizable when it is called?
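For reference, the two estimators differ only in the denominator (NumPy divides by n - ddof). With the ratings x_1, ..., x_n of one stimulus:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} \quad (\texttt{ddof=0}), \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sigma\sqrt{\frac{n}{n-1}} \quad (\texttt{ddof=1})
```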
The original discussion is at https://gitlab.audeering.com/data/msppodcast/-/merge_requests/23#note_196359.
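The reasoning goes like this: when a model is trained to learn the confidence of a rating, that confidence should depend only on the audio signal and not on the number of raters who judged the sample. Duplicating every rating changes the sample standard deviation, but not the population standard deviation: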
>>> import numpy as np
>>> # sample standard deviation
>>> np.std([0, 0, 0.2, 0.4], ddof=1)
0.19148542155126763
>>> np.std([0, 0, 0, 0, 0.2, 0.2, 0.4, 0.4], ddof=1)
0.17728105208558367
>>> # population standard deviation
>>> np.std([0, 0, 0.2, 0.4], ddof=0)
0.16583123951777
>>> np.std([0, 0, 0, 0, 0.2, 0.2, 0.4, 0.4], ddof=0)
0.16583123951777
To be in line with audpsychometric.agreement_categorical(), I added a note to the docstring that nan values are ignored.
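A minimal sketch of what "nan is ignored" means in NumPy terms, assuming a single stimulus where one rater gave no rating: np.std propagates the missing value, while np.nanstd computes the spread over the available ratings only.

```python
import numpy as np

# One stimulus where one rater did not provide a rating (NaN).
ratings = np.array([0.0, 0.2, np.nan, 0.4])

print(np.std(ratings))     # nan: plain std propagates the missing value
print(np.nanstd(ratings))  # std over the three available ratings only
```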
I still think that these are drop-in replacements for single-item (reliability) coefficients proper, and I am still not convinced that they are optimally located in the gold standard module. I am aware that this is poking a hornet's nest, but it might be worthwhile to discuss it later.
Feel free to open an issue for that.
@ChristianGeng if you think this is not the correct approach, please open an issue, and we can discuss it there. In this pull request, I would propose to focus on fixing the shape bug.
As you say, major discussions are beyond the scope of this fix. I will happily open a new, more conceptual issue once my thought process converges into something meaningful - if it ever does ;-)
I will check whether this has already been approved, and approve it otherwise.
Addresses the problem of the wrong output shape of audpsychometric.agreement_numerical() as described in https://github.com/audeering/audpsychometric/pull/13#discussion_r1734784060.
The docstring now also mentions nan values, which are now ignored.
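The details of the shape bug are only in the linked discussion, so the following is a hypothetical sketch, not the actual fix: it assumes the desired contract is one value per item for a 2-D (items x raters) input and a scalar for the ratings of a single item, ignores NaN, and exposes ddof as a parameter (defaulting to the population std) in case it should ever become configurable. The helper name per_item_std is invented for illustration.

```python
import numpy as np


def per_item_std(ratings, ddof: int = 0):
    """Hypothetical helper: NaN-ignoring standard deviation per item.

    2-D input (items x raters) -> array with one value per item.
    1-D input (ratings of a single item) -> scalar.
    """
    ratings = np.asarray(ratings, dtype=float)
    if ratings.ndim == 1:
        return float(np.nanstd(ratings, ddof=ddof))
    if ratings.ndim == 2:
        # Reduce over raters (axis=1) so the output has one entry per item.
        return np.nanstd(ratings, axis=1, ddof=ddof)
    raise ValueError("ratings must be 1-D or 2-D")


matrix = [[0, 0, 0.2, 0.4], [0.6, np.nan, 0.8, 0.8]]
print(per_item_std(matrix).shape)      # (2,)
print(per_item_std([0, 0, 0.2, 0.4]))  # single item -> Python float
print(per_item_std(matrix, ddof=1))    # sample std, if it were ever wanted
```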