audeering / audformat

Format to store media files and annotations
https://audeering.github.io/audformat/

Utility function for computing EWE #102

Open · ATriantafyllopoulos opened this issue 2 years ago

ATriantafyllopoulos commented 2 years ago

The topic of computing the EWE has come up several times. As its computation is straightforward but not trivial, I would offer to add a utility function here, so that a user can compute it for their dataset and easily add it to a conversion script.
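
For reference, the EWE is the confidence-weighted average of the individual ratings:

$$\hat{x}_n^\text{EWE} = \frac{\sum_k c_k \, x_{n,k}}{\sum_k c_k}$$

where $x_{n,k}$ is the rating of rater $k$ for sample $n$, and the confidence $c_k$ is the correlation of rater $k$ with the mean rating of the remaining raters (notation mine; this is exactly what the functions below compute).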

The approach I have in mind is to add two functions, one to compute annotator confidence, and the other to compute the EWE. Those will roughly look as follows:

import audbenchmark
import pandas as pd


class ComputeEWE:
    def __init__(self, confidences):
        self.confidences = confidences

    def __call__(self, row):
        # Keep only the raters that rated this sample
        # (NaN != NaN, so missing ratings are skipped)
        raters = [x for x in self.confidences if row[x] == row[x]]
        # Confidence-weighted average of the available ratings
        total = sum(row[x] * self.confidences[x] for x in raters)
        total /= sum(self.confidences[x] for x in raters)
        return total


def compute_ewe(df, confidences):
    # Restrict to raters that appear both as columns and in the confidences
    rater_names = set(confidences) & set(df.columns)
    valid_confidences = {key: confidences[key] for key in rater_names}
    ewe = df.apply(ComputeEWE(valid_confidences), axis=1)
    # Series.to_frame() keeps the index and names the single column
    return ewe.to_frame('EWE')


def rater_confidence(df, raters=None):
    if raters is None:
        raters = df.columns
    confidences = {}
    for rater in raters:
        df_rater = df[rater].dropna().astype(float)
        # Mean rating of all other raters per sample
        df_others = df.drop(rater, axis=1).mean(axis=1).dropna()
        # Confidence = correlation of the rater with the mean of the others,
        # computed over the samples they have in common
        indices = df_rater.index.intersection(df_others.index)
        confidences[rater] = audbenchmark.metric.pearson_cc(
            df_rater.loc[indices],
            df_others.loc[indices],
        )
    return confidences
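
Usage would then roughly look like this (rater names and values are made up for illustration):

df = pd.DataFrame(
    {
        'rater1': [0.2, 0.4, 0.6, 0.3],
        'rater2': [0.3, 0.5, 0.5, None],
        'rater3': [0.1, 0.4, 0.7, 0.4],
    },
    index=['f1.wav', 'f2.wav', 'f3.wav', 'f4.wav'],
)
confidences = rater_confidence(df)
ewe = compute_ewe(df, confidences)  # DataFrame with a single 'EWE' column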

@hagenw @frankenjoe what do you think about this?

frankenjoe commented 2 years ago

Yes, it's definitely a good idea to have a utility function for EWE. But maybe it would better fit into audmetric?

ATriantafyllopoulos commented 2 years ago

> Yes, it's definitely a good idea to have a utility function for EWE. But maybe it would better fit into audmetric?

Hm, it doesn't fit the standard API there (func(y_true, y_pred)) and it's not a metric.
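
To illustrate the mismatch in call signatures (a sketch; func stands for any audmetric metric):

# An audmetric metric takes aligned truth and prediction sequences
score = func(y_true, y_pred)

# EWE instead maps a table of raters and per-rater confidences to a gold standard
ewe = compute_ewe(df, confidences)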

On the other hand, it's very specific to emotion recognition (or tasks where the labels come from subjective ratings) so it doesn't fit well into audformat either (which should be about general audio data representation).

So I'm not sure... I'd still vote for audformat over audmetric, but maybe we have another package that's more targeted at SER (speech emotion recognition)?

hagenw commented 2 years ago

@ChristianGeng started working on a package that also has annotation-related functions, e.g. to compute rater agreement. He also asked if some of those would fit into audmetric, and I replied that there we gather machine learning metrics that expect truth and prediction as input.

So maybe it would make sense to add the EWE calculation to the new package of @ChristianGeng?

ATriantafyllopoulos commented 2 years ago

I was hoping for a public implementation :-)

frankenjoe commented 2 years ago

> So maybe it would make sense to add the EWE calculation to the new package of @ChristianGeng?

Sounds good!

hagenw commented 2 years ago

Maybe the new package could also be made public?

hagenw commented 2 years ago

We will not add this functionality to audformat, but I will leave this issue open; maybe we can report at some point whether the EWE calculation has been published in another package.