fairlearn / fairlearn-proposals

Proposal Documents for Fairlearn
MIT License

Naming the new metrics API #17

Open riedgar-ms opened 3 years ago

riedgar-ms commented 3 years ago

With the new metrics object almost finalised, we need to decide on the names. Currently, the object looks like this:

class GroupedMetric:
    def __init__(self, metric_functions,
                 y_true, y_pred, *,
                 sensitive_features,
                 conditional_features=None,
                 sample_params=None):
        ...

    @property
    def overall(self):
        ...

    @property
    def by_group(self):
        ...

    def group_max(self):
        ...

    def group_min(self):
        ...

    # Other accepted method is 'to_overall'
    def difference(self, method='minmax'):
        ...

    # Other accepted method is 'to_overall'
    def ratio(self, method='minmax'):
        ...

With the exception of __init__, all names here are potentially changeable. The ones I think are most 'controversial' are the class name, and the method= argument and its accepted values (if others think differently, those can be discussed too). I will update this starting post with suggestions from the discussion to save everyone having to read through the whole thread each time.
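For concreteness, here is a rough usage sketch of the proposed object. The data and metric choices are hypothetical, and the comments give one reading of the semantics rather than anything final; only the signature above is from the proposal.

from sklearn.metrics import accuracy_score, recall_score

# Hypothetical toy data: labels, predictions, and one sensitive feature
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
sex = ['F', 'F', 'F', 'M', 'M', 'M']

# GroupedMetric as proposed above (not part of any released API at the time of writing);
# assuming a dict of named metric functions is accepted
grouped = GroupedMetric({'accuracy': accuracy_score, 'recall': recall_score},
                        y_true, y_pred,
                        sensitive_features=sex)

grouped.overall                          # metrics computed on the full dataset
grouped.by_group                         # metrics disaggregated by group ('F', 'M')
grouped.group_min()                      # smallest value across groups, per metric
grouped.difference(method='minmax')      # max minus min across groups
grouped.difference(method='to_overall')  # largest gap between any group and the overall value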

Options for the class name

Some thoughts

Consensus is on this. Use 'metrics' since 'metric' could be an adjective and not a noun.

Options for method=minmax

There are two components - the name of the argument itself, and its accepted values. The latter could potentially depend on the former.

For the argument name

Consensus is to stick with method=

For the allowed values:

There isn't a huge amount of love for these, but there's a shortage of concise alternatives that don't suffer from similar ambiguity issues.

Options for group_min/max()

Stick with group_min()/max(). They do have value when there are conditional features present.

Options for conditional features

Sample Params

Keep as sample_params. The most meaningful alternative is sample_props, and that wasn't felt to be a significant difference.

riedgar-ms commented 3 years ago

Tagging @adrinjalali @hildeweerts @amueller @MiroDudik

hildeweerts commented 3 years ago

Question: according to SLEP006, scikit-learn devs decided to refer to the thing we call sample_params as Metadata. Do you know whether this is the final naming, @adrinjalali @amueller?

romanlutz commented 3 years ago

As someone who wasn't super involved in this, group_min and group_max were confusing to me. I assumed they meant "min/max within a group" as opposed to "min/max of values among all groups". A more intuitive name would be min_group and max_group. Thanks for considering!

adrinjalali commented 3 years ago

We call that parameter metadata in SLEP6 since it can potentially include non-sample-aligned data, like column metadata and data metadata. I think for the purpose of this work those extra params are always sample-aligned, so I wouldn't mind calling them sample_params.

adrinjalali commented 3 years ago

In terms of names, GroupMetricResult kinda tells me this is an object mostly returned by another method and not created by the user. But I think users are gonna be creating this object, therefore I kinda prefer the other options.

As for the kwarg name, I prefer method since it's more generalizable than relative_to.

As for group_{min, max}, doesn't it mean the same as {min, max}_by_group?

MiroDudik commented 3 years ago

I'm going to also open the discussion about conditional_features and sample_params (at the end below).

Re. class name

I'm not enamored with any of these. Another option is DisaggregatedMetric, but I don't love it either. (I'll punt on this while I think some more.)

Re. method=minmax

I prefer the keyword method for the same reason as @adrinjalali. For the allowed values, how about:

Re. group_min/max

I am fine with renaming for better clarity, but I have some issues with the suggested alternatives.

The issue with min_by_group is that it sounds like it returns a vector of minima (one for each group). And the issue with min_group is that it sounds like it returns the group rather than a minimum. So among the options above, I'm still in favor of sticking with group_min, but maybe there are other suggestions?

[Also, while thinking about this, bear in mind that it needs to work in compounds like accuracy_score_group_min.]
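For concreteness, the behaviour under discussion as I read it (hypothetical accuracy values; this is a sketch of the intended semantics, not settled API):

# Hypothetical disaggregated accuracies, keyed by sensitive feature value
by_group = {'F': 0.67, 'M': 1.00}

group_min = min(by_group.values())  # 0.67: the minimum taken across groups,
group_max = max(by_group.values())  # 1.00: not the min/max within a single group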

Re. conditional_features

I'm not too happy about the name, although it came from me, based on the terminology conditional statistical parity. But it doesn't quite work in describing features: the features themselves are not conditional; it is their values that are being conditioned on. So some alternative names:

Re. sample_params

I still find this a bit ambiguous. How about column_params? We used to call these indexed_params, which is maybe more confusing.
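Whatever the final name, the shape under discussion is roughly a set of per-metric keyword arguments whose values are sample-aligned. A hypothetical example, assuming the dict-of-metrics form from the sketch in the opening post:

import numpy as np

# Hypothetical weights, one per sample, aligned with y_true/y_pred
weights = np.array([1.0, 1.0, 2.0, 1.0, 0.5, 1.0])

# One dict of keyword arguments per metric function; every value is sample-aligned
sample_params = {'accuracy': {'sample_weight': weights},
                 'recall':   {'sample_weight': weights}}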

riedgar-ms commented 3 years ago

Thanks @MiroDudik. I've added those above (and @hildeweerts).

hildeweerts commented 3 years ago

Class Name

In my opinion, GroupedMetric sounds like an object that represents a metric that can be used multiple times on different data, not the actual metric scores. I can see Adrin's point that GroupMetricResult sounds like an object returned by another method. Personally, I'd find it more natural to have a function that results in the GroupedMetric object, rather than an object that computes all kinds of things on initialization. But I remember that at the time we discussed this, people didn't like to add even more new terms. So given that I can't think of a better option, I think I'll just settle for GroupedMetric.

method=minmax

I strongly prefer method as it allows for more flexibility in the types of methods we allow, perhaps even custom aggregator functions.

In terms of options, I think it's important to stay consistent. For example, max_to_min makes it explicit that we compare the maximum to the minimum, but to_overall doesn't tell you how we're aggregating the computed differences/ratios. I'd be in favor of ensuring these terms are more consistent.
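As a concrete illustration of the consistency point (hypothetical numbers; this is one reading of the two aggregations, not settled semantics):

# Hypothetical per-group accuracies and the overall accuracy
by_group = {'F': 0.67, 'M': 1.00}
overall = 0.83

# method='minmax' (i.e. max_to_min): compare the two extremes to each other
diff_minmax = max(by_group.values()) - min(by_group.values())       # 1.00 - 0.67 = 0.33

# method='to_overall': compare each group to the overall value, then take the worst gap
diff_to_overall = max(abs(v - overall) for v in by_group.values())  # max(0.16, 0.17) = 0.17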

Options for group_min/max()

We could just call it min(). The 'group' is pretty much implied by the fact that it's a GroupedMetric object. Full disclosure: I am not 100% convinced that these functions should even exist; it doesn't really save you any code and clutters the API.

I am also not convinced we need accuracy_score_group_min(). If you're using this for model selection you'll implicitly compare it to the overall accuracy score anyways, so I don't really see the added value compared to difference(method='to_overall'). I'm happy to be convinced otherwise, but at this time I'd rather not clutter the API.

Options for conditional features

Now that Miro said it, I cannot unsee the inaccurate terminology, lol. I think proper English would be conditioning_on, right?

Sample Params

I find column_params very confusing; what columns are we referring to? I like sample_params, especially because I expect these parameters to be mostly relevant for sample_weights.

riedgar-ms commented 3 years ago

Thanks @hildeweerts . I've put your suggestions up in the main post (although not your reasoning - we can talk about that in the meeting).

I'm also not convinced we need all of accuracy_score_group_min() etc. We should provide the functions for demographic parity, equalised odds and equal opportunity, but I'm less convinced of the need for all the others.

MiroDudik commented 3 years ago

Mhmm... I'd like to suggest tweaks to the above.

Re. class name

The final two contenders were MetricFrame and MetricsFrame.

After discussing this with @hannawallach, I'm now strongly in favor of MetricFrame, because that's by far the most common usage pattern for attributive nouns: https://english.stackexchange.com/questions/59059/should-a-list-of-tokens-be-called-a-token-list-or-a-tokens-list

There's a really really minor usage of "metric frame" terminology in the theory of algebraic structures, but I don't think we need to worry about it: https://www.google.com/search?q=%22metric+frame%22&tbm=bks

Are there any objections to this?

Re. method=

We have the consensus that the keyword should be method, and that it should default to the current behavior, but we still have not fully resolved the names (but almost there). The last idea was between_pairs and to_overall, but I think I have two better suggestions:

  1. between_pairs and to_overall [last decided, but I think it's ambiguous]
  2. between_groups and to_overall [this is much clearer in my opinion]
  3. group_to_group and group_to_overall [slightly more verbose, but perhaps the clearest?]

I am in favor of either 2 or 3. If there are no strong feelings either way, I suggest we go with 3 for greatest clarity.

Re. conditioning

I have discussed this with some folks familiar with causal inference. The recommended suggestion is control_features.

This matches the language that a data scientist might use to talk about the metrics, e.g., “controlling for the credit score, the loans are given at similar rates regardless of race” (here credit score is the control feature and race is the sensitive feature). This also matches the statistical terminology: https://en.wikipedia.org/wiki/Controlling_for_a_variable
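A sketch of how that reads in code, with hypothetical data and column names; MetricFrame and control_features are as proposed in this thread (not yet in the library), and selection_rate stands in for any metric function:

import pandas as pd
from fairlearn.metrics import selection_rate

# Hypothetical toy data
data = pd.DataFrame({
    'race':              ['A', 'B', 'A', 'B', 'A', 'B'],
    'credit_score_band': ['low', 'low', 'low', 'high', 'high', 'high'],
})
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# MetricFrame as proposed above; constructor shape assumed unchanged otherwise
mf = MetricFrame(selection_rate,
                 y_true, y_pred,
                 sensitive_features=data['race'],
                 control_features=data['credit_score_band'])

mf.by_group      # one selection rate per (credit_score_band, race) combination
mf.group_min()   # for each credit_score_band, the smallest rate across race groups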

Are there any objections?

riedgar-ms commented 3 years ago

Tagging @hildeweerts and @adrinjalali for comments. I don't have any strong preference, beyond having a decision so that I can finalise the PR.

hildeweerts commented 3 years ago

I am fine with either MetricFrame or MetricsFrame, but MetricFrame is less typing which is always nice! :)

What I like about between_groups/between_pairs is that it emphasizes that you're comparing all combinations of groups, which isn't as clear to me for group_to_group. But I don't have a strong opinion. TBH I don't think it matters a lot what we call these, since people will have to check the docs anyways to have any idea what is going on... But if we're going with option 3, isn't the group_ part redundant, given that group_to_group and group_to_overall are the only two options?

I like control_features! (low-key disappointed that we're not going for the shampoo advert though)

MiroDudik commented 3 years ago

@hildeweerts: good feedback re. between_groups vs group_to_group. I see what you're saying, and now I'm leaning towards the combination of between_groups and to_overall.

riedgar-ms commented 3 years ago

Unless @adrinjalali objects I will go back to MetricFrame, switch over to control_features and change between_pairs to between_groups.