dccuchile / wefe

WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes the bias measurement and mitigation in Word Embeddings models. Please feel welcome to open an issue in case you have any questions or a pull request if you want to contribute to the project!
https://wefe.readthedocs.io/
MIT License

WEFE documentation is inconsistent with the literature #22

Closed raffaem closed 3 years ago

raffaem commented 3 years ago

The WEFE documentation says that a target set:

A target word set (denoted by T) corresponds to a set of words intended to denote a particular social group, e.g., gender, social class, age, and ethnicity,

while an attribute set:

An attribute word set (denoted by A) is a set of words representing some attitude, characteristic, trait, occupational field, etc.

But in this paper the two concepts are reversed, with target meaning mathematics and arts (occupational fields) and attribute meaning male and female (a particular social group).

Is it a problem of the paper alone?

Are those terms used consistently in the literature?

What happens to the WEAT if I exchange attribute and target sets?

(Is the WEFE documentation right?)

raffaem commented 3 years ago

Indeed, Caliskan et al. (2017, p. 2, link) say:

The details of the WEAT are as follows. Borrowing terminology from the IAT literature, consider two sets of target words (e.g., programmer, engineer, scientist; and nurse, teacher, librarian) and two sets of attribute words (e.g., man, male; and woman, female).
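For reference, the test statistic that the paper defines over these sets (writing X, Y for the two target sets and A, B for the two attribute sets) is:

```latex
s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)

s(w, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a})
           - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b})
```

So the targets and attributes play structurally different roles in the formula: associations are computed from each target word to the attribute sets, not the other way around.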

So it looks like the usage is the opposite of WEFE's: in Caliskan et al., the targets are the occupational words and the attributes are the social-group words.

At the very least, the WEFE documentation is inconsistent with the literature.

Are the calculations also switched?

pbadillatorrealba commented 3 years ago

Hi @raffaem ,

This is an interesting point. When we created the framework we also discussed this, since there was a great lack of consistency: each metric proposed its own input sets, with different names and definitions.

In general, treating target sets as the words that denote a social group, and attribute sets as the words that denote the characteristics under study, was a convention we adopted when designing the framework, based on the original experiments in the IAT paper: "Measuring individual differences in implicit cognition: The implicit association test" (https://www.uni-muenster.de/imperia/md/content/psyifp/aeechterhoff/wintersemester2011-12/attitudesandsocialjudgment/greenwaldmcgheeschwatz_iat_jpsp1998.pdf).

Now, in practical terms, note that the original paper defining the WEAT metric (Semantics derived automatically from language corpora contain human-like biases) does not formally define what a target set and an attribute set are; it only gives examples of the sets. However, if you look at Table 1 (where the authors show the results of their experiments), it is more common to see the target sets represent a social group and the attributes represent some feature the authors want to study. In general, other studies also conducted their experiments this way. This is why, in order to standardize the concepts, we decided to restrict the concept of target to a social group and that of attribute to some attitude, characteristic, trait, occupational field, etc.

Regarding your questions:

What happens to the WEAT if I exchange attribute and target sets?

It depends on the operation performed by each metric, but in general, metrics are not commutative. This is mainly because most of them perform different operations on the target and attribute sets. If you swap the targets and the attributes, you also change the interpretation of the query.

Are those terms used consistently in the literature?

In general, no, not very consistently; that is why we decided to restrict them as described above.

Is it a problem of the paper alone?

I would not know how to answer this question with certainty.

Pablo.