ModelOriented / fairmodels

Flexible tool for bias detection, visualization, and mitigation
https://fairmodels.drwhy.ai/
GNU General Public License v3.0

Fairmodels for the binary output variable #40

Closed Nehagupta90 closed 3 years ago

Nehagupta90 commented 3 years ago

Can we apply the fairmodels approach to our output variable? I have a dataset about software bugs, i.e. whether a bug exists or not, coded as 1 and 0 respectively. In the majority of cases the value is 0 (no bug), and researchers use class-balancing techniques to deal with this.

My question is: can we use the fairmodels R package to evaluate how the models behave for both the 0 and 1 values?

Warm regards

jakwisn commented 3 years ago

Hi, thanks for the question! To use fairmodels you need three things: a model with binary output (explained with DALEX), a protected vector, which is some attribute that indicates membership in a group (the kind of software, or the language the software was written in?), and a privileged parameter (one of the values of the protected vector). If I understand correctly, the existence of the bug is the model's target, so as long as you have some protected vector it should be OK.

Nehagupta90 commented 3 years ago

Hi Jakub and thank you for the comments.

I have an output variable bug, which takes the values 0 and 1. The protected vector would be the bug variable, but when I use 0 as the privileged value, it gives me an error.

Warm regards

jakwisn commented 3 years ago

Could you paste a reproducible example here, with the output from fairness_check?

Nehagupta90 commented 3 years ago

I have two datasets: in one, bug (the output variable) is binary, 0 or 1, and in the other, bug is a continuous variable (values from 1 to 20).

When I use bug as a binary variable, as below, it gives an error: Error in fairness_check(explainer, protected = test$bug, privileged = "1") : privileged subgroup is not in protected variable vector

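library(farff)             # readARFF()
library(mlr3)              # TaskClassif, lrn()
library(mlr3extralearners) # assumed source of "classif.randomForest"
library(DALEXtra)          # explain_mlr3()
library(fairmodels)        # fairness_check()

data = readARFF("bugss.arff")

# 70/30 train/test split
index = sample(1:nrow(data), 0.7*nrow(data))
train = data[index,]
test = data[-index,]

task = TaskClassif$new("data", backend = train, target = "bug")
learner = lrn("classif.randomForest", predict_type = "prob")
model = learner$train(task)

# column 21 is dropped from the explainer data (presumably the target, bug)
explainer = explain_mlr3(model, data = test[,-21], y = as.numeric(test$bug)-1, label="RF")

fc = fairness_check(explainer, protected = test$bug, privileged = "0")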

When I use bug as a continuous output variable, it says that to check fairness in regression I should use 'fairness_check_regression()'.

jakwisn commented 3 years ago

I suspect the problem is that the protected vector and the privileged parameter have different types. What happens if you change the type of test$bug to character or factor?

Nehagupta90 commented 3 years ago

I don't have factor or character variables, just numeric input features and a binary output variable. However, I will check it with another dataset that has factor variables.

Warm regards

jakwisn commented 3 years ago

The protected vector must be a categorical variable, for example c("bug_type_1", "bug_type_2", ...), and the privileged parameter must be a value in this vector, e.g. "bug_type_1". So if you have numerical types of bugs, you may change them to a character vector. But the privileged parameter must be a value in the vector; if this condition is not met, fairness_check won't work.
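The membership condition described above can be checked before calling fairness_check. The following is a minimal base R sketch, assuming the protected vector is treated as a factor; `check_privileged` is a hypothetical helper written for illustration and is not part of fairmodels.

```r
# Hypothetical helper (not part of fairmodels): verify that `privileged`
# is one of the levels of the protected vector once it is viewed as a factor.
check_privileged <- function(protected, privileged) {
  lv <- levels(as.factor(protected))
  if (!(as.character(privileged) %in% lv)) {
    stop("privileged = '", privileged, "' not found; available levels: ",
         paste(lv, collapse = ", "))
  }
  invisible(TRUE)
}

# Numeric 0/1 codes become the levels "0" and "1", so "1" is accepted
check_privileged(c(0, 1, 1, 0), "1")

# A factor with other labels (made-up data) has no level "0";
# inspecting the levels shows which values are valid choices
bug <- factor(c("no", "yes", "no"))
levels(bug)
```

If the check stops with an error, the message lists the levels that are actually present, which is usually enough to see whether the mismatch comes from the labels or from the type of the vector.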

Nehagupta90 commented 3 years ago

So this means that protected variables:

(a) can be a categorical variable, and can be either an input or an output variable?

(b) can have binary values (like True and False) or nominal values (like January, February, ..., December)?

(c) cannot have values like 0 or 1, so we first need to convert 0 and 1 to No and Yes?

Am I right?

jakwisn commented 3 years ago

a) More like an input. It was intended not to be equal to the target variable, but if it has more values and gives you insight, then why not. b) Yes. c) No: the default type is factor, but using any vector type should be fine, since it will be converted to factor. Here is an example:

library(fairmodels)
library(DALEX)
data("german")
head(german)

lm_model <- glm(Risk~.,            # Predicting Risk
                data = german,
                family=binomial()) # With Logistic Regression

y_numeric <- as.numeric(german$Risk) -1
explainer_lm <- DALEX::explain(lm_model, data = german, y = y_numeric)

# changing protected to numerical 
prot <- ifelse(german$Sex == 'male', 1, 0)

# privileged should be factor/character
privileged <- '1'

# make sure that this will be true:
# (internally protected is changed to factor)
privileged %in% as.factor(prot)

# if so you are good to go
fobject <- fairness_check(explainer_lm,
                          protected = prot,
                          privileged = privileged) # privileged = `1`

plot(fobject)

Nehagupta90 commented 3 years ago

Thank you Jakub, now I understand your points. The issue can be closed.

Warm regards

jakwisn commented 3 years ago

Thanks for reaching out!