fairlearn / fairlearn

A Python package to assess and improve fairness of machine learning models.
https://fairlearn.org
MIT License

data can be loaded only once #1210

Open anilsh opened 1 year ago

anilsh commented 1 year ago

When I try to run GridSearch twice, or ExponentiatedGradient after GridSearch, the constraints object raises the following error:

AssertionError: data can be loaded only once

The full stack trace is:

File ~\OneDrive - EY\fairness2\FSRM-shEYzam-repos\shazamlib\group_fairness.py:313, in GroupFairness.reduction_grid_search(self, base_model)
    311 # train model 
    312 print ('Training base model on specified constraints..')
--> 313 model_gridsearch.fit(self.X_train, self.y_train, sensitive_features=self.S_train)
    314 self.inprocess['model'] = model_gridsearch
    316 # make predictions

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_grid_search\grid_search.py:143, in GridSearch.fit(self, X, y, **kwargs)
    141 # Prep the parity constraints and objective
    142 logger.debug("Preparing constraints and objective")
--> 143 self.constraints.load_data(X, y, **kwargs)
    144 objective = self.constraints.default_objective()
    145 objective.load_data(X, y, **kwargs)

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\utility_parity.py:333, in DemographicParity.load_data(self, X, y, sensitive_features, control_features)
    331 base_event = pd.Series(data=_ALL, index=y_train.index)
    332 event = _merge_event_and_control_columns(base_event, cf_train)
--> 333 super().load_data(X, y_train, event=event, sensitive_features=sf_train)

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\utility_parity.py:146, in UtilityParity.load_data(self, X, y, sensitive_features, event, utilities)
    123 def load_data(
    124     self,
    125     X,
   (...)
    130     utilities=None,
    131 ):
    132     """Load the specified data into this object.
    133 
    134     This adds a column `event` to the `tags` field.
   (...)
    144 
    145     """
--> 146     super().load_data(X, y, sensitive_features=sensitive_features)
    147     self.tags[_EVENT] = event
    148     if utilities is None:

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\moment.py:42, in Moment.load_data(self, X, y, sensitive_features)
     30 def load_data(self, X, y: pd.Series, *, sensitive_features: pd.Series = None):
     31     """Load a set of data for use by this object.
     32 
     33     Parameters
   (...)
     40         The sensitive feature vector (default None)
     41     """
---> 42     assert self.data_loaded is False, "data can be loaded only once"
     43     if sensitive_features is not None:
     44         assert isinstance(sensitive_features, pd.Series)

AssertionError: data can be loaded only once

hildeweerts commented 1 year ago

Hi @anilsh. Please follow the instructions from the bug report template to print all dependencies.

romanlutz commented 1 year ago

This is by design, AFAIK. If we allowed loading multiple times, you could have something like:

constraint = DemographicParity()
eg = ExponentiatedGradient(constraints=constraint, ...)
eg.fit(...)  # calls load_data and sets fields internal to the moment
constraint.load_data(different_data)

In other words, one could mess up the constraint object in weird ways. I can see two changes we could make:

  1. Perhaps load_data should be _load_data to avoid giving people the impression that it's something they could use (?), and
  2. perhaps we should clone the constraints object before using it internally. That way, we could pass the same constraints object to several different mitigators without corrupting it in the process.
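The second suggestion could be sketched as follows. This is a hypothetical illustration, not fairlearn's actual internals: `MomentLike` and `MitigatorLike` stand in for the real `Moment` class (with its load-once guard) and a reductions mitigator.

```python
import copy

class MomentLike:
    """Stand-in for fairlearn's Moment: data may be loaded only once."""
    def __init__(self):
        self.data_loaded = False

    def load_data(self, X, y):
        # The same guard that raises in the reported traceback.
        assert self.data_loaded is False, "data can be loaded only once"
        self.X, self.y = X, y
        self.data_loaded = True

class MitigatorLike:
    """Stand-in mitigator that deep-copies the user's constraint
    before loading data into it (suggestion 2)."""
    def __init__(self, constraints):
        self._constraints = constraints

    def fit(self, X, y):
        # Clone first, so the caller's object is never mutated.
        self.constraints_ = copy.deepcopy(self._constraints)
        self.constraints_.load_data(X, y)

constraint = MomentLike()
MitigatorLike(constraint).fit([[0], [1]], [0, 1])
MitigatorLike(constraint).fit([[2], [3]], [1, 0])  # no AssertionError: each fit clones
```

With this design, the same constraints object could be passed to several mitigators, since each fit works on its own pristine copy and the original never has `data_loaded` flipped.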

@MiroDudik wdyt?