Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Restricting AdversarialDebiasing's trainable variables to current scope #255

Closed mfeffer closed 2 years ago

mfeffer commented 3 years ago

When AdversarialDebiasing is used multiple times and resetting the tf graph or session is not an option (e.g. when multiple instances are part of an ensemble), calls to fit() after the first model has been fitted will crash. The weight and bias tensors from the first model are still retrieved as trainable variables even though they are not in the current scope, and computing their gradients via classifier_opt.compute_gradients(pred_labels_loss, var_list=classifier_vars) returns None for them. More concretely, with scope adversarial_debiasing_1627063453.788637 used to train the first model and adversarial_debiasing_1627063456.061836 used to train the second, the gradients for the first run are

[(<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 200) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/add_grad/tuple/control_dependency_1:0' shape=(200,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/MatMul_1_grad/tuple/control_dependency_1:0' shape=(200, 1) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/add_1_grad/tuple/control_dependency_1:0' shape=(1,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b2:0' shape=(1,) dtype=float32_ref>)]

and the ones for the second are

[(None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b2:0' shape=(1,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 200) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/add_grad/tuple/control_dependency_1:0' shape=(200,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/MatMul_1_grad/tuple/control_dependency_1:0' shape=(200, 1) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/add_1_grad/tuple/control_dependency_1:0' shape=(1,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/b2:0' shape=(1,) dtype=float32_ref>)]

Note the None tensors corresponding to variables in scope adversarial_debiasing_1627063453.788637 that appear in the second list.
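Since the dumps above are hard to scan by eye, here is a small sketch of a helper that lists the variables whose gradient came back as None. It is not part of AIF360; the name disconnected_variables and the FakeVar stand-in are hypothetical and used only so the snippet runs without TensorFlow. It only assumes each variable exposes a .name attribute, as tf.Variable does.

```python
from collections import namedtuple

# Hypothetical stand-in for tf.Variable; only the `name` attribute is needed.
FakeVar = namedtuple("FakeVar", ["name"])


def disconnected_variables(grads_and_vars):
    """Return the names of variables whose gradient is None."""
    return [var.name for grad, var in grads_and_vars if grad is None]


# Shortened version of the second dump above. The string "grad" is only a
# placeholder for a real gradient tensor.
grads_and_vars = [
    (None, FakeVar("adversarial_debiasing_1627063453.788637/classifier_model/W1:0")),
    (None, FakeVar("adversarial_debiasing_1627063453.788637/classifier_model/b1:0")),
    ("grad", FakeVar("adversarial_debiasing_1627063456.061836/classifier_model/W1:0")),
    ("grad", FakeVar("adversarial_debiasing_1627063456.061836/classifier_model/b1:0")),
]

stale = disconnected_variables(grads_and_vars)
for name in stale:
    print(name)
```

Passing the real output of compute_gradients() to such a helper should list only variables from the earlier scope when this bug is present.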

The fix in this PR is to make sure classifier_vars and adversary_vars contain only variables from the current scope, which can be done by passing scope=self.scope_name as a parameter to the tf.trainable_variables() call in each case.
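The effect of the scope argument can be reproduced outside AIF360 with a minimal sketch. This is not the library's code: build_classifier, the scoped flag, and the scope names model_a and model_b are hypothetical, and the 'classifier_model' substring filter is an assumption about how the variable list is built. The layer shapes follow the dumps above, and the TF1-style API is accessed through tensorflow.compat.v1.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()


def build_classifier(scope_name, scoped):
    """Build a tiny classifier under scope_name and return its (grad, var) pairs."""
    with tf.variable_scope(scope_name):
        features = tf.placeholder(tf.float32, shape=[None, 10])
        labels = tf.placeholder(tf.float32, shape=[None, 1])
        with tf.variable_scope("classifier_model"):
            W1 = tf.get_variable("W1", shape=[10, 200])
            b1 = tf.get_variable("b1", shape=[200],
                                 initializer=tf.zeros_initializer())
            hidden = tf.nn.relu(tf.matmul(features, W1) + b1)
            W2 = tf.get_variable("W2", shape=[200, 1])
            b2 = tf.get_variable("b2", shape=[1],
                                 initializer=tf.zeros_initializer())
            logits = tf.matmul(hidden, W2) + b2
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

        # Without a scope, tf.trainable_variables() returns every trainable
        # variable in the graph, including those of earlier models.
        if scoped:
            candidates = tf.trainable_variables(scope=scope_name)
        else:
            candidates = tf.trainable_variables()
        classifier_vars = [v for v in candidates if "classifier_model" in v.name]

        opt = tf.train.AdamOptimizer(0.001)
        return opt.compute_gradients(loss, var_list=classifier_vars)


# Two models in one graph, unscoped lookup: the second call also sees model_a.
with tf.Graph().as_default():
    build_classifier("model_a", scoped=False)
    leaked = build_classifier("model_b", scoped=False)

# Same setup with the scoped lookup.
with tf.Graph().as_default():
    build_classifier("model_a", scoped=True)
    clean = build_classifier("model_b", scoped=True)

print("unscoped:", [(g is None, v.name) for g, v in leaked])
print("scoped:  ", [(g is None, v.name) for g, v in clean])
```

Note that the scope argument is matched against the start of each variable name, so two scopes where one name is a prefix of the other would still overlap. The timestamped scope names shown above have the same length, so that case does not arise here.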