A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
When attempting to use AdversarialDebiasing multiple times and resetting the tf graph or session is not an option (i.e. when using multiple instances as part of an ensemble), subsequent calls to fit() after fitting the first model will crash because the weight and bias tensors from the first model are retrieved as trainable variables and trying to compute their gradients via classifier_opt.compute_gradients(pred_labels_loss, var_list=classifier_vars) returns None tensors, even if those tensors are not necessarily in the current scope. More concretely, with scope adversarial_debiasing_1627063453.788637 used to train the first model and adversarial_debiasing_1627063456.061836 used to train the second, the gradients for the first run are
Note the None tensors corresponding to variables in scope adversarial_debiasing_1627063453.788637 that appear in the second list.
The fix in this PR is to make sure classifier_vars and adversary_vars do not have variables beyond the current scope, which can be done by passing scope=self.scope_name as parameter to the tf.trainable_variable() calls in each case.
When attempting to use
AdversarialDebiasing
multiple times and resetting the tf graph or session is not an option (i.e. when using multiple instances as part of an ensemble), subsequent calls tofit()
after fitting the first model will crash because the weight and bias tensors from the first model are retrieved as trainable variables and trying to compute their gradients viaclassifier_opt.compute_gradients(pred_labels_loss, var_list=classifier_vars)
returnsNone
tensors, even if those tensors are not necessarily in the current scope. More concretely, with scopeadversarial_debiasing_1627063453.788637
used to train the first model andadversarial_debiasing_1627063456.061836
used to train the second, the gradients for the first run are[(<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 200) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/add_grad/tuple/control_dependency_1:0' shape=(200,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/MatMul_1_grad/tuple/control_dependency_1:0' shape=(200, 1) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063453.788637/gradients_1/adversarial_debiasing_1627063453.788637/classifier_model/add_1_grad/tuple/control_dependency_1:0' shape=(1,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b2:0' shape=(1,) dtype=float32_ref>)]
and the ones for the second are
[(None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (None, <tf.Variable 'adversarial_debiasing_1627063453.788637/classifier_model/b2:0' shape=(1,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/MatMul_grad/tuple/control_dependency_1:0' shape=(10, 200) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/W1:0' shape=(10, 200) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/add_grad/tuple/control_dependency_1:0' shape=(200,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/b1:0' shape=(200,) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/MatMul_1_grad/tuple/control_dependency_1:0' shape=(200, 1) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/W2:0' shape=(200, 1) dtype=float32_ref>), (<tf.Tensor 'adversarial_debiasing_1627063456.061836/gradients_1/adversarial_debiasing_1627063456.061836/classifier_model/add_1_grad/tuple/control_dependency_1:0' shape=(1,) dtype=float32>, <tf.Variable 'adversarial_debiasing_1627063456.061836/classifier_model/b2:0' shape=(1,) dtype=float32_ref>)]
Note the
None
tensors corresponding to variables in scopeadversarial_debiasing_1627063453.788637
that appear in the second list.The fix in this PR is to make sure
classifier_vars
andadversary_vars
do not have variables beyond the current scope, which can be done by passingscope=self.scope_name
as parameter to thetf.trainable_variable()
calls in each case.