Describe the bug
In Sleeper Agent, get_poison_indices() returns self.indices_poison (link); however, the poisoning procedure never updates this attribute to the indices that were finally selected.
Details
This attack makes max_trials attempts to poison the data. On each attempt, a new self.indices_poison is sampled (L201-208). If that attempt is the most successful so far, best_indices_poison is set to self.indices_poison (L231). After all trials finish, however, self.indices_poison still holds the indices from the last attempt, even when the best attempt was an earlier one.
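To make the failure mode concrete, here is a minimal Python sketch of the loop described above. It is not ART's code: the class name SleeperAgentLoopSketch, the trial_scores list standing in for each attempt's measured success, and index sampling via random.sample are hypothetical simplifications. Only the names indices_poison, best_indices_poison, max_trials and get_poison_indices() come from this report.

```python
import random
from typing import List, Optional, Sequence


class SleeperAgentLoopSketch:
    """Hypothetical, simplified model of the trial loop; not ART's actual implementation."""

    def __init__(self, n_train: int, n_poison: int, trial_scores: Sequence[float], seed: int = 0) -> None:
        self.n_train = n_train
        self.n_poison = n_poison
        # Hypothetical stand-in for each attempt's measured success; one entry per trial.
        self.trial_scores = list(trial_scores)
        self.max_trials = len(self.trial_scores)
        self.rng = random.Random(seed)
        self.indices_poison: Optional[List[int]] = None

    def poison(self) -> List[int]:
        best_score = float("-inf")
        best_indices_poison: List[int] = []
        for trial in range(self.max_trials):
            # Each attempt samples a fresh set of poison indices (cf. L201-208).
            self.indices_poison = self.rng.sample(range(self.n_train), self.n_poison)
            score = self.trial_scores[trial]
            if score > best_score:
                # Remember the most successful attempt so far (cf. L231).
                best_score = score
                best_indices_poison = self.indices_poison
        # BUG: self.indices_poison is left pointing at the LAST attempt's indices.
        return best_indices_poison

    def get_poison_indices(self) -> Optional[List[int]]:
        return self.indices_poison


attack = SleeperAgentLoopSketch(n_train=1000, n_poison=10, trial_scores=[0.9, 0.2, 0.4])
best = attack.poison()
print(attack.get_poison_indices() == best)  # False: attempt 1 was best, attempt 3 was last
```

Because the first attempt scores highest but the third runs last, get_poison_indices() reports indices that were never selected as the best.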
Expected behavior
At the end of the poisoning loop, once all poisoning attempts are complete (L237), set self.indices_poison = best_indices_poison so that get_poison_indices() returns the indices from the best attempt.
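A minimal sketch of the proposed change, under stated assumptions: the class name FixedSleeperAgentLoopSketch, the trial_scores list used as a stand-in for attack success, and stdlib sampling are hypothetical simplifications, not ART's implementation. The only substantive point is the single assignment after the loop.

```python
import random
from typing import List, Optional, Sequence


class FixedSleeperAgentLoopSketch:
    """Hypothetical, simplified trial loop with the proposed fix applied (not ART's code)."""

    def __init__(self, n_train: int, n_poison: int, trial_scores: Sequence[float], seed: int = 0) -> None:
        self.n_train = n_train
        self.n_poison = n_poison
        self.trial_scores = list(trial_scores)  # hypothetical per-attempt success values
        self.max_trials = len(self.trial_scores)
        self.rng = random.Random(seed)
        self.indices_poison: Optional[List[int]] = None

    def poison(self) -> List[int]:
        best_score = float("-inf")
        best_indices_poison: List[int] = []
        for trial in range(self.max_trials):
            # Each attempt samples a fresh set of poison indices.
            self.indices_poison = self.rng.sample(range(self.n_train), self.n_poison)
            if self.trial_scores[trial] > best_score:
                best_score = self.trial_scores[trial]
                best_indices_poison = self.indices_poison
        # Proposed fix: once all attempts are done, keep the best attempt's indices.
        self.indices_poison = best_indices_poison
        return best_indices_poison

    def get_poison_indices(self) -> Optional[List[int]]:
        return self.indices_poison


attack = FixedSleeperAgentLoopSketch(n_train=1000, n_poison=10, trial_scores=[0.9, 0.2, 0.4])
best = attack.poison()
print(attack.get_poison_indices() == best)  # True
```

With the assignment in place, get_poison_indices() and the returned best indices agree regardless of which attempt won.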