In initiate_local_training, self.params_dict_new is recorded, but these parameters have already been detach()ed, so only the transient parameter state at that moment is captured. Consequently, new_adapter_weight, which terminate_local_training saves to the corresponding path for aggregation, ends up holding untrained parameters.
import copy
from collections import OrderedDict
from peft import get_peft_model_state_dict

def initiate_local_training(self):
    self.model.config.use_cache = False
    # Deep-copied snapshot of the "default" adapter parameters before training.
    self.params_dict_old = copy.deepcopy(OrderedDict(
        (name, param.detach())
        for name, param in self.model.named_parameters() if "default" in name
    ))
    # Recorded once here, before any training step, and never refreshed.
    self.params_dict_new = OrderedDict(
        (name, param.detach())
        for name, param in self.model.named_parameters() if "default" in name
    )
    # Patch state_dict so it returns only the adapter weights held in
    # self.params_dict_new.
    self.model.state_dict = (
        lambda instance, *_, **__: get_peft_model_state_dict(
            instance, self.params_dict_new, "default"
        )
    ).__get__(self.model, type(self.model))
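If the snapshot really is stale by save time, one way to verify or work around it is to re-capture the adapter parameters immediately before saving. Below is a minimal sketch, assuming terminate_local_training obtains new_adapter_weight from the patched self.model.state_dict() shown above; refresh_and_save_adapter and save_path are hypothetical names used only for illustration:

import torch
from collections import OrderedDict

def refresh_and_save_adapter(self, save_path):
    # Re-capture the "default" adapter parameters from the live model so the
    # recorded dict reflects the post-training values. (Hypothetical fix sketch.)
    self.params_dict_new = OrderedDict(
        (name, param.detach())
        for name, param in self.model.named_parameters() if "default" in name
    )
    # The patched state_dict (see initiate_local_training above) reads from
    # self.params_dict_new, so this now returns the refreshed weights.
    new_adapter_weight = self.model.state_dict()
    torch.save(new_adapter_weight, save_path)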