Closed Aafiya-H closed 5 months ago
Did you get the training script? If yes, could you please share the training scripts? Thanks a lot.
Hi, did you get the training script?
Hi, I am adding MoD layers on top of my current encoder, so I am not sure how much of this applies to your case. I modified the implementation of apply_mod_to_hf and used the Hugging Face Trainer.
import copy
import torch.nn as nn
import torch.nn.init as init

# BaseModelEncoder is my own encoder class and MoD is the wrapper from this repo;
# import both from wherever they live in your project.

class BaseModelEncoderMOD(BaseModelEncoder):
    def __init__(self, encoder, capacity, num_mod_layers, state_dict=None):
        # The parent constructor is expected to build self.layers from the config.
        super().__init__(encoder.config)
        self.capacity = capacity
        # Append num_mod_layers MoD-wrapped copies of the first encoder layer.
        new_layers = nn.ModuleList(
            [copy.deepcopy(MoD(self.capacity, self.layers[0])) for _ in range(num_mod_layers)]
        )
        self.layers.extend(new_layers)

def custom_weight_init(m):
    init.xavier_normal_(m.weight)  # Use Xavier normal initialization for weights
    if m.bias is not None:
        init.constant_(m.bias, 0)

def copy_common_weights(source_state_dict, target_state_dict):
    # Copy every tensor whose name and shape match in both state dicts.
    for name, param in source_state_dict.items():
        if name in target_state_dict and target_state_dict[name].size() == param.size():
            target_state_dict[name].copy_(param)

def apply_mod_to_hf(model, capacity=1, num_mod_layers=0, enabled: bool = True):
    if not enabled:
        return model
    num_layers = len(model.encoder.layers)
    state_dict = model.encoder.state_dict()
    encoder = BaseModelEncoderMOD(model.encoder, capacity, num_mod_layers, state_dict)
    # Re-initialize only the newly appended MoD layers.
    for layer in encoder.layers[num_layers:]:
        block = layer.block
        block.self_attn.k_proj.apply(custom_weight_init)
        block.self_attn.v_proj.apply(custom_weight_init)
        block.self_attn.q_proj.apply(custom_weight_init)
        block.self_attn.out_proj.apply(custom_weight_init)
        block.fc1.apply(custom_weight_init)
        block.fc2.apply(custom_weight_init)
    model.encoder = encoder
    # Restore the pretrained weights for the original layers.
    copy_common_weights(state_dict, model.encoder.state_dict())
    return model
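To make the weight-copy step easier to check in isolation, here is a minimal sketch on a toy model. ToyEncoder is a hypothetical stand-in for a real encoder (it is not part of this repo or of transformers), and the extra third layer only mimics the effect of appending MoD layers. It assumes PyTorch is installed.

```python
import torch
import torch.nn as nn


def copy_common_weights(source_state_dict, target_state_dict):
    # Same rule as in the snippet above: copy only entries whose
    # name and shape match in both state dicts.
    with torch.no_grad():
        for name, param in source_state_dict.items():
            if name in target_state_dict and target_state_dict[name].size() == param.size():
                target_state_dict[name].copy_(param)


class ToyEncoder(nn.Module):
    # Hypothetical stand-in for a real encoder: a stack of linear layers.
    def __init__(self, num_layers, dim=4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])


torch.manual_seed(0)
pretrained = ToyEncoder(num_layers=2)
extended = ToyEncoder(num_layers=3)  # one extra layer, as when new layers are appended
extra_before = extended.layers[2].weight.detach().clone()

# state_dict() tensors share storage with the parameters, so the in-place
# copy_ updates the extended model directly.
copy_common_weights(pretrained.state_dict(), extended.state_dict())

shared_ok = all(
    torch.equal(pretrained.layers[i].weight, extended.layers[i].weight) for i in range(2)
)
extra_untouched = torch.equal(extended.layers[2].weight, extra_before)
print(shared_ok, extra_untouched)
```

The two flags show that the layers present in both models end up with the pretrained weights, while the layer that exists only in the extended model keeps its fresh initialization.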
Did you get the training script? If yes, could you please share the training scripts? Thanks a lot.
Hi, did you get the training script?
You can try this: https://github.com/hiyouga/LLaMA-Factory/blob/v0.6.3/examples/extras/MoD/sft.sh
Hello, thank you so much for this amazing work! I was wondering if you could provide the training scripts for the MoD models?