astramind-ai / Mixture-of-depths

Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Training scripts for the MoD models #6

Closed · Aafiya-H closed 5 months ago

Aafiya-H commented 5 months ago

Hello, thank you so much for this amazing work! I was wondering if you could provide the training scripts for the MoD models?

Zkli-hub commented 5 months ago

Did you get the training script? If yes, could you please share the training scripts? Thanks a lot.

Mi5sssss commented 4 months ago

> Did you get the training script? If yes, could you please share the training scripts? Thanks a lot.

Hi, did you get the training script?

Aafiya-H commented 4 months ago

Hi, I am adding MoD layers on top of my current encoder, so I am not sure how much of this applies to your setup. I modified the implementation of apply_mod_to_hf and used the Hugging Face Trainer.

import copy

import torch
import torch.nn as nn
import torch.nn.init as init

from MoD import MoD  # MoD wrapper from this repo; adjust the import to your install


class BaseModelEncoderMOD(BaseModelEncoder):  # BaseModelEncoder is my own encoder class
    def __init__(self, encoder, capacity, num_mod_layers, state_dict=None):
        super().__init__(encoder.config)
        self.capacity = capacity
        # Append `num_mod_layers` MoD blocks, each wrapping a copy of the first encoder layer.
        new_layers = nn.ModuleList(
            [copy.deepcopy(MoD(self.capacity, self.layers[0])) for _ in range(num_mod_layers)]
        )
        self.layers.extend(new_layers)


def custom_weight_init(m):
    # Xavier-normal initialization for weights, zero initialization for biases.
    init.xavier_normal_(m.weight)
    if m.bias is not None:
        init.constant_(m.bias, 0)


def copy_common_weights(source_state_dict, target_state_dict):
    # Copy every parameter whose name and shape match between the two state dicts.
    with torch.no_grad():
        for name, param in source_state_dict.items():
            if name in target_state_dict and target_state_dict[name].size() == param.size():
                target_state_dict[name].copy_(param)


def apply_mod_to_hf(model, capacity=1, num_mod_layers=0, enabled: bool = True):
    if not enabled:
        return model

    num_layers = len(model.encoder.layers)
    state_dict = model.encoder.state_dict()
    encoder = BaseModelEncoderMOD(model.encoder, capacity, num_mod_layers, state_dict)

    # Re-initialize the projections of the newly appended MoD layers only.
    for layer in encoder.layers[num_layers:]:
        block = layer.block
        block.self_attn.k_proj.apply(custom_weight_init)
        block.self_attn.v_proj.apply(custom_weight_init)
        block.self_attn.q_proj.apply(custom_weight_init)
        block.self_attn.out_proj.apply(custom_weight_init)

        block.fc1.apply(custom_weight_init)
        block.fc2.apply(custom_weight_init)

    model.encoder = encoder
    # Restore the pretrained weights of the original (non-MoD) layers.
    copy_common_weights(state_dict, model.encoder.state_dict())
    return model
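
The training itself then just goes through the standard Hugging Face Trainer. A rough sketch of that side, where the checkpoint name, dataset, capacity, and hyperparameters are placeholders rather than the exact setup used here, and apply_mod_to_hf is the modified function above (it expects the model to expose model.encoder.layers):

# Minimal training sketch with the Hugging Face Trainer; all names below are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "your-encoder-checkpoint"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Append and re-initialize the MoD layers before training (placeholder settings).
model = apply_mod_to_hf(model, capacity=0.125, num_mod_layers=2)

dataset = load_dataset("imdb")  # placeholder dataset with "text" / "label" columns

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./mod-encoder",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_steps=50,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,  # lets the Trainer pad batches dynamically
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
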
ZHQSimon commented 4 months ago

> Did you get the training script? If yes, could you please share the training scripts? Thanks a lot.
>
> Hi, did you get the training script?

You can try this: https://github.com/hiyouga/LLaMA-Factory/blob/v0.6.3/examples/extras/MoD/sft.sh