Closed: abhishektyaagi closed this issue 5 months ago
Hi @abhishektyaagi,
Yes, you should be able to use this repository for your needs. The basic workflow is as follows:
```python
import torch

from rigl_torch.rigl_constant_fan import RigLConstFanScheduler

model = get_model()  # your MLP
optimizer = torch.optim.SGD(model.parameters(), *args, **kwargs)  # SGD or AdamW supported currently

# Recommended kwargs
scheduler_kwargs = dict(
    dense_allocation=0.1,  # (1 - sparsity), up to you to set
    # Optimizer step at which to stop mask mutation; stopping at 75% of training is what we used.
    # You will need to adjust this for distributed training or gradient accumulation; see
    # https://github.com/calgaryml/condensed-sparsity/blob/main/src/rigl_torch/utils/rigl_utils.py#L239
    # for an example of how this can be calculated for more complex training runs.
    T_end=int(len(data_loader) * num_epochs * 0.75),
    dynamic_ablation=True,  # ablate low-saliency neurons
    min_salient_weights_per_neuron=0.3,  # 30% of sparse weights must be salient or else the neuron is pruned
    # Important to not ablate your last layer, as this would remove entire classes from consideration.
    no_ablation_module_names=list(model.named_modules())[-1][0],
)
scheduler = RigLConstFanScheduler(model, optimizer, **scheduler_kwargs)

# Training loop
for data, labels in data_loader:
    ...
    loss.backward()
    optimizer.step()
    scheduler()  # we use __call__ to step the SRigL scheduler; returns True if the mask was modified, False if not
    optimizer.zero_grad()
```
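To expand on the `T_end` comment above, here is a minimal sketch of how the stop step might be computed when gradient accumulation changes the number of optimizer steps per epoch. The helper and its arguments (`compute_t_end`, `accumulation_steps`) are illustrative only and not part of the repository's API; see the `rigl_utils.py` link above for the repo's own calculation.

```python
def compute_t_end(data_loader, num_epochs, accumulation_steps=1, stop_fraction=0.75):
    # Count optimizer steps, not forward passes: with gradient accumulation the
    # optimizer steps once every `accumulation_steps` batches. Under DDP with a
    # DistributedSampler, len(data_loader) is already the per-rank batch count,
    # so typically no extra division is needed there.
    optimizer_steps_per_epoch = len(data_loader) // accumulation_steps
    return int(optimizer_steps_per_epoch * num_epochs * stop_fraction)

scheduler_kwargs["T_end"] = compute_t_end(data_loader, num_epochs, accumulation_steps=4)
```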
I also suggest you check out sparsimony, which currently has RigL and SET implemented for PyTorch. I am working on implementing SRigL in that repo in a much more modular form that should be easier to use. sparsimony is based on the torch.ao effort, which is currently under development in the PyTorch repo.
Please let us know if this works out for your use case. Any feedback you have is greatly appreciated.
Thank you for your reply. I will try this out.
Also, if I understand correctly, there is no provision in the current codebase to try out NVIDIA's 2:4 sparsity structure, right?
Correct! Let me know how it goes if you do extend the repo :)
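For reference, NVIDIA's 2:4 (semi-structured) pattern keeps at most 2 non-zero weights in every contiguous group of 4 along a row. A mask with that structure could be built with plain tensor ops as in the sketch below; this is only an illustration of the pattern, not something implemented in this repository:

```python
import torch

def two_to_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every group of 4 along the last dim.

    Assumes the last dimension is divisible by 4; illustrative sketch only.
    """
    out_features, in_features = weight.shape
    groups = weight.abs().reshape(out_features, in_features // 4, 4)
    # Indices of the top-2 magnitudes within each group of 4.
    topk = groups.topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask.reshape(out_features, in_features)

w = torch.randn(8, 16)
sparse_w = w * two_to_four_mask(w)  # 50% sparse, 2:4 pattern per row
```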
Hi, thank you for your work and for making it publicly available.
I was wondering whether it is possible to use SRigL to train my own n-layer network built from MLP layers using the scripts provided here? (P.S.: I am not worried about the accuracy of SRigL with MLPs; I am more curious about whether it can be used for MLPs at all.)
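For concreteness, an MLP such as the hypothetical one below could serve as `get_model()` in the snippet above, since the reply confirms linear (MLP) layers are supported; the layer sizes and argument names are placeholders:

```python
import torch.nn as nn

def get_model(in_dim=784, hidden_dim=256, num_classes=10, n_layers=3):
    # Hypothetical n-layer MLP: a stack of nn.Linear + ReLU blocks
    # followed by a final classification layer.
    layers = []
    dims = [in_dim] + [hidden_dim] * (n_layers - 1)
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU()]
    layers.append(nn.Linear(dims[-1], num_classes))
    return nn.Sequential(*layers)
```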