Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
27.48k stars 3.3k forks

Model Pruning Callback Failing #10835

Open aqibsaeed opened 2 years ago

aqibsaeed commented 2 years ago

Hi,

I am trying to use ModelPruning callback as follows:

    callbacks=[
        ModelPruning(
            pruning_fn="l1_unstructured",
            amount=0.01,
            use_global_unstructured=True,
        )
    ]

but after training for an epoch, the Trainer throws the following error (it only happens when the ModelPruning callback is used):

File "/home/.conda/envs/dummy/lib/python3.8/site-packages/torch/nn/utils/convert_parameters.py", line 77, in _check_param_device
    if param.is_cuda:  # Check if in same GPU
AttributeError: 'bool' object has no attribute 'is_cuda'

I tried PyTorch 1.7.0 and 1.9.0, but the issue persists. Any idea what is causing this error?
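The traceback above can be reproduced in isolation without torch: once a plain bool ends up where a tensor is expected, any tensor-only attribute access raises exactly this AttributeError.

```python
# Minimal, torch-free reproduction of the failure mode.
param = False  # a bool that slipped into the list of "parameters"
try:
    param.is_cuda  # torch's _check_param_device does this on every parameter
except AttributeError as err:
    message = str(err)

print(message)  # 'bool' object has no attribute 'is_cuda'
```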

Thanks.

cc @tchaton @rohitgr7 @carmocca

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

tchaton commented 2 years ago

Dear @aqibsaeed,

Would you mind trying the latest version of PyTorch Lightning?

Best, T.C

Jconn commented 2 years ago

I am also getting this error. I've tried 1.5.10, 1.6.0rc1, and a few other 1.5.* versions; all of the 1.5.* versions produce this error, and the 1.6.0 rc gives me an OOM fault.

edit: Okay, I understand this error now. When no parameters to prune are provided, this library tries to add every parameter in the model named weight or bias. The check for whether a layer has a weight or bias tensor only verifies that the attribute exists on the Python object. Some PyTorch modules have a bias attribute that is not a tensor but rather a bool indicating whether bias tensors should be used. pytorch-lightning then tells PyTorch to prune a bool, which causes the error.
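This diagnosis can be illustrated without torch. `FakeTensor`, `LinearLike`, and `AttentionLike` below are hypothetical stand-ins (not real torch classes); `naive_collect` mirrors the attribute-name check described above, not the library's exact code:

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor."""

class LinearLike:
    def __init__(self):
        self.weight = FakeTensor()
        self.bias = FakeTensor()

class AttentionLike:
    def __init__(self):
        self.weight = FakeTensor()
        self.bias = False  # bool flag meaning "no bias", not a tensor

def naive_collect(modules, names=("weight", "bias")):
    # Mirrors the buggy selection: any non-None attribute with a matching
    # name is scheduled for pruning, regardless of its type.
    return [
        (module, name)
        for module in modules
        for name in names
        if getattr(module, name, None) is not None
    ]

selected = naive_collect([LinearLike(), AttentionLike()])
# The bool bias of AttentionLike is selected too, and pruning it crashes later.
print(len(selected))  # 4: both weights, the real bias, and the bool bias
```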

aminst commented 1 year ago

Hi, I also encountered this issue while using the ModelPruning callback. As @Jconn suggested, I also think this is because of the addition of non-tensor parameters in the pruning stage. The code provided below from the pruning.py file adds all the parameters from all types: https://github.com/Lightning-AI/lightning/blob/8d14554383632f4cdff337b3d8d1b226cabdd1d0/src/pytorch_lightning/callbacks/pruning.py#L450-L453

I think a check using isinstance should be added to verify that each parameter is a Tensor, so that non-tensor parameters are not pruned. If this approach sounds reasonable, I would be glad to make a PR and solve this issue.
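A sketch of the proposed filter, again with a hypothetical `FakeTensor` stand-in; in the real callback the check would be `isinstance(getattr(module, name, None), torch.Tensor)`:

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor."""

class AttentionLike:
    def __init__(self):
        self.weight = FakeTensor()
        self.bias = False  # bool flag, not a tensor

def collect_prunable(modules, names=("weight", "bias"), tensor_type=FakeTensor):
    # Proposed fix: keep only attributes that actually are tensors,
    # so bool flags like `bias = False` are skipped.
    return [
        (module, name)
        for module in modules
        for name in names
        if isinstance(getattr(module, name, None), tensor_type)
    ]

selected = collect_prunable([AttentionLike()])
print(selected)  # only the weight survives; the bool bias is filtered out
```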

carmocca commented 1 year ago

Sounds good to me

aminst commented 1 year ago

> Sounds good to me

Thanks, I will work on it :+1: