dguo98 / DiffPruning

Parameter Efficient Transfer Learning with Diff Pruning
Apache License 2.0

Groupings for Structured Diff Pruning #3

Closed: varunnair18 closed this issue 3 years ago

varunnair18 commented 3 years ago

Section 4.3 of the paper states:

"Grouping for the structured version of diff pruning is based on the matrix/bias vectors (i.e. parameters that belong to the same matrix or bias vector are assumed to be in the same group), which results in 393 groups."

How is that value of 393 computed, and where can we find the code that relates to the creation of those groupings?
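One accounting that reproduces 393 is sketched below. It rests on assumptions that do not come from the thread or the repository: the model is BERT-large (24 encoder layers), parameters follow Hugging Face BertForSequenceClassification naming, and every named weight or bias tensor is one group. Under those assumptions the embeddings contribute 5 tensors (word, position and token-type embeddings, plus LayerNorm weight and bias). Each encoder layer contributes 16: six dense layers (query, key, value, attention-output dense, intermediate dense, output dense), each with a weight and a bias, plus two LayerNorms, each with a weight and a bias. The pooler and the classification head contribute 2 tensors each.

```latex
\underbrace{5}_{\text{embeddings}}
+ \underbrace{24 \times 16}_{\text{encoder layers}}
+ \underbrace{2}_{\text{pooler}}
+ \underbrace{2}_{\text{classifier}}
= 5 + 384 + 2 + 2 = 393
```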

I think the per_layer_alpha flag in run_glue_diffpruning.py sets the groupings on a per-layer basis, but I can't find per_layer_z or per_layer_z_grad variables, even though per_param_z and per_param_z_grad exist.

dguo98 commented 3 years ago

Hi Varun,

The grouping is simply based on parameters with the same name.
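The following is a minimal sketch of what grouping by parameter name can look like. It assumes a common structured-gating scheme: each element has its own gate, and each group (one named tensor) has one shared gate, so an element's effective diff is diff × element gate × group gate. This is an illustration only. It does not reproduce the repository's code or flags, and the paper's exact parameterization may differ. The function names and the toy parameter names are hypothetical.

```python
from typing import Dict, List


def group_ids_by_name(names: List[str]) -> Dict[str, int]:
    """Hypothetical helper: give every distinct parameter name its own group index."""
    ids: Dict[str, int] = {}
    for name in names:
        if name not in ids:
            ids[name] = len(ids)
    return ids


def apply_structured_mask(
    diffs: Dict[str, List[float]],
    element_gates: Dict[str, List[float]],
    group_gates: List[float],
) -> Dict[str, List[float]]:
    """Hypothetical helper: effective diff = diff * element gate * group gate.

    All entries of one named tensor share the same group gate, so a group
    gate of 0.0 switches off the diff for that whole tensor at once.
    """
    ids = group_ids_by_name(list(diffs))
    out: Dict[str, List[float]] = {}
    for name, values in diffs.items():
        g = group_gates[ids[name]]  # one gate shared by the whole tensor
        out[name] = [v * e * g for v, e in zip(values, element_gates[name])]
    return out


# Toy example with three named tensors (flattened to lists), i.e. three groups.
diffs = {
    "layer.0.query.weight": [0.5, -0.2],
    "layer.0.query.bias": [0.1],
    "layer.0.key.weight": [0.3, 0.4],
}
element_gates = {name: [1.0] * len(v) for name, v in diffs.items()}
group_gates = [1.0, 0.0, 1.0]  # turn off the whole query-bias group
result = apply_structured_mask(diffs, element_gates, group_gates)
```

With the group gate for layer.0.query.bias set to 0.0, that tensor's diff is zeroed as a unit, while the other two tensors keep their diffs unchanged.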

per_layer_z and per_param_z are deprecated, so don't worry about them!

I've explained more details in my response to your email thread. Let me know if you have additional questions.

varunnair18 commented 3 years ago

Thanks for your note - will get back to you over email!