varunnair18 closed this issue 3 years ago
Hi Varun,
The grouping is simply based on parameters with the same name.
per_layer_z, per_param_z are deprecated - so don't worry about them!
I've explained more details in my response to your email thread. Let me know if you have additional questions.
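For anyone else reading, here is a rough sketch of how name-based grouping could arrive at 393. It assumes the model is BERT-large (24 encoder layers) with a sequence classification head, and that parameter names follow the layout used in Hugging Face Transformers. The name lists and the helper functions `bert_param_names` and `group_by_name` are hypothetical reconstructions for illustration, not code from this repo.

```python
# Sketch only: parameter names are modeled on a Hugging Face style BERT
# classifier. They are an assumption, not taken from the diff pruning code.

EMBEDDING_PARAMS = [
    "embeddings.word_embeddings.weight",
    "embeddings.position_embeddings.weight",
    "embeddings.token_type_embeddings.weight",
    "embeddings.LayerNorm.weight",
    "embeddings.LayerNorm.bias",
]

# Sub-modules inside one encoder layer; each has a weight and a bias.
LAYER_MODULES = [
    "attention.self.query",
    "attention.self.key",
    "attention.self.value",
    "attention.output.dense",
    "attention.output.LayerNorm",
    "intermediate.dense",
    "output.dense",
    "output.LayerNorm",
]

HEAD_MODULES = ["pooler.dense", "classifier"]


def bert_param_names(num_layers):
    """List one name per weight matrix or bias vector."""
    names = list(EMBEDDING_PARAMS)
    for layer in range(num_layers):
        for module in LAYER_MODULES:
            for kind in ("weight", "bias"):
                names.append(f"encoder.layer.{layer}.{module}.{kind}")
    for module in HEAD_MODULES:
        for kind in ("weight", "bias"):
            names.append(f"{module}.{kind}")
    return names


def group_by_name(names):
    """Assign one group id per distinct parameter name."""
    return {name: group_id for group_id, name in enumerate(names)}


names = bert_param_names(num_layers=24)
groups = group_by_name(names)
print(len(groups))  # 5 + 24 * 16 + 2 + 2 = 393
```

Under these assumptions each weight matrix or bias vector is its own group: 5 embedding tensors, 16 tensors per encoder layer, and 2 each for the pooler and classifier. With a real model, counting the entries of `model.named_parameters()` in PyTorch should give the same kind of tally.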
Thanks for your note - will get back to you over email!
In section 4.3 of the paper, it states:
"Grouping for the structured version of diff pruning is based on the matrix/bias vectors (i.e. parameters that belong to the same matrix or bias vector are assumed to be in the same group), which results in 393 groups."
How is that value of 393 computed, and where can we find the code that relates to the creation of those groupings?
I think the per_layer_alpha flag in run_glue_diffpruning.py is used to set the groupings on a per layer basis, but I can't see a per_layer_z and per_layer_z_grad variable (since there is a per_param_z and per_param_z_grad).