IntelLabs / Model-Compression-Research-Package

A library for researching neural networks compression and acceleration methods.
Apache License 2.0

Difference between end_pruning_step and policy_end_step #4

Closed by eldarkurtic 2 years ago

eldarkurtic commented 2 years ago

Hi, Could you please clarify the difference between end_pruning_step and policy_end_step in the pruning config file (for example: https://github.com/IntelLabs/Model-Compression-Research-Package/blob/main/examples/transformers/language-modeling/config/iterative_unstructured_magnitude_90_config.json)?

ofirzaf commented 2 years ago

Hi,

Sure.

The interval [begin_pruning_step, end_pruning_step] defines when the scheduler allows the pruning masks to update and change the pruning pattern. Outside this interval the masks remain constant and the pruning pattern stays the same regardless of the weights' magnitudes.

The interval [policy_begin_step, policy_end_step] defines the interval of the pruning policy. The pruning policy defines how the sparsity is increased during training from the initial sparsity to the final sparsity over the assigned interval. For example, in this library we strictly use the policy introduced in To prune, or not to prune: exploring the efficacy of pruning for model compression:

$$s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\Delta t}\right)^3 \quad \text{for } t \in \{t_0,\ t_0 + \Delta t,\ \ldots,\ t_0 + n\Delta t\}$$

where $s_i$ is the initial sparsity, $s_f$ is the final sparsity, $t_0$ corresponds to policy_begin_step, $t_0 + n\Delta t$ corresponds to policy_end_step, and $\Delta t$ is the pruning frequency.
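A minimal sketch of that cubic schedule (illustrative names, not the library's actual API):

```python
def sparsity_at_step(step, initial_sparsity, final_sparsity,
                     policy_begin_step, policy_end_step):
    """Target sparsity at a training step under the cubic schedule
    from Zhu & Gupta, "To prune, or not to prune" (2017)."""
    if step <= policy_begin_step:
        return initial_sparsity
    if step >= policy_end_step:
        return final_sparsity
    # Fraction of the policy interval elapsed so far.
    progress = (step - policy_begin_step) / (policy_end_step - policy_begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3
```

The cubic term makes the sparsity grow quickly early on, when the network is still redundant, and slowly near the end of the interval.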

eldarkurtic commented 2 years ago

If we do a run with 100k steps and we specify the following pruning config:

we would get the following:

But I'm not sure what happens to the model and its sparsity in the [50k, 80k] and [80k, 100k] ranges. Since the pruning policy finishes at step 50k, and at that point the model has its final sparsity mask, why do we need end_pruning_step at 80k?
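For concreteness, the step ranges in question can be restated with a hypothetical helper (begin steps are assumed to be 0 here, since the config values themselves are not shown above):

```python
def interval_of(step, policy_end_step=50_000, end_pruning_step=80_000):
    """Name the schedule interval a training step falls in.
    Assumes policy_begin_step = begin_pruning_step = 0."""
    if step <= policy_end_step:
        return "policy interval"        # sparsity ramps from initial to final
    if step <= end_pruning_step:
        return "post-policy pruning"    # the range asked about
    return "after end_pruning_step"     # remainder of the 100k-step run
```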

ofirzaf commented 2 years ago

In the interval [50k, 80k] the sparsity ratio of the model has reached its final value; however, the sparsity masks continue to update every pruning_frequency steps, changing the sparsity pattern of the model according to the highest-magnitude weights currently in the model.
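A minimal sketch of such a mask update at a fixed sparsity level (pure-Python toy, not the library's implementation): each update keeps the highest-magnitude weights, so the pattern can change even though the ratio does not.

```python
def update_mask(weights, sparsity):
    """Magnitude-based mask at a fixed sparsity ratio.
    weights: flat list of floats. Returns a list of booleans (True = kept)."""
    k = int(round(sparsity * len(weights)))      # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])                      # indices of the k smallest magnitudes
    return [i not in pruned for i in range(len(weights))]
```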

eldarkurtic commented 2 years ago

Is this part described somewhere in the paper (just checking if I've missed it)? If not, could you please clarify a bit more how the sparsity mask changes in the [50k, 80k] range?

  1. How do you pick which masked-weights to re-introduce?
  2. Are they initialized to zero when re-introduced?

ofirzaf commented 2 years ago

This is not described in the paper; however, it is common practice in magnitude pruning, and I think it is described in To prune, or not to prune: exploring the efficacy of pruning for model compression, which we refer to in our paper.

  1. The weight values are kept as-is even when they are masked out. When the magnitude of an unmasked weight drops below that of a masked weight, the masked (higher-magnitude) weight replaces the unmasked (lower-magnitude) weight at the next sparsity mask update.
  2. When reintroduced, the weights keep their last recorded value.
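The two points above can be illustrated with a toy example (hypothetical code, not the library's implementation): a masked weight's stored value is never zeroed, only excluded from the forward pass, so a later mask update can swap it back in at its old value.

```python
def magnitude_mask(weights, sparsity):
    """True = kept; prunes the round(sparsity * n) smallest magnitudes."""
    k = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [i not in pruned for i in range(len(weights))]

weights = [0.1, -0.5, 0.3, -0.05]
mask = magnitude_mask(weights, 0.5)   # 0.1 and -0.05 are masked out

# Training shrinks an unmasked weight below a masked one; the masked
# weight's stored value is untouched (it receives no gradient).
weights[2] = 0.02                     # 0.3 decayed to 0.02

mask = magnitude_mask(weights, 0.5)   # 0.1 is reintroduced at its stored value
```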

eldarkurtic commented 2 years ago

Okay, thanks a lot for the clarification :)