enyac-group / LeGR

CNN channel pruning, LeGR, MorphNet, AMC. Codebase for paper "LeGR: Filter Pruning via Learned Global Ranking"

Will those layers with even indices never be pruned? #7

Closed pengfeiZhao1993 closed 3 years ago

pengfeiZhao1993 commented 4 years ago

Will those layers with even indices never be pruned?

In def one_shot_lowest_ranking_filters_multi_targets(self, targets), the even-indexed layers are given much higher importance, since their ranking values are the sum over the 28 even layers.

Is my understanding correct? I look forward to your reply.
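To make the effect concrete, here is a minimal, self-contained sketch (made-up numbers, not the repository's actual code) of what happens when the ranking value of the shortcut-coupled layers is a sum of 28 per-layer scores:

```python
# Minimal sketch (hypothetical numbers, not the actual LeGR code): if the ranking
# value of a shortcut-coupled candidate is the sum of 28 per-layer scores, it is
# roughly 28x larger than an ordinary per-filter score, so it almost never lands
# in the globally lowest-ranked set and is effectively never pruned.
import numpy as np

rng = np.random.default_rng(0)

independent = rng.random(200)                 # per-filter scores, independently pruned layers
coupled = rng.random((30, 28)).sum(axis=1)    # coupled candidates: sum over 28 layers

scores = np.concatenate([independent, coupled])
lowest = np.argsort(scores)[:100]             # pick the global bottom-100 to prune
print("coupled candidates selected:", int(np.sum(lowest >= 200)))  # typically 0
```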

printed logs:

Filters left: [(0, 16), (1, 12), (2, 16), (3, 16), (4, 16), (5, 14), (6, 16), (7, 4), (8, 16), (9, 2), (10, 16), (11, 14), (12, 16), (13, 7), (14, 16), (15, 2), (16, 16), (17, 2), (18, 16), (19, 30), (20, 32), (21, 4), (22, 32), (23, 4), (24, 32), (25, 4), (26, 32), (27, 27), (28, 32), (29, 4), (30, 32), (31, 28), (32, 32), (33, 29), (34, 32), (35, 4), (36, 32), (37, 61), (38, 64), (39, 13), (40, 64), (41, 7), (42, 64), (43, 49), (44, 64), (45, 61), (46, 64), (47, 7), (48, 64), (49, 7), (50, 64), (51, 52), (52, 64), (53, 7), (54, 64)]
Prunning filters..
Density: 44.941% (0.382M/0.849M) | FLOPs: 46.989% (58.965M/125.486M)
Fine tuning to recover from pruning iteration.
Generation 130, Step: 0.74, Min Loss: 2.103
Targeting resource usage: 58.98MFLOPs
Filters left: [(0, 16), (1, 16), (2, 16), (3, 16), (4, 16), (5, 14), (6, 16), (7, 3), (8, 16), (9, 2), (10, 16), (11, 14), (12, 16), (13, 7), (14, 16), (15, 2), (16, 16), (17, 2), (18, 16), (19, 27), (20, 32), (21, 4), (22, 32), (23, 4), (24, 32), (25, 4), (26, 32), (27, 25), (28, 32), (29, 4), (30, 32), (31, 24), (32, 32), (33, 29), (34, 32), (35, 24), (36, 32), (37, 60), (38, 64), (39, 7), (40, 64), (41, 7), (42, 64), (43, 28), (44, 64), (45, 61), (46, 64), (47, 7), (48, 64), (49, 7), (50, 64), (51, 44), (52, 64), (53, 7), (54, 64)]
Prunning filters..
Density: 40.988% (0.348M/0.849M) | FLOPs: 46.974% (58.946M/125.486M)
Fine tuning to recover from pruning iteration.
Generation 131, Step: 0.74, Min Loss: 2.103
Targeting resource usage: 58.98MFLOPs
Filters left: [(0, 16), (1, 16), (2, 16), (3, 16), (4, 16), (5, 14), (6, 16), (7, 3), (8, 16), (9, 2), (10, 16), (11, 14), (12, 16), (13, 7), (14, 16), (15, 2), (16, 16), (17, 2), (18, 16), (19, 27), (20, 32), (21, 4), (22, 32), (23, 4), (24, 32), (25, 4), (26, 32), (27, 25), (28, 32), (29, 4), (30, 32), (31, 24), (32, 32), (33, 29), (34, 32), (35, 30), (36, 32), (37, 61), (38, 64), (39, 7), (40, 64), (41, 7), (42, 64), (43, 13), (44, 64), (45, 61), (46, 64), (47, 7), (48, 64), (49, 7), (50, 64), (51, 46), (52, 64), (53, 7), (54, 64)]
Prunning filters..
Density: 39.733% (0.337M/0.849M) | FLOPs: 46.960% (58.928M/125.486M)
Fine tuning to recover from pruning iteration.
Generation 132, Step: 0.74, Min Loss: 2.103
Targeting resource usage: 58.98MFLOPs
Filters left: [(0, 16), (1, 16), (2, 16), (3, 16), (4, 16), (5, 14), (6, 16), (7, 3), (8, 16), (9, 2), (10, 16), (11, 14), (12, 16), (13, 7), (14, 16), (15, 2), (16, 16), (17, 2), (18, 16), (19, 27), (20, 32), (21, 4), (22, 32), (23, 4), (24, 32), (25, 4), (26, 32), (27, 25), (28, 32), (29, 4), (30, 32), (31, 24), (32, 32), (33, 29), (34, 32), (35, 24), (36, 32), (37, 60), (38, 64), (39, 7), (40, 64), (41, 7), (42, 64), (43, 28), (44, 64), (45, 61), (46, 64), (47, 7), (48, 64), (49, 7), (50, 64), (51, 44), (52, 64), (53, 7), (54, 64)]
Prunning filters..

RudyChin commented 4 years ago

Hi,

Thank you for your interest in our work. Yes, it is like that for ResNets: because of the residual connections, those layers have to be pruned jointly. You could also take the average over the 28 layers so that you see them change, but it will take more iterations for the EA to find good solutions.

pengfeiZhao1993 commented 4 years ago

I see what you mean. This is excellent work and it inspires me a lot; thank you. Still, this strategy can result in suboptimal pruning plans, and it also limits the overall channel pruning ratio of a CNN model (to less than 50%).

RudyChin commented 3 years ago

Hi,

I understand that taking the sum makes the algorithm explore less, since it is unlikely to prune the residual connections. What I was trying to say is that you can change the code to average their respective scores instead and prune based on that. That way, they are much more likely to get pruned; however, it might take more iterations for the algorithm to find a good solution.
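For illustration, here is a rough sketch of that change (the function and variable names are hypothetical, not the actual API in this repo):

```python
# Hypothetical sketch of the suggested modification, not the repository's code:
# aggregate the per-layer scores of a shortcut-coupled group with a mean instead
# of a sum, so the group's ranking value is comparable to single-layer filters.
import numpy as np

def group_ranking_value(per_layer_scores, mode="sum"):
    """per_layer_scores: scores of the same filter index across the coupled layers."""
    per_layer_scores = np.asarray(per_layer_scores)
    if mode == "sum":    # behaviour discussed above: coupled groups are rarely pruned
        return per_layer_scores.sum()
    if mode == "mean":   # suggested alternative: coupled groups get pruned more often
        return per_layer_scores.mean()
    raise ValueError(f"unknown mode: {mode}")
```

With "mean", coupled filters compete on an equal footing with ordinary filters, which is why the search explores more but may need more EA generations to converge.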

In my opinion, using the average across the residual connections is biased toward pruning them, because once we decide to prune such a group, it removes many more filters than other candidates do. That is why we use the sum.
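As a rough numeric illustration of that bias (made-up numbers, just the arithmetic):

```python
# Made-up numbers illustrating the bias: under a mean, a coupled group with the
# same ranking value as a single filter looks equally cheap to prune, yet
# choosing it removes 28 filters (one per coupled layer) in a single decision.
mean_score_single = 0.10   # one filter in one ordinary layer
mean_score_group = 0.10    # same mean score, shared by 28 coupled layers
filters_removed_single = 1
filters_removed_group = 28
print(filters_removed_group // filters_removed_single)  # 28x the reduction per pick
```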