Nankaiming opened 5 years ago
Hi, thanks for your suggestion. Do you have performance comparison results for re-creating the pruner (or not), so that we can confirm that re-creating the pruner is necessary?
I haven't tested it; I'm only inferring from the normal logic. In the __prune_rl code block, the pruner is re-created to ensure the previous pruning has no influence on the new one, so the final pruning should likewise be separated from the formal training.
Yes, the pruner is re-created in every trial to obtain a candidate structure. But at line 596 of this file, I suggest calling create_pruner again instead of only re-initializing the state: the last trial of the RL rollout must have changed the weights W, so in the fine-tuning process we should re-create a clean pruner.
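A minimal sketch of the suggested fix, using toy stand-ins rather than PocketFlow's actual classes (the names `ChannelPrunedLearner`, `create_pruner`, `prune_rl`, and `finetune` here are illustrative assumptions):

```python
class ChannelPrunedLearner:
    """Toy stand-in for the learner; create_pruner() rebuilds a fresh pruner."""

    def __init__(self):
        self.pruner = None
        self.create_pruner()

    def create_pruner(self):
        # a fresh pruner carries no masks or state from previous rollouts
        self.pruner = {'masks': {}, 'trial': 0}

    def prune_rl(self, nb_rollouts=3):
        # each RL trial already re-creates the pruner to get a candidate
        # structure, but the last trial still leaves its pruning behind
        for trial in range(nb_rollouts):
            self.create_pruner()
            self.pruner['trial'] = trial
            self.pruner['masks']['conv1'] = [trial]  # pretend: pruning alters W

    def finetune(self):
        # suggested fix: re-create a clean pruner instead of only
        # re-initializing its state, so the weights W touched by the
        # last rollout do not leak into fine-tuning
        self.create_pruner()
        return self.pruner
```

With this, fine-tuning always starts from an empty set of masks rather than from whatever the last rollout produced.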
Got it. We will test with the pruner re-created at the beginning of the fine-tuning process and propose a PR if it goes well.
I may have found another bug, or rather a performance issue. In the fine-tuning process, if we define 'cp_list_group' to be small, only a few layers in the network are pruned. But fine-tuning does not change the pruner's model graph, so I think fine-tuning is unnecessary for those few layers. And if we want to observe the fine-tuning accuracy of those layers, the model path should not be set to a single one. @jiaxiang-wu
Another question is about the RL agent. The noise protocol has 'tdecy' and 'adapt', and the noise type has 'param' and 'action'. Does that mean there are four combinations of noise? But in the agent's training code, the condition that checks 'adapt' to update actor_ns conflicts with init_rollout. @jiaxiang-wu
There are only three valid combinations for RL noise; see:
https://github.com/Tencent/PocketFlow/blob/master/rl_agents/ddpg/noise.py#L41 https://github.com/Tencent/PocketFlow/blob/master/rl_agents/ddpg/noise.py#L69
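A small sketch of a guard that rejects the unsupported combination. The exact set of valid pairs is my assumption from the discussion above (three of the four combinations are valid; I am assuming `('param', 'tdecy')` is the unsupported one — please check against the linked noise.py):

```python
# Assumed valid (noise_type, noise_prtl) pairs; the ('param', 'tdecy')
# pair is assumed to be the unsupported fourth combination.
VALID_NOISE_COMBOS = {
    ('action', 'tdecy'),
    ('action', 'adapt'),
    ('param', 'adapt'),
}

def check_noise_config(noise_type, noise_prtl):
    """Raise ValueError if the (type, protocol) combination is unsupported."""
    if (noise_type, noise_prtl) not in VALID_NOISE_COMBOS:
        raise ValueError(
            'unsupported noise combination: %s / %s' % (noise_type, noise_prtl))
    return True
```

Validating the configuration up front would make the conflict between the 'adapt' branch and init_rollout fail loudly instead of silently mis-updating actor_ns.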
OK, thanks. But what about the earlier question on performance?
Another bug is in the "export_pb_tflite_models" file. In the function "insert_alt_routines", when the data format is "NHWC", the method replaces the original convolution with two convolutions, but the first convolution changes the data. The correct op is to extract the used channels.
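A sketch of what "extract the used channels" could look like for an NHWC tensor, using NumPy in place of the actual TensorFlow graph rewrite (the function name and the `used_channels` index list are illustrative):

```python
import numpy as np

def extract_used_channels(feats, used_channels):
    """Select only the retained channels from an NHWC feature map.

    Unlike a 1x1 convolution with learned weights, this is a pure gather:
    the surviving channels' values pass through untouched instead of being
    mixed together by the first of the two replacement convolutions.
    """
    # feats: [N, H, W, C]; the channel axis is the last one for NHWC
    return np.take(feats, used_channels, axis=-1)
```

In the exported graph this would correspond to a gather/slice op on the channel axis rather than an extra convolution.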
After DDPG training finishes, i.e. after __prune_rl(), we should add another self.create_pruner() call. Without it, the new pruning seems to be applied on top of the compression from the last RL rollout, which should not be the correct behavior! I feel it is better to re-create the pruner with create_pruner(). Could anyone confirm whether this is the case?