IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Apache License 2.0

how to add middle layers' activation loss functions? #340

Closed. brisker closed this issue 4 years ago.

brisker commented 5 years ago

If I want to run DoReFa-Net quantization-aware training on AlexNet, is the command line like this?
python compress_classifier.py -a alexnet /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml
And we do not need to modify the code in compress_classifier.py to do this training?

levzlotnik commented 5 years ago

Hi @brisker ,

Sorry for the late response. You're correct, this yaml file is the configuration for the compress_classifier.py script, and the command line would be exactly as you've specified.

Let us know if you have more questions.

Cheers, Lev

brisker commented 5 years ago

@levzlotnik If we apply PostTrainQuant-mode quantization using /distiller-master/examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml, the code reads all the activation statistics such as avg-max, abs-min, etc. But I cannot find where they are actually used - is it in range_linear.py? Can you tell me where the data from examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml is used?

nzmora commented 5 years ago

Hi @brisker,

You can see example invocations using resnet50_quant_stats.yaml here. You can use resnet18_quant_stats.yaml in a similar fashion.
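
Roughly, the stats file is passed on the command line when running post-training quantization, along the lines of the invocation below. I'm writing the flags from memory, so please double-check them against compress_classifier.py --help:

python compress_classifier.py -a resnet18 --pretrained /path/to/imagenet --evaluate --quantize-eval --qe-stats-file examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml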

Cheers, Neta

brisker commented 5 years ago

@levzlotnik @nzmora If I want to quantize the input and output of the MaxPooling layer in DoReFa-Net, do I have to write a new nn.Module named DorefaMaxPooling?

guyjacob commented 5 years ago

Hi @brisker,

(Note that I edited your last comment to remove irrelevant people you tagged)

To go back to your original question, there could be some confusion I'd like to clear up. Notice that the yaml file is named "alexnet_bn". That is to say, the intention is to run it not on the original AlexNet, but on a modified AlexNet with batch-norm layers. We have it implemented here. The reason for this is that the implementation from the DoReFa authors used this model - see here. So in the command line, you should actually use -a alexnet_bn. (There's nothing preventing you from running it on the "vanilla" AlexNet, but the settings in the YAML file were meant to match those used in the reference DoReFa implementation.) In addition, the reference implementation used Adam instead of SGD, and indeed in my experiments Adam gave better results. Our sample uses SGD and it's not configurable, so one needs to edit the code in order to use Adam.
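
That is, reusing the paths from your earlier comment, the command would look something like:

python compress_classifier.py -a alexnet_bn /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml

(plus whatever optimizer or hyper-parameter adjustments are described in the YAML header).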

All of this wasn't detailed in the yaml files - that's my fault. I pushed updates to both the base FP32 yaml and the DoReFa yaml with details on how to run it and the results I got. Please check those out.

Regarding your question on MaxPool - in general the answer is yes: you should replace MaxPool with something that does quant --> maxpool --> quant. Then you define a function that creates this new module and returns it, and add that function to the "replacement factory", similar to what we do with ReLU in DorefaQuantizer.__init__():

https://github.com/NervanaSystems/distiller/blob/e65ec8fce890049fb421aa7d9d32cad5b075cd87/distiller/quantization/clipped_linear.py#L169-L177
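
Very roughly, the MaxPool replacement could look something like the sketch below. This is only an illustration modelled on the ReLU replacement linked above - the class name, the ClippedLinearQuantization arguments and the replacement-function signature should all be double-checked against the Distiller version you're using.

import torch.nn as nn
from distiller.quantization.clipped_linear import ClippedLinearQuantization

class DorefaMaxPool(nn.Module):
    # Wraps an existing MaxPool2d so that its input and output are quantized,
    # mirroring the DoReFa-style activation quantization used for ReLU above.
    def __init__(self, maxpool, num_bits):
        super(DorefaMaxPool, self).__init__()
        self.quant_in = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)
        self.pool = maxpool
        self.quant_out = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)

    def forward(self, x):
        return self.quant_out(self.pool(self.quant_in(x)))

def maxpool_replace_fn(module, name, qbits_map):
    # Same style of signature as the ReLU replacement function linked above:
    # skip the replacement if no activation bit-width was configured for this layer
    bits_acts = qbits_map[name].acts
    if bits_acts is None:
        return module
    return DorefaMaxPool(module, bits_acts)

# Then, inside DorefaQuantizer.__init__ (or a subclass of it):
#     self.replacement_factory[nn.MaxPool2d] = maxpool_replace_fn

With that registered, the quantizer will swap every nn.MaxPool2d instance automatically when it prepares the model.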

nzmora commented 4 years ago

Closing due to inactivity. Please reopen if needed.

brisker commented 4 years ago

@nzmora @levzlotnik @guyjacob If my model is like a->b->c->d->e, and I want to add loss functions on the outputs of three layers (layer b, layer c, layer d), where the loss can be something like MSELoss(output_b, label_b), etc., and I would like "b, c, d" to be configurable in the yaml file - how can I implement this?

levzlotnik commented 4 years ago

Hi @brisker ,

You could define "regularization" policies for these layers that calculate the MSELoss for each of them.
Of course these aren't actually regularizers, but since the implementation requires applying the additional loss on every minibatch, the API for regularization policies is just the thing you need.
To add one - create a new class that inherits from the distiller.regularization.regularizer._Regularizer base class and implement your details. Also add it to the imports in distiller.regularization.__init__ so it's visible to compress_classifier.py. After that, you'll be able to use it from your yaml files.
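
A very rough skeleton of such a class is below. The constructor arguments and the exact callback that the regularization policy invokes each minibatch may differ between versions, so treat the names here as placeholders and verify them against distiller/regularization/regularizer.py and the built-in regularizers.

import torch.nn.functional as F
from distiller.regularization.regularizer import _Regularizer

class ActivationMSERegularizer(_Regularizer):
    # Penalizes the outputs of the layers listed in the YAML (e.g. ['b', 'c', 'd']).
    def __init__(self, name, model, reg_regims, threshold_criteria=None):
        super(ActivationMSERegularizer, self).__init__(name, model, reg_regims,
                                                       threshold_criteria)
        self.captured = {}
        modules = dict(model.named_modules())
        # reg_regims is assumed here to carry the layer names from the YAML
        for layer_name in reg_regims:
            modules[layer_name].register_forward_hook(self._save_output(layer_name))

    def _save_output(self, layer_name):
        def hook(module, inputs, output):
            self.captured[layer_name] = output
        return hook

    def activation_loss(self, targets):
        # Hypothetical helper: call this once per minibatch (e.g. from the policy
        # wrapping this regularizer) and add the result to the task loss.
        return sum(F.mse_loss(self.captured[n], targets[n]) for n in self.captured)

Once it's also exported from distiller/regularization/__init__.py, your yaml can refer to it by class name.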

Cheers, Lev

brisker commented 4 years ago

@levzlotnik I still do not know how to get the middle layers' outputs before adding them to the loss function. I know in general that I can use hooks, but how do I add hooks to particular layers according to the yaml file?

levzlotnik commented 4 years ago

Hi @brisker, you can add a hook by looking the module up by name:

modules_dict = dict(model.named_modules())
your_layer = modules_dict[your_layer_name]
your_layer.register_forward_hook(your_hook)

Where your_hook may be sending an output to your custom "regularizer".
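
For example, your_hook could be as simple as this (illustrative names only):

captured_outputs = {}

def your_hook(module, inputs, output):
    # PyTorch forward hooks are called as hook(module, inputs, output);
    # stash the output so your custom "regularizer" (or training loop) can
    # read it when the total loss is assembled.
    captured_outputs[your_layer_name] = output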

brisker commented 4 years ago

@levzlotnik I know the code should look like this, but I do not know how to feed the outputs captured by the hook functions into the model's total loss function.
