Hi @brisker ,
Sorry for the late response.
You're correct, this yaml file is the configuration for the compress_classifier.py
script, and the command line would be exactly as you've specified.
Let us know if you have more questions.
Cheers, Lev
@levzlotnik
if we apply post-training quantization using /distiller-master/examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml,
the code reads all the activation statistics, such as avg-max, abs-min, etc. But I cannot find where they are used in range_linear.py.
Can you tell me where the data from examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml is actually used?
Hi @brisker,
You can see example invocations using resnet50_quant_stats.yaml
here. You can use resnet18_quant_stats.yaml
in a similar fashion.
Cheers, Neta
@levzlotnik @nzmora If I want to quantize the input and output of MaxPooling layer in dorefa-net, do I have to write a new nn.Module named DorefaMaxPooling?
Hi @brisker,
(Note that I edited your last comment to remove irrelevant people you tagged)
To go back to your original question, there could be some confusion I'd like to clear up. If you notice, the yaml file is named "alexnet_bn". That is to say, the intention is to run this not on the original AlexNet, but on a modified AlexNet with batch norm layers. We have it implemented here.
The reason for this is that the implementation from the DoReFa authors used this model - see here.
So in the command line, you should actually use -a alexnet_bn
. (There's nothing preventing you from running it on the "vanilla" AlexNet, but the settings in the YAML file were meant to match the settings used in the reference DoReFa implementation.)
In addition, the reference implementation used Adam instead of SGD, and indeed in my experiments I saw Adam gives better results. Our sample uses SGD and it's not configurable, so one needs to edit the code in order to use Adam.
All of this wasn't detailed in the yaml files - that's my fault. I pushed updates to both the base FP32 yaml and the DoReFa yaml with details on how to run it and the results I got. Please check those out.
Regarding your question on MaxPool: in general the answer is yes, you should replace MaxPool with something that does quant --> maxpool --> quant. Then you define a function that creates this new module and returns it, and add that function to the "replacement factory", similar to what we do with ReLU in DorefaQuantizer.__init__().
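A rough sketch of what such a wrapper and its factory function might look like. Note that the quantization function and the factory signature below are assumptions for illustration, not Distiller's actual code:

```python
import torch
import torch.nn as nn

def dorefa_quantize(x, num_bits):
    # k-bit uniform quantization of values in [0, 1], in the spirit of
    # the DoReFa activation quantizer (illustrative, not Distiller's code).
    scale = 2 ** num_bits - 1
    return torch.round(x * scale) / scale

class QuantMaxPool2d(nn.Module):
    """Hypothetical quant --> maxpool --> quant wrapper around MaxPool."""
    def __init__(self, pool, num_bits):
        super().__init__()
        self.pool = pool
        self.num_bits = num_bits

    def forward(self, x):
        x = dorefa_quantize(x.clamp(0, 1), self.num_bits)
        x = self.pool(x)
        return dorefa_quantize(x, self.num_bits)

def maxpool_replace_fn(module, name, qbits_map):
    # Factory function to register in the quantizer's replacement factory,
    # analogous to the ReLU replacement; the signature is an assumption.
    return QuantMaxPool2d(module, num_bits=qbits_map[name].acts)
```

You would then register maxpool_replace_fn for nn.MaxPool2d in your quantizer subclass, the same way the ReLU replacement is registered.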
Closing due to inactivity. Please reopen if needed.
@nzmora @levzlotnik @guyjacob If my model is like a->b->c->d->e, and I want to add loss functions on the outputs of three layers (layer b, layer c, layer d), where each loss is something like MSELoss(output_b, label_b), and I'd like "b, c, d" to be configurable in the yaml file. How can I implement this?
Hi @brisker ,
You could define "regularization" policies for these layers that will calculate the MSELoss
for each of the layers.
Of course these aren't actually regularizers, but since you need to apply the additional loss on every minibatch, the API for regularization policies is just the thing you need.
To add it, you create a new class that inherits from the distiller.regularization.regularizer._Regularizer
base class and implements your logic. Also add it to the distiller.regularization.__init__
imports so it's visible to compress_classifier.py
. After that, you'll be able to use it from your yaml files.
Cheers, Lev
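The steps above might look roughly like the sketch below. The stand-in base class, the constructor arguments, and the loss method are all assumptions for illustration (Distiller's real _Regularizer API may differ), kept self-contained so the sketch runs without Distiller installed:

```python
import torch
import torch.nn as nn

class _Regularizer:
    # Minimal stand-in for distiller.regularization.regularizer._Regularizer,
    # so this sketch runs without Distiller; the real base class and its
    # constructor arguments may differ.
    def __init__(self, name, model, reg_regims):
        self.name = name
        self.model = model
        self.reg_regims = reg_regims  # layer name -> loss weight (from the YAML)

class IntermediateMSERegularizer(_Regularizer):
    """Hypothetical 'regularizer' adding an MSE term per configured layer.

    Assumes forward hooks (or similar) have stored each layer's output and
    its target in self.captured before the loss is computed."""
    def __init__(self, name, model, reg_regims):
        super().__init__(name, model, reg_regims)
        self.captured = {}  # layer name -> (output, target)
        self.mse = nn.MSELoss()

    def loss(self, total_loss):
        # Add the weighted MSE of every configured layer to the running loss.
        for layer_name, strength in self.reg_regims.items():
            out, target = self.captured[layer_name]
            total_loss = total_loss + strength * self.mse(out, target)
        return total_loss
```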
@levzlotnik Still, I do not know how to get the middle layers' outputs before I add them to the loss functions. I know I can use hooks in general, but how do I add hooks to particular layers according to the yaml file?
Hi @brisker , You can add a hook by getting the module itself by name:
modules_dict = dict(model.named_modules())
your_layer = modules_dict[your_layer_name]
your_layer.register_forward_hook(your_hook)
where your_hook
might, for example, send the output to your custom "regularizer".
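Putting this together, a minimal self-contained sketch of how such hooks could capture intermediate outputs and feed them into the total loss each minibatch. The tiny model, layer names, and per-layer zero labels below are placeholders, not anything from Distiller:

```python
import torch
import torch.nn as nn

# Layer outputs captured by the hooks, keyed by layer name.
captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output
    return hook

# Placeholder model; in practice the layer names would come from the YAML.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
layer_names = ['0', '2']

modules_dict = dict(model.named_modules())
for name in layer_names:
    modules_dict[name].register_forward_hook(make_hook(name))

x = torch.randn(3, 4)
out = model(x)

# Fold the captured intermediate outputs into the total loss
# (zero tensors stand in for the real per-layer labels).
mse = nn.MSELoss()
labels = {name: torch.zeros_like(captured[name]) for name in layer_names}
extra = sum(mse(captured[n], labels[n]) for n in layer_names)
total_loss = mse(out, torch.zeros_like(out)) + extra  # backprop as usual
```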
@levzlotnik I know the code should look like this, but I do not know how to incorporate the values returned by the hook functions into the model's total loss function
If I want to train DoReFa quantization on AlexNet, is the command line like this?
python compress_classifier.py -a alexnet /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml
We do not need to modify the code in compress_classifier.py
to do this training?