... and what were the evaluation results after pruning? (If results drop drastically with even a little pruning, then all the weights are important; otherwise the learned weights are sparse.)
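For reference, one way to run that check is to sweep the pruning amount and evaluate each time. This is only a minimal sketch: `model` and `evaluate` are hypothetical stand-ins for a loaded TailorNet network and whatever error metric you already report, not the repo's actual API.

```python
import copy

import torch.nn as nn
import torch.nn.utils.prune as prune

# `model` and `evaluate` are placeholders: load one of the TailorNet networks
# and reuse the per-vertex error metric you already compute.
for amount in [0.05, 0.1, 0.2, 0.4]:
    pruned = copy.deepcopy(model)  # keep the original weights untouched
    for module in pruned.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    print(f"pruned {amount:.0%} of each layer -> error {evaluate(pruned):.4f}")
```

If the error stays roughly flat up to large amounts, the learned weights are sparse; if it blows up at 5%, they are not.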
It looks like dynamic_quantization does not affect the accuracy (in any way that I can see) - 8.153631990087488 error for pants, male. Inference time gets a bit shorter - 1.1 to 0.8.
However, loading time more than doubles on my machine - 11s to 24s.
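For comparison, these numbers can be reproduced with a simple timing sketch like the one below. The checkpoint path and the input are placeholders (it also assumes the whole module object was saved with `torch.save`; if only a state dict was saved, rebuild the model and call `load_state_dict` instead).

```python
import time

import torch

# Hypothetical path; substitute the actual original or quantized checkpoint.
t0 = time.perf_counter()
model = torch.load("tailornet_quantized.pt", map_location="cpu")
model.eval()
print(f"load time: {time.perf_counter() - t0:.2f}s")

# Placeholder input; pass the real pose/shape/style tensors the network expects,
# and use the same batch for both the float32 and the quantized model.
inputs = torch.randn(1, 72)
with torch.no_grad():
    t0 = time.perf_counter()
    _ = model(inputs)
print(f"inference time: {time.perf_counter() - t0:.2f}s")
```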
How can loading time go up when the model is smaller now? And did you try stronger quantization?
DISCLAIMER: I am deploying TailorNet for an academic dissertation.
1) I have applied simple dynamic quantization, which reduced the model sizes to 1/4. How do I evaluate the accuracy against the original model weights? (See the first sketch after this list.)
2) Applying any sort of PyTorch pruning, even with a very small amount, results in all weights being removed. Any idea why this happens? (See the second sketch after this list.)
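For 1), a minimal sketch of comparing the quantized model against the original on the same inputs (assuming the TailorNet sub-networks are ordinary `nn.Linear` stacks; `model` and `inputs` are placeholders for a loaded network and a real batch, not the repo's actual API):

```python
import torch
import torch.nn as nn

# `model` and `inputs` are placeholders for a loaded TailorNet network and a real batch.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    ref = model(inputs)        # float32 reference output
    out = quantized(inputs)    # output with int8 weights
# Mean absolute difference between the two outputs; you can also feed both
# outputs through the same error metric you already report.
print("mean abs diff:", (ref - out).abs().mean().item())
```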
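For 2), it may help to check what the pruning call actually zeroes out. A small sanity check (again with `model` as a placeholder; module types are generic, not TailorNet-specific):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# `model` is a placeholder for the network being pruned.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.05)
        # prune reparameterizes the layer: module.weight = weight_orig * weight_mask
        zero_frac = (module.weight == 0).float().mean().item()
        print(f"{name}: {zero_frac:.1%} of weights zeroed")
```

If the printed fraction is close to the `amount` you pass, the pruning call itself is behaving correctly. One common pitfall is that after pruning the state dict stores `weight_orig` and `weight_mask` instead of `weight`, so loading it into a fresh model without the same pruning hooks can look as if the weights were removed; calling `prune.remove(module, "weight")` bakes the mask in before saving.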