Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

Wrong baseline models to measure speedup against #123

Open dhingratul opened 5 years ago

dhingratul commented 5 years ago

Describe the bug
I ran two distinct experiments, one on uniform quantization and one on channel pruning, with the same ResNet model; however, the two optimizations produced different model_original baselines against which the speedup is measured. The one from uniform quantization runs at ~25 ms and the one from channel pruning at ~20 ms. How are you measuring the baseline?


jiaxiang-wu commented 5 years ago

@dhingratul We need some details to reproduce your issue.

dhingratul commented 5 years ago
  1. ResNet
  2. resnet_at_ilsvrc12_run.py
  3. .pb
  4. GPU
  5. Whatever is defined here: run_local.sh nets/resnet_at_ilsvrc12_run.py
jiaxiang-wu commented 5 years ago

Got it. We are reproducing your issue.

jiaxiang-wu commented 5 years ago

@dhingratul Sorry, we cannot reproduce your issue. According to our results (see the benchmark code for inference speed in PR #136), the model_original.pb generated by export_chn_pruned_tflite_model.py takes 3.23 ms and the one generated by export_quant_tflite_model.py takes 3.34 ms, which are basically the same.
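For readers who do not want to dig through the PR, below is a minimal sketch of what such a frozen-graph latency benchmark could look like (TensorFlow 1.x API; the tensor names, input shape, and run count are assumptions for illustration, not the actual benchmark code from PR #136):

```python
import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# NOTE: hypothetical tensor names and input shape; the real ones depend on the
# graph exported by export_chn_pruned_tflite_model.py / export_quant_tflite_model.py.
INPUT_NAME = 'net_input:0'
OUTPUT_NAME = 'net_output:0'
INPUT_SHAPE = (224, 224, 3)  # assumed ILSVRC-12 ResNet input resolution

def benchmark_pb(pb_path, batch_size=1, nb_runs=100):
  """Report the average per-run latency of a frozen *.pb graph."""
  graph_def = tf.GraphDef()
  with tf.gfile.GFile(pb_path, 'rb') as f:
    graph_def.ParseFromString(f.read())
  with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    net_input = graph.get_tensor_by_name(INPUT_NAME)
    net_output = graph.get_tensor_by_name(OUTPUT_NAME)
    inputs = np.zeros((batch_size,) + INPUT_SHAPE, dtype=np.float32)  # all-zero inputs
    with tf.Session(graph=graph) as sess:
      timings = []
      for __ in range(nb_runs):
        t_beg = time.time()
        sess.run(net_output, feed_dict={net_input: inputs})
        timings.append(time.time() - t_beg)
      print('%s: %.2f ms / run' % (pb_path, 1000.0 * np.mean(timings)))

benchmark_pb('model_original.pb')
```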

Some notes:

dhingratul commented 5 years ago
  1. Yes, the only difference I see is that you are generating data with all zeros while I generate it with np.random.rand. I use a fixed batch size and average over 1,000 runs (you use 100), leaving out the first inference because its time is always inflated by GPU warm-up; a sketch of this protocol follows this list. The difference in absolute inference times could be due to the different GPU architectures; I am more interested in the percent speedup than in the actual numbers.
  2. The 20 ms inferences were on an older-generation GPU. On a 1080 Ti, I see model_dcp_eval/model_original.pb run at ~5 ms and model_uqtf_eval/model_original.pb run at ~8 ms.
  3. The bug is reproducible with your benchmark code as well.
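For concreteness, here is a sketch of the timing loop under that protocol (random inputs, first run discarded); it reuses the hypothetical net_input/net_output tensors and the assumed 224x224x3 input shape from the sketch above:

```python
import time

import numpy as np

def time_runs(sess, net_input, net_output, nb_runs=1000, batch_size=1):
  """Average latency in ms over nb_runs with random inputs, skipping the first
  (warm-up) run, whose time is inflated by GPU initialization."""
  inputs = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)  # assumed input shape
  timings = []
  for idx in range(nb_runs + 1):
    t_beg = time.time()
    sess.run(net_output, feed_dict={net_input: inputs})
    if idx > 0:  # drop the first (warm-up) run
      timings.append(time.time() - t_beg)
  return 1000.0 * np.mean(timings)
```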
jiaxiang-wu commented 5 years ago

@dhingratul After changing np.zeros to np.random.rand, I still cannot reproduce your issue. Here are my results:

model_original.pb (chn-pruned): 3.48 ms, 3.41 ms, 3.27 ms, 3.26 ms
model_original.pb (quant): 3.47 ms, 3.41 ms, 3.38 ms, 3.24 ms

P.S.: I am using a P40 GPU.

Can you post your *.pb model files so that I can test them?

dhingratul commented 5 years ago

@jiaxiang-wu DCP models https://drive.google.com/open?id=1NijcwZ-Cwd-Nqa73E2D5nTL_X2yhB32a UQTF Model https://drive.google.com/open?id=1LIYaJZclwBllEThoWZScj23Sq4_LkUxx

jiaxiang-wu commented 5 years ago

Thanks a lot. We are looking into this issue.

Tingelam commented 5 years ago

Got the same results as @dhingratul, details: