Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

Wrong baseline models to measure speedup against #123

Open dhingratul opened 5 years ago

dhingratul commented 5 years ago

Describe the bug
I ran two distinct experiments, one on uniform quantization and one on channel pruning, with the same ResNet model; however, the two optimizations produced different model_original baselines against which the speedup is measured. The one from uniform quantization runs at ~25 ms and the one from channel pruning at ~20 ms. How are you measuring the baseline?


jiaxiang-wu commented 5 years ago

@dhingratul We need some details to reproduce your issue.

dhingratul commented 5 years ago
  1. ResNet
  2. resnet_at_ilsvrc12_run.py
  3. .pb
  4. GPU
  5. Whatever is defined here: run_local.sh nets/resnet_at_ilsvrc12_run.py
jiaxiang-wu commented 5 years ago

Got it. We are reproducing your issue.

jiaxiang-wu commented 5 years ago

@dhingratul Sorry, we cannot reproduce your issue. According to our results (see the benchmark code for inference speed in PR #136), the model_original.pb generated by export_chn_pruned_tflite_model.py takes 3.23 ms and the one generated by export_quant_tflite_model.py takes 3.34 ms, which are basically the same.
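For readers who do not want to dig through the PR, below is a minimal sketch of what such a frozen-graph latency benchmark could look like (TensorFlow 1.x API; the tensor names, input shape, and run count are assumptions for illustration, not the actual benchmark code from PR #136):

```python
import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# NOTE: hypothetical tensor names and input shape; the real ones depend on the
# graph exported by export_chn_pruned_tflite_model.py / export_quant_tflite_model.py.
INPUT_NAME = 'net_input:0'
OUTPUT_NAME = 'net_output:0'
INPUT_SHAPE = (224, 224, 3)  # assumed ILSVRC-12 ResNet input resolution

def benchmark_pb(pb_path, batch_size=1, nb_runs=100):
  """Report the average per-run latency of a frozen *.pb graph."""
  graph_def = tf.GraphDef()
  with tf.gfile.GFile(pb_path, 'rb') as f:
    graph_def.ParseFromString(f.read())
  with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    net_input = graph.get_tensor_by_name(INPUT_NAME)
    net_output = graph.get_tensor_by_name(OUTPUT_NAME)
    inputs = np.zeros((batch_size,) + INPUT_SHAPE, dtype=np.float32)  # all-zero inputs
    with tf.Session(graph=graph) as sess:
      timings = []
      for __ in range(nb_runs):
        t_beg = time.time()
        sess.run(net_output, feed_dict={net_input: inputs})
        timings.append(time.time() - t_beg)
      print('%s: %.2f ms / run' % (pb_path, 1000.0 * np.mean(timings)))

benchmark_pb('model_original.pb')
```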

Some notes:

dhingratul commented 5 years ago
  1. Yes, the only difference I see is that you are generating data with all zeros while I generate it with np.random.rand. I use a fixed batch size and average over 1,000 runs (you use 100), leaving out the first inference because its time is always inflated by GPU warm-up; a sketch of this protocol follows this list. The difference in absolute inference times could be due to the different GPU architectures; I am more interested in the percent speedup than in the actual numbers.
  2. The 20 ms inferences were on an older-generation GPU. On a 1080 Ti, I see model_dcp_eval/model_original.pb run at ~5 ms and model_uqtf_eval/model_original.pb run at ~8 ms.
  3. The bug is reproducible with your benchmark code as well.
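For concreteness, here is a sketch of the timing loop under that protocol (random inputs, first run discarded); it reuses the hypothetical net_input/net_output tensors and the assumed 224x224x3 input shape from the sketch above:

```python
import time

import numpy as np

def time_runs(sess, net_input, net_output, nb_runs=1000, batch_size=1):
  """Average latency in ms over nb_runs with random inputs, skipping the first
  (warm-up) run, whose time is inflated by GPU initialization."""
  inputs = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)  # assumed input shape
  timings = []
  for idx in range(nb_runs + 1):
    t_beg = time.time()
    sess.run(net_output, feed_dict={net_input: inputs})
    if idx > 0:  # drop the first (warm-up) run
      timings.append(time.time() - t_beg)
  return 1000.0 * np.mean(timings)
```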
jiaxiang-wu commented 5 years ago

@dhingratul After changing np.zeros to np.random.rand, I still cannot reproduce your issue. Here are my results:

model_original.pb (chn-pruned): 3.48 ms, 3.41 ms, 3.27 ms, 3.26 ms
model_original.pb (quant): 3.47 ms, 3.41 ms, 3.38 ms, 3.24 ms

P.S.: I am using a P40 GPU.

Can you post your *.pb model files so that I can test them?

dhingratul commented 5 years ago

@jiaxiang-wu DCP models https://drive.google.com/open?id=1NijcwZ-Cwd-Nqa73E2D5nTL_X2yhB32a UQTF Model https://drive.google.com/open?id=1LIYaJZclwBllEThoWZScj23Sq4_LkUxx

jiaxiang-wu commented 5 years ago

Thanks a lot. We are looking into this issue.

Tingelam commented 5 years ago

Got the same results as @dhingratul, details: