Closed wojtuss closed 5 years ago
Please fix the conflicts. Maybe you can also create an independent PR for cpu_quantize_placement_pass and its unit test to speed up the review and merge process.
@luotao1, the cpu_quantize_placement_pass is now submitted in a separate PR: https://github.com/PaddlePaddle/Paddle/pull/16265
I am working on updating this PR.
@luotao1, Quantization of our ResNet50 models works fine, but quantization of the ResNet50 model downloaded for tests fails on pattern matching for some reason. We are investigating the problem.
@luotao1,
Is there anything special about the ResNet50 model downloaded by PaddlePaddle for test_analyzer_resnet50? Does it differ somehow from other ResNet50 models saved for inference (like the ones saved using Python scripts from the models repo)?
Our quantization fails on it during pattern matching for convolution with residual data, and it is difficult to find the reason for that failure.
There seems to be nothing special about the ResNet50 model downloaded by test_analyzer_resnet50. And I see that PR_CI succeeded at http://ci.paddlepaddle.org/viewType.html?buildTypeId=Paddle_PrCi&branch_Paddle=pull%2F15987&tab=buildTypeStatusDiv.
Our quantization fails on it during pattern matching for convolution with residual data
Could you paste the CI log for it? Is this a random error?
@luotao1, we found and fixed the issue. This PR contains the fix.
The test_analyzer_resnet50 test is failing now due to the accuracy threshold. What threshold should be used for a quantized model?
We found that the model does not have the ResidualData input, unlike our other model, which might be a little more up to date. We made a fix for this and it works now. Now we have a diff close to 24 (with a threshold of 0.001) in test_analyzer_resnet50. Do you know if this diff is acceptable for a quantized net? Do you have a suggestion for how we should change the threshold for a single test? Currently it is read from a gflags flag.
In PaddlePaddle CI builds, the test_analyzer_resnet50 test fails in the Analyzer_resnet50.quantization testcase on an assertion we check during quantization: the output of a conv2d op with relu must be nonnegative. The assertion fails when running the whole test_analyzer_resnet50 test. However, when running the Analyzer_resnet50.quantization testcase alone, the assertion is satisfied and only the accuracy check (24.3812 vs. 0.001) fails the test.
INT8 quantization inherently loses some accuracy in the outputs, so it does not make sense to use CompareNativeAndAnalysis to compare individual output values. It is normal to get one or two "wrong" outputs after INT8 quantization, since we can accept a 1% top1/top5 accuracy drop. So what we need to do is keep the top1/top2 accuracy diff relative to fp32 within 1%. I think we can add a function in tester_helper.h named (for example) CompareTopAccuracy, similar to what you did in the Stats::Gather function of your own app. What do you think about it, @luotao1?
I agree with you @bingyanghuang.
Besides, will the random dataset in test_analyzer_resnet50 affect the top1/top2 accuracy? How about using the dataset and models in
https://github.com/PaddlePaddle/Paddle/blob/b9fc80a13307461991bc2d091f70182b30f21128/python/paddle/fluid/contrib/tests/test_calibration.py#L126-L143
Maybe you can create a new unit test like test_analyzer_int8_resnet50? It would run a small dataset in CI by default, but could support running the full dataset for QA verification.
To speed up the review and merge process, since I see there is a QuantizerTest in analysis_predictor_tester.cc, how about merging part of this PR first and adding test_analyzer_int8_resnet50 in another PR?
@luotao1,
so we will remove the quantization testcase from test_analyzer_resnet50 in this PR, and create a new test, test_analyzer_int8_resnet50, which will compare top1/top5 accuracy on a proper dataset, and submit it as a separate PR, OK?
@wojtuss I think it's OK.
Even with the test files removed, this PR is too large for chunwei (responsible for the C-API) to review. Could we split this PR into several parts? Maybe we can discuss the details in the upcoming meeting.
@bingyanghuang , there is only one file with tests now and the tests should be run in CI in my opinion
@wojtuss Yes, I agree with you. I think cpu_quantize_pass.cc, graph_pattern_detector.cc and .h, mkldnn_placement_pass.h, paddle_inference_api.h, and paddle_pass_builder.cc can go into a separate PR. Besides, argument.h and ir_pass_manager.cc can be merged independently.
@bingyanghuang , you are right.
@luotao1
The first part, the fix for cpu_quantize_pass, is sent as a separate PR: https://github.com/PaddlePaddle/Paddle/pull/16322
The next will come shortly.
@luotao1 , the next PR is sectioned off: https://github.com/PaddlePaddle/Paddle/pull/16326
@luotao1 , this PR is also updated now.
@luotao1,
The error in PR_CI (Paddle) when building vis_demo occurs after renaming the header file paddle_quantizer_config.h to quantizer_config.h. With the name paddle_quantizer_config.h it builds fine. Is there a mechanism that differentiates header files beginning with paddle_? The comment at the beginning of paddle_analysis_config.h suggests there is some handling of includes.
https://github.com/PaddlePaddle/Paddle/blob/da39a704166e6ed512fc2cf0fa757516907ea847/cmake/inference_lib.cmake#L208-L218
When we package the inference library, we only copy paddle_*.h files into it.
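The effect of such a filename filter can be illustrated with a small sketch. This is not the CMake logic from cmake/inference_lib.cmake; it is a hypothetical Python stand-in that applies the same paddle_*.h glob to a list of header names, to show why the renamed header would be missing from the packaged library.

```python
from fnmatch import fnmatch


def packaged_headers(headers, pattern="paddle_*.h"):
    """Keep only the header names that match the packaging glob."""
    return [name for name in headers if fnmatch(name, pattern)]


# Header names taken from the discussion above.
headers = [
    "paddle_analysis_config.h",
    "paddle_quantizer_config.h",
    "quantizer_config.h",
]

print(packaged_headers(headers))
```

Under this assumption quantizer_config.h is filtered out, so any demo built against the packaged library cannot include it.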
Thank you! So I have to rename it back to paddle_quantizer_config.h.
Hi @luotao1, I have started developing the analyzer_int8_resnet50 test and I need a ResNet model that has the top1 accuracy layer and just two files: model and params. Do you have a model to use for this?
[edit] Should I use this one? https://github.com/PaddlePaddle/Paddle/blob/b9fc80a13307461991bc2d091f70182b30f21128/python/paddle/fluid/contrib/tests/test_calibration.py#L192
@sfraczek You can use this one.
@luotao1 we will need a data reader for small ImageNet in capi now. What should we do?
@sfraczek You can use a Python data reader to preprocess the images into data.txt, like the other test_analyzer_xxx tests. Then give me the data.txt and I will upload it to our CDN.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
model.py
"""
import argparse
import numpy as np
import time
import logging

from paddle import fluid
from continuous_evaluation import cpu_infer_time_kpi
from continuous_evaluation import gpu_infer_time_kpi
import ce_utils

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

SEED = 90
fluid.default_startup_program().random_seed = SEED


def main():
    """
    main
    """
    args = parse_args()
    #prepare()
    t = []
    infer = run_inference(args)
    t.append(infer)
    if args.device == "CPU":
        infer_time_kpi = cpu_infer_time_kpi
    else:
        infer_time_kpi = gpu_infer_time_kpi
    infer_time_kpi.add_record(np.array(t, dtype=np.float32))
    infer_time_kpi.persist()


def parse_args():
    """
    parse_args
    """
    parser = argparse.ArgumentParser("fast rcnn model ce demo")
    parser.add_argument("--batch-size", type=int, help="batch size")
    parser.add_argument("--warmup", type=int, default=10, help="warmup")
    parser.add_argument(
        "--device", type=str, default="GPU", choices=["CPU", "GPU"])
    return parser.parse_args()


def prepare():
    """
    prepare
    """
    model_filename = 'faster_rcnn2.tar'
    data_filename = 'faster_rcnn2_data.txt'
    dirname = 'upload/'
    ce_utils.maybe_download(dirname, model_filename)
    ce_utils.maybe_download(dirname, data_filename)


def run_inference(args):
    """
    run_inference
    """
    place = fluid.CPUPlace() if args.device == "CPU" else fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    params_dirname = 'faster_rcnn2/faster_rcnn2/'
    with fluid.scope_guard(fluid.core.Scope()):
        [program, feed, fetch] = fluid.io.load_inference_model(
            params_dirname,
            exe,
            model_filename='model',
            params_filename='params')
        data = np.loadtxt("faster_rcnn2/faster_rcnn2_data.txt", dtype=np.float32)
        shape = (1, 3, 600, 800)
        image = fluid.core.PaddleTensor()
        image.data = fluid.core.PaddleBuf(data.tolist())
        image.shape = shape
        image.dtype = fluid.core.PaddleDType.FLOAT32
        info = np.array([600., 800., .125], dtype=np.float32)
        image_info = fluid.core.PaddleTensor()
        image_info.data = fluid.core.PaddleBuf(info.tolist())
        image_info.shape = info.shape
        image_info.dtype = fluid.core.PaddleDType.FLOAT32
        prog_file = "{}/model".format(params_dirname)
        params_file = "{}/params".format(params_dirname)
        config = fluid.core.AnalysisConfig(prog_file, params_file)
        if args.device == "GPU":
            config.enable_use_gpu(200)
        else:
            config.disable_gpu()
        config.switch_ir_debug()
        program = fluid.compiler.CompiledProgram(program)
        program.with_inference_optimize(config)
        for i in range(args.warmup):
            exe.run(program, feed=[image, image_info])
        t1 = time.time()
        outputs = exe.run(program, feed=[image, image_info])
        t2 = time.time()
        logger.info("outputs result length[{}]".format(len(outputs)))
        return t2 - t1


if __name__ == "__main__":
    main()
@sfraczek You can use the Python inference API (@fc500110) to create your test, which makes it easy to write the data reader.
What do you mean? Should I make a python test instead and make it based on the code you shared? If so, where should I put it?
@sfraczek
Should I make a python test instead
You can create either a C++ test or a Python test, whichever is more convenient for you.
make it based on the code you shared
The code I shared uses the Python inference API to call the C++ passes.
If so, where should I put it?
How about in python/paddle/fluid/contrib/tests/?
OK. We will discuss it and try one of those :). Thank you.
Hi, @luotao1
data = np.loadtxt("faster_rcnn2/faster_rcnn2_data.txt", dtype=np.float32)
shape = (1, 3, 600, 800)
Do you mean I should do the following:
- In Python, use np.savetxt to save the data of all val images in one file, without any "[" or "]". The data file will be pure data without array information, just like ./build/third_party/inference_demo/rnn2/data.txt
- Declare shape = (100, 3, 224, 224) directly in the C++ test, because val has 100 images and, after cropping, H and W are 224*224.
- Build the tensor; there will be only one batch (batch_size=100) in this case.
But what is this for?
info = np.array([600., 800., .125], dtype=np.float32)
In python I should use np.savetxt and save all data of val images in one file
You can use a data reader as test_calibration.py does, so there is no need to save to a txt file, since the Python inference API makes it easy to use a data reader. If you still want to use savetxt, you can choose the C++ API.
there will be only one batch
There are 100 batches, each with batch_size=1. You could use a for loop to do this.
info = np.array([600., 800., .125], dtype=np.float32)
feed=[image, image_info]: there are two feeds in the example.
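The savetxt route discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual preprocessing used for the test: the helper names save_images_as_txt and iter_batches are hypothetical, and the demo uses a tiny random array instead of the 100 ImageNet val images of shape (3, 224, 224).

```python
import os
import tempfile

import numpy as np


def save_images_as_txt(images, path):
    """Write one flattened image per row as plain numbers, no brackets."""
    flat = images.reshape(images.shape[0], -1)
    np.savetxt(path, flat)


def iter_batches(path, image_shape, batch_size=1):
    """Load the text file and yield arrays of shape (batch_size, C, H, W)."""
    data = np.loadtxt(path, dtype=np.float32, ndmin=2)
    images = data.reshape((-1,) + tuple(image_shape))
    for start in range(0, images.shape[0], batch_size):
        yield images[start:start + batch_size]


# Tiny stand-in dataset: 5 "images" of shape (3, 4, 4), hypothetical sizes.
demo_images = np.random.rand(5, 3, 4, 4).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.txt")
    save_images_as_txt(demo_images, data_path)
    # One batch per image, as suggested above (batch_size=1, looped).
    for index, batch in enumerate(iter_batches(data_path, (3, 4, 4))):
        print(index, batch.shape)
```

With the real data the same loop would produce 100 batches, each holding a single image, rather than one batch of 100.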
Hi @luotao1, this is the data.txt: resnet50_int8v2_data.txt.tar.gz. Please upload it to "http://paddle-inference-dist.cdn.bcebos.com/int8"
The core from https://github.com/PaddlePaddle/Paddle/pull/16396 was merged
This is the core of the second version of INT8 quantization, C-API based only. Passes with quantization and optimizations are to be submitted in separate PRs shortly.
The code is an updated version of https://github.com/PaddlePaddle/Paddle/pull/15834. It also contains an updated version of https://github.com/PaddlePaddle/Paddle/pull/15472.
test=develop