PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle ("Flying Paddle"): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

C-API quantization core #15987

Closed wojtuss closed 5 years ago

wojtuss commented 5 years ago

This is the core of the second version of INT8 quantization, C-API based only. The quantization and optimization passes will be submitted in separate PRs shortly.

The code is an updated version of https://github.com/PaddlePaddle/Paddle/pull/15834. It also contains an updated version of https://github.com/PaddlePaddle/Paddle/pull/15472.

test=develop

luotao1 commented 5 years ago

Please fix the conflicts. Also, maybe you can create an independent PR for cpu_quantize_placement_pass and its unit test to speed up the review and merge process.

wojtuss commented 5 years ago

@luotao1 , the cpu_quantize_placement_pass is now submitted in a separate PR: https://github.com/PaddlePaddle/Paddle/pull/16265

I am working on updating this PR.

wojtuss commented 5 years ago

@luotao1 , quantization of our ResNet50 models works fine, but quantization of the ResNet50 model downloaded for tests fails on pattern matching for some reason. We are investigating the problem.

luotao1 commented 5 years ago

#16265 is merged; please update this PR, since it contains some other commits.

wojtuss commented 5 years ago

@luotao1 , is there anything special about the ResNet50 model downloaded by PaddlePaddle for test_analyzer_resnet50? Does it differ somehow from other ResNet50 models saved for inference (like the ones saved using Python scripts from the models repo)? Our quantization fails on it during pattern matching for convolution with residual data, and it is difficult to find the reason for that failure.

luotao1 commented 5 years ago

There seems to be nothing special about the ResNet50 model downloaded by test_analyzer_resnet50. And I see PR_CI passes in http://ci.paddlepaddle.org/viewType.html?buildTypeId=Paddle_PrCi&branch_Paddle=pull%2F15987&tab=buildTypeStatusDiv.

> Our quantization fails on it during pattern matching for convolution with residual data

Could you paste the CI log for it? Is this a random error?

wojtuss commented 5 years ago

@luotao1 , We found and fixed the issue. This PR contains the fix.

The test_analyzer_resnet50 test is now failing due to the accuracy threshold. What threshold should be used for a quantized model?

sfraczek commented 5 years ago

We found that the model does not have ResidualData inputs, as opposed to our other model, which might be a little more up to date. We made a fix for this and it works now. Now we get a diff close to 24 (when the threshold is 0.001) in test_analyzer_resnet50. Do you know if that diff is acceptable for a quantized net? Do you have a suggestion for how we should change the threshold for a single test? Currently it is read from a gflags flag.

wojtuss commented 5 years ago

In the PaddlePaddle CI builds, the test_analyzer_resnet50 test fails in the Analyzer_resnet50.quantization test case on an assertion we check during quantization: the output of a conv2d op with relu must be non-negative. The assertion fails when running the whole test_analyzer_resnet50 test. However, when running the Analyzer_resnet50.quantization test case alone, the assertion is satisfied and only the accuracy check (24.3812 vs. 0.001) fails the test.

bingyanghuang commented 5 years ago

> We found that the model does not have ResidualData inputs, as opposed to our other model, which might be a little more up to date. We made a fix for this and it works now. Now we get a diff close to 24 (when the threshold is 0.001) in test_analyzer_resnet50. Do you know if that diff is acceptable for a quantized net? Do you have a suggestion for how we should change the threshold for a single test? Currently it is read from a gflags flag.

INT8 quantization inherently loses some output accuracy, so it does not make sense to use CompareNativeAndAnalysis to compare single output values. It is normal to get one or two "wrong" outputs after INT8 quantization, since we can accept a 1% top-1/top-5 accuracy drop. So what we need to do is keep the top-1/top-5 accuracy diff with respect to FP32 within 1%. I think we can add a function in tester_helper.h named (for example) CompareTopAccuracy, similar to what you did in the Stats::Gather function in your own app. What do you think about it, @luotao1 ?
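
For illustration, a minimal Python sketch of such a check (the actual helper would be a C++ function in tester_helper.h; the function and argument names here are hypothetical):

```python
import numpy as np

# Hypothetical sketch of the proposed CompareTopAccuracy idea: compare the
# top-1 accuracy of the INT8 and FP32 outputs and allow at most a 1% drop.
def compare_top_accuracy(fp32_logits, int8_logits, labels, max_drop=0.01):
    fp32_top1 = np.mean(np.argmax(fp32_logits, axis=1) == labels)
    int8_top1 = np.mean(np.argmax(int8_logits, axis=1) == labels)
    assert fp32_top1 - int8_top1 <= max_drop, (
        "top-1 accuracy drop %.4f exceeds %.4f" % (fp32_top1 - int8_top1, max_drop))
```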

luotao1 commented 5 years ago

I agree with you, @bingyanghuang. Besides, will the random dataset in test_analyzer_resnet50 affect the top-1/top-5 accuracy? How about using the dataset and models in https://github.com/PaddlePaddle/Paddle/blob/b9fc80a13307461991bc2d091f70182b30f21128/python/paddle/fluid/contrib/tests/test_calibration.py#L126-L143 ? Maybe you can create a new unit test like test_analyzer_int8_resnet50. It would run a small dataset in CI by default, but could support running the full dataset for QA verification.

wojtuss commented 5 years ago

> To speed up the review and merge process, since I see there is a QuantizerTest in analysis_predictor_tester.cc, how about merging some of this PR first, and test_analyzer_int8_resnet50 in another PR?

@luotao1 , so we will remove the quantization test case from test_analyzer_resnet50 in this PR and create a new test, test_analyzer_int8_resnet50, which will compare top-1/top-5 accuracy on a proper dataset and be submitted as a separate PR, OK?

luotao1 commented 5 years ago

@wojtuss I think it's OK.

bingyanghuang commented 5 years ago

> To speed up the review and merge process, since I see there is a QuantizerTest in analysis_predictor_tester.cc, how about merging some of this PR first, and test_analyzer_int8_resnet50 in another PR?
>
> @luotao1 , so we will remove the quantization test case from test_analyzer_resnet50 in this PR and create a new test, test_analyzer_int8_resnet50, which will compare top-1/top-5 accuracy on a proper dataset and be submitted as a separate PR, OK?

Even with the test files removed, this PR is too large for chunwei (responsible for the C-API) to review. Could we split this PR into several parts? Maybe we can discuss the details at the coming meeting.

wojtuss commented 5 years ago

> Even with the test files removed, this PR is too large for chunwei (responsible for the C-API) to review. Could we split this PR into several parts? Maybe we can discuss the details at the coming meeting.

@bingyanghuang , there is only one file with tests now, and in my opinion the tests should be run in CI.

bingyanghuang commented 5 years ago

> Even with the test files removed, this PR is too large for chunwei (responsible for the C-API) to review. Could we split this PR into several parts? Maybe we can discuss the details at the coming meeting.
>
> @bingyanghuang , there is only one file with tests now, and in my opinion the tests should be run in CI.

@wojtuss Yes, I agree with you. I think cpu_quantize_pass.cc, graph_pattern_detector.cc & .h, mkldnn_placement_pass.h, paddle_inference_api.h, and paddle_pass_builder.cc can go into a separate PR. Besides, argument.h and ir_pass_manager.cc can be merged independently.

wojtuss commented 5 years ago

@bingyanghuang , you are right. @luotao1 The first part, the fix for cpu_quantize_pass, is sent as a separate PR: https://github.com/PaddlePaddle/Paddle/pull/16322. The next parts will come shortly.

wojtuss commented 5 years ago

@luotao1 , the next part is split off into a separate PR: https://github.com/PaddlePaddle/Paddle/pull/16326

wojtuss commented 5 years ago

@luotao1 , this PR is also updated now.

wojtuss commented 5 years ago

@luotao1 , the error in PR_CI(Paddle) when building vis_demo occurs after renaming the header file paddle_quantizer_config.h to quantizer_config.h. With the name paddle_quantizer_config.h it builds fine. Is there a mechanism that treats header files beginning with paddle_ differently? The comment at the beginning of paddle_analysis_config.h suggests there is some special handling of includes.

luotao1 commented 5 years ago

https://github.com/PaddlePaddle/Paddle/blob/da39a704166e6ed512fc2cf0fa757516907ea847/cmake/inference_lib.cmake#L208-L218 When we package the inference library, we only copy paddle_*.h files into it.

wojtuss commented 5 years ago

Thank you! So I have to rename it back to paddle_quantizer_config.h.

sfraczek commented 5 years ago

Hi @luotao1 , I have started developing the analyzer_int8_resnet50 test, and I need a ResNet model that has a top-1 accuracy layer and consists of just two files: model and params. Do you have a model to use for this?

[edit] Should I use this one? https://github.com/PaddlePaddle/Paddle/blob/b9fc80a13307461991bc2d091f70182b30f21128/python/paddle/fluid/contrib/tests/test_calibration.py#L192

luotao1 commented 5 years ago

@sfraczek You can use this one.

sfraczek commented 5 years ago

@luotao1 we will now need a data reader for the small ImageNet dataset in the C-API. What should we do?

luotao1 commented 5 years ago

@sfraczek You can use a Python data reader to preprocess the images into a data.txt like the other test_analyzer_xxx tests do, and you could give me the data.txt; I will upload it to our CDN.
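
For illustration, a minimal sketch of such a preprocessing script, assuming standard ImageNet normalization (the file names and image list are hypothetical):

```python
import numpy as np
from PIL import Image

# Hypothetical sketch: preprocess validation images and append them to
# data.txt, one flattened image per line, as plain-text input data for
# a test_analyzer_xxx-style test.
def preprocess(path, size=224):
    img = Image.open(path).convert('RGB').resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = arr.transpose(2, 0, 1)  # HWC -> CHW
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
    return ((arr - mean) / std).flatten()

with open('data.txt', 'w') as f:
    for path in ['val/img0.jpg', 'val/img1.jpg']:  # hypothetical image list
        np.savetxt(f, preprocess(path)[np.newaxis, :])
```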

luotao1 commented 5 years ago
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
model.py
"""
import argparse
import numpy as np
import time
import logging
from paddle import fluid
from continuous_evaluation import cpu_infer_time_kpi
from continuous_evaluation import gpu_infer_time_kpi
import ce_utils

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)

SEED = 90
fluid.default_startup_program().random_seed = SEED

def main():
    """
    main
    """
    args = parse_args()
    #prepare()
    t = []
    infer = run_inference(args)
    t.append(infer)

    if args.device == "CPU":
        infer_time_kpi = cpu_infer_time_kpi
    else:
        infer_time_kpi = gpu_infer_time_kpi

    infer_time_kpi.add_record(np.array(t, dtype=np.float32))
    infer_time_kpi.persist()

def parse_args():
    """
    parse_args
    """
    parser = argparse.ArgumentParser("fast rcnn model ce demo")

    parser.add_argument("--batch-size", type=int, help="batch size")
    parser.add_argument("--warmup", type=int, default=10, help="warmup")
    parser.add_argument(
        "--device", type=str, default="GPU", choices=["CPU", "GPU"])

    return parser.parse_args()

def prepare():
    """
    prepare
    """
    model_filename = 'faster_rcnn2.tar'
    data_filename = 'faster_rcnn2_data.txt'
    dirname = 'upload/'
    ce_utils.maybe_download(dirname, model_filename)
    ce_utils.maybe_download(dirname, data_filename)

def run_inference(args):
    """
    run_inference
    """
    place = fluid.CPUPlace() if args.device == "CPU" else fluid.CUDAPlace(0)
    exe = fluid.Executor(place)

    params_dirname = 'faster_rcnn2/faster_rcnn2/'

    with fluid.scope_guard(fluid.core.Scope()):
        [program, feed, fetch] = fluid.io.load_inference_model(
            params_dirname,
            exe,
            model_filename='model',
            params_filename='params')

        data = np.loadtxt("faster_rcnn2/faster_rcnn2_data.txt", dtype=np.float32)
        shape = (1, 3, 600, 800)

        image = fluid.core.PaddleTensor()
        image.data = fluid.core.PaddleBuf(data.tolist())
        image.shape = shape
        image.dtype = fluid.core.PaddleDType.FLOAT32

        info = np.array([600., 800., .125], dtype=np.float32)

        image_info = fluid.core.PaddleTensor()
        image_info.data = fluid.core.PaddleBuf(info.tolist())
        image_info.shape = [1, 3]  # im_info holds (height, width, scale)
        image_info.dtype = fluid.core.PaddleDType.FLOAT32

        prog_file = "{}/model".format(params_dirname)
        params_file = "{}/params".format(params_dirname)
        config = fluid.core.AnalysisConfig(prog_file, params_file)

        if args.device == "GPU":
            config.enable_use_gpu(200)
        else:
            config.disable_gpu()
        config.switch_ir_debug()

        program = fluid.compiler.CompiledProgram(program)
        program.with_inference_optimize(config)
        for i in range(args.warmup):
            exe.run(program, feed=[image, image_info])

        t1 = time.time()
        outputs = exe.run(program, feed=[image, image_info])
        t2 = time.time()
        logger.info("outputs result length[{}]".format(len(outputs)))
        return t2 - t1

if __name__ == "__main__":
    main()
```

@sfraczek You can use the Python inference API (@fc500110 ) to create your test, which makes it easy to write a data reader.

sfraczek commented 5 years ago

What do you mean? Should I make a Python test instead, based on the code you shared? If so, where should I put it?

luotao1 commented 5 years ago

@sfraczek

> Should I make a Python test instead

You can either create a C++ test or a Python test, whichever is more convenient for you.

> make it based on the code you shared

The code I shared uses the Python inference API, which calls the C++ passes.

> If so, where should I put it?

How about in python/paddle/fluid/contrib/tests/?

sfraczek commented 5 years ago

OK, we will discuss it and try one of those :). Thank you.

lidanqing-intel commented 5 years ago

Hi, @luotao1

```python
data = np.loadtxt("faster_rcnn2/faster_rcnn2_data.txt", dtype=np.float32)
shape = (1, 3, 600, 800)
```

Do you mean I should do the following:

  1. In Python, use np.savetxt to save the data of all val images in one file, without any "[" or "]", so the data file is pure data without array formatting, just like ./build/third_party/inference_demo/rnn2/data.txt.
  2. Declare the shape directly in the C++ test as (100, 3, 224, 224), because val has 100 images and, after cropping, H and W are 224x224.
  3. Build the tensor; there will be only one batch (batch_size=100) in this case.

But what is this for?

```python
info = np.array([600., 800., .125], dtype=np.float32)
```

luotao1 commented 5 years ago

> In Python I should use np.savetxt and save all data of val images in one file

You can use a data reader like test_calibration.py does; there is no need to save to a txt, since the Python inference API makes it easy to use a data reader. If you still want to use savetxt, you can choose the C++ API.

> there will be only one batch

There are 100 batches, each with batch_size=1. You could use a for loop to do this.

> info = np.array([600., 800., .125], dtype=np.float32)

With feed=[image, image_info] there are two feeds in the example.
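
For illustration, a sketch of that loop, reusing the PaddleTensor pattern from the script above and assuming a data.txt layout with one flattened image per row (the file name and shapes are hypothetical):

```python
import numpy as np
from paddle import fluid

# Hypothetical sketch: 100 batches with batch_size=1, feeding one image per
# iteration; `exe` and `program` are set up as in the script shared above.
data = np.loadtxt("data.txt", dtype=np.float32)  # one flattened image per row
for i in range(100):
    image = fluid.core.PaddleTensor()
    image.data = fluid.core.PaddleBuf(data[i].tolist())
    image.shape = [1, 3, 224, 224]
    image.dtype = fluid.core.PaddleDType.FLOAT32
    outputs = exe.run(program, feed=[image])
```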

lidanqing-intel commented 5 years ago

Hi @luotao1 , this is the data.txt: resnet50_int8v2_data.txt.tar.gz. Please upload it to http://paddle-inference-dist.cdn.bcebos.com/int8

wojtuss commented 5 years ago

The core from https://github.com/PaddlePaddle/Paddle/pull/16396 was merged.