NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

How to use calibrated cache file to generate int8 engine in c++ #3881

Open ashray21 opened 3 months ago

ashray21 commented 3 months ago

I have successfully generated a calibration cache file (dataset.cache) for my dataset using Polygraphy. I now want to load the generated calibration cache file and build an INT8 engine in C++.
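For reference, the cache can be produced with a command along these lines (exact flags and the data-loader script name vary by Polygraphy version; this is an illustrative sketch, not the command I ran verbatim):

```shell
# Build a calibration cache while converting the ONNX model with Polygraphy.
# data_loader.py is a hypothetical script yielding real calibration inputs.
polygraphy convert model.onnx --int8 \
    --data-loader-script data_loader.py \
    --calibration-cache dataset.cache \
    -o model.engine
```

Once dataset.cache exists, later builds can reuse it and skip feeding calibration batches.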

This is the function I'm using to convert the ONNX model into a TensorRT engine file:

std::unique_ptr<nvinfer1::ICudaEngine> createCudaEngine(const std::string& onnxFileName, nvinfer1::ILogger& logger, int batchSize, ENGINE_TYPE type = ENGINE_TYPE::FP32)
{
    // Factory functions return raw pointers; wrap them explicitly.
    std::unique_ptr<IBuilder> builder{createInferBuilder(logger)};
    std::unique_ptr<INetworkDefinition> network{builder->createNetworkV2(1U << static_cast<unsigned>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH))};

    std::unique_ptr<nvonnxparser::IParser> parser{nvonnxparser::createParser(*network, logger)};

    if (!parser->parseFromFile(onnxFileName.c_str(), static_cast<int>(ILogger::Severity::kINFO)))
        throw std::runtime_error("ERROR: could not parse ONNX model " + onnxFileName + " !");

    // The optimization profile is owned by the builder; do not wrap it in a unique_ptr.
    IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", OptProfileSelector::kMIN, Dims2{batchSize, 3});
    profile->setDimensions("input", OptProfileSelector::kOPT, Dims2{batchSize, 3});
    profile->setDimensions("input", OptProfileSelector::kMAX, Dims2{batchSize, 3});

    std::unique_ptr<IBuilderConfig> config{builder->createBuilderConfig()};

    // Deprecated in recent TensorRT releases; newer versions use
    // config->setMemoryPoolLimit(MemoryPoolType::kWORKSPACE, ...).
    config->setMaxWorkspaceSize(64 * 1024 * 1024);
    config->addOptimizationProfile(profile);

    if (type == ENGINE_TYPE::INT8)
    {
        // How do I load and use the calibration cache file here?
        config->setFlag(BuilderFlag::kINT8);
    }
    return std::unique_ptr<nvinfer1::ICudaEngine>{builder->buildEngineWithConfig(*network, *config)};
}

@zerollzeng Please help

brb-nv commented 3 months ago

Hi, I think you have a few options:
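One common approach is to implement a calibrator that only replays the existing cache: when `readCalibrationCache` returns a non-null pointer, TensorRT uses the cached scales and never asks for calibration batches. A minimal sketch follows; the class name and the absence of error handling are illustrative, and the `noexcept` signatures assume a TensorRT 8.x header:

```cpp
#include <NvInfer.h>

#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Cache-replay calibrator: serves a previously written calibration cache so
// the builder can compute INT8 scales without any live calibration data.
class CacheCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    explicit CacheCalibrator(std::string cachePath) : mPath(std::move(cachePath)) {}

    std::int32_t getBatchSize() const noexcept override { return 1; }

    // No live batches: returning false tells TensorRT to rely on the cache.
    bool getBatch(void*[], char const*[], std::int32_t) noexcept override { return false; }

    void const* readCalibrationCache(std::size_t& length) noexcept override
    {
        std::ifstream in(mPath, std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>{});
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    // Nothing to write back; the cache already exists on disk.
    void writeCalibrationCache(void const*, std::size_t) noexcept override {}

private:
    std::string mPath;
    std::vector<char> mCache;
};
```

Inside the INT8 branch of the build function it would then be used like this (the calibrator must outlive `buildEngineWithConfig`):

```cpp
CacheCalibrator calibrator("dataset.cache");
config->setFlag(nvinfer1::BuilderFlag::kINT8);
config->setInt8Calibrator(&calibrator);
```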