DanaHan / Yolov5-in-Deepstream-5.0

Describe how to use yolov5 in Deepstream 5.0

Possible to support dynamic batch size? #23

Open luvwinnie opened 3 years ago

luvwinnie commented 3 years ago

Hi, I'm trying to use yolov5 as both a primary and secondary detector. Currently it seems the engine is built with a fixed batch size. Is it possible to generate an engine with a dynamic batch size, so that it can be configured in DeepStream?

It shows the following implicit engine info even though I built the engine with BATCH_SIZE 4:

deepstream_app_1  | Opening in BLOCKING MODE
deepstream_app_1  | INFO: [Implicit Engine Info]: layers num: 2
deepstream_app_1  | 0   INPUT  kFLOAT data            3x640x640
deepstream_app_1  | 1   OUTPUT kFLOAT prob            6001x1x1
Endeavor-Gcl commented 3 years ago

> Hi, I'm trying to use yolov5 as both a primary and secondary detector; currently it seems the engine is built with a fixed batch size. Is it possible to generate a dynamic batch size that can be configured in DeepStream? Even when I build the engine with BATCH_SIZE 4, it shows the implicit engine info above.

Hello, have you solved it?

luvwinnie commented 3 years ago

No, I haven't solved it yet. Do you have any idea how to solve it?

Endeavor-Gcl commented 3 years ago

> No, I haven't solved it yet. Do you have any idea how to solve it?

Sorry, I have no idea.

luvwinnie commented 3 years ago

@DanaHan are you able to make the engine with dynamic batch size?

luvwinnie commented 3 years ago

I'm trying to create a yolov5 explicitBatch engine. This is my current work, but I need some help with the network.

common.hpp

...
ILayer* focus(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int inch, int outch, int ksize, std::string lname,int batch_size) {
    ISliceLayer *s1 = network->addSlice(input, Dims4{batch_size,0, 0, 0}, Dims4{batch_size,inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size,1, 2, 2});
    ISliceLayer *s2 = network->addSlice(input, Dims4{batch_size,0, 1, 0}, Dims4{batch_size,inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size,1, 2, 2});
    ISliceLayer *s3 = network->addSlice(input, Dims4{batch_size,0, 0, 1}, Dims4{batch_size,inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size,1, 2, 2});
    ISliceLayer *s4 = network->addSlice(input, Dims4{batch_size,0, 1, 1}, Dims4{batch_size,inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size,1, 2, 2});
    ITensor* inputTensors[] = {s1->getOutput(0), s2->getOutput(0), s3->getOutput(0), s4->getOutput(0)};
    auto cat = network->addConcatenation(inputTensors, 4);
    auto conv = convBlock(network, weightMap, *cat->getOutput(0), outch, ksize, 1, 1, lname + ".conv");
    return conv;
}
...

yolov5.cpp

ICudaEngine* createEngine_s(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt) {
    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(explicitBatch);
    // std::cout << "Explicit BATCH" << std::endl;
    // INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {BATCH_SIZE, 3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims4{BATCH_SIZE,3, INPUT_H, INPUT_W});
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights("../yolov5s.wts");
    std::cout << "BATCH_SIZE:" << BATCH_SIZE << ",INPUT_H:" << INPUT_H << ",INPUT_W" << INPUT_W  << std::endl;
    Weights emptywts{DataType::kFLOAT, nullptr, 0};

    // yolov5 backbone
    auto focus0 = focus(network, weightMap, *data, 3, 32, 3, "model.0",BATCH_SIZE);

    std::cout << "focus0:" << "passed" <<std::endl;
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), 64, 3, 2, 1, "model.1");
    std::cout << "conv1:" << "passed" <<std::endl;
    auto bottleneck_CSP2 = bottleneckCSP(network, weightMap, *conv1->getOutput(0), 64, 64, 1, true, 1, 0.5, "model.2");
    std::cout << "bottleneck_CSP2:" << "passed" <<std::endl;
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), 128, 3, 2, 1, "model.3");
    std::cout << "conv3:" << "passed" <<std::endl;
    auto bottleneck_csp4 = bottleneckCSP(network, weightMap, *conv3->getOutput(0), 128, 128, 3, true, 1, 0.5, "model.4");
    std::cout << "bottleneck_csp4:" << "passed" <<std::endl;
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), 256, 3, 2, 1, "model.5");
    std::cout << "conv5:" << "passed" <<std::endl;
    auto bottleneck_csp6 = bottleneckCSP(network, weightMap, *conv5->getOutput(0), 256, 256, 3, true, 1, 0.5, "model.6");
    std::cout << "bottleneck_csp6:" << "passed" <<std::endl;
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), 512, 3, 2, 1, "model.7");
    std::cout << "conv7:" << "passed" <<std::endl;
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), 512, 512, 5, 9, 13, "model.8");
    std::cout << "spp8:" << "passed" <<std::endl;

    // yolov5 head
    auto bottleneck_csp9 = bottleneckCSP(network, weightMap, *spp8->getOutput(0), 512, 512, 1, false, 1, 0.5, "model.9");
    std::cout << "spp8:" << "passed" <<std::endl;
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), 256, 1, 1, 1, "model.10");
    std::cout << "conv10:" << "passed" <<std::endl;

    float *deval = reinterpret_cast<float*>(malloc(sizeof(float) * 256 * 2 * 2));
    for (int i = 0; i < 256 * 2 * 2; i++) {
        deval[i] = 1.0;
    }
    std::cout << "deval:" << "passed" <<std::endl;
    Weights deconvwts11{DataType::kFLOAT, deval, 256 * 2 * 2};
    std::cout << "deconvwts11:" << "passed" <<std::endl;
    IDeconvolutionLayer* deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), 256, DimsHW{2, 2}, deconvwts11, emptywts);
    deconv11->setStrideNd(DimsHW{2, 2});
    deconv11->setNbGroups(256);
    weightMap["deconv11"] = deconvwts11;
    std::cout << "deconv11:" << "passed" <<std::endl;

    ITensor* inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};
    std::cout << "inputTensors12:" << "passed" <<std::endl;
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    std::cout << "cat12:" << "passed" <<std::endl;
    auto bottleneck_csp13 = bottleneckCSP(network, weightMap, *cat12->getOutput(0), 512, 256, 1, false, 1, 0.5, "model.13");
    std::cout << "bottleneck_csp13:" << "passed" <<std::endl;
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), 128, 1, 1, 1, "model.14");
    std::cout << "conv14:" << "passed" <<std::endl;

    Weights deconvwts15{DataType::kFLOAT, deval, 128 * 2 * 2};
    IDeconvolutionLayer* deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), 128, DimsHW{2, 2}, deconvwts15, emptywts);
    std::cout << "deconv15:" << "passed" <<std::endl;
    deconv15->setStrideNd(DimsHW{2, 2});
    deconv15->setNbGroups(128);
    //weightMap["deconv15"] = deconvwts15;

    ITensor* inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};
    std::cout << "inputTensors16:" << "passed" <<std::endl;
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    std::cout << "cat16:" << "passed" <<std::endl;
    auto bottleneck_csp17 = bottleneckCSP(network, weightMap, *cat16->getOutput(0), 256, 128, 1, false, 1, 0.5, "model.17");
    std::cout << "bottleneck_csp17:" << "passed" <<std::endl;
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    std::cout << "det0:" << "passed" <<std::endl;

    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), 128, 3, 2, 1, "model.18");
    std::cout << "conv18:" << "passed" <<std::endl;
    ITensor* inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};
    std::cout << "inputTensors19:" << "passed" <<std::endl;
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    std::cout << "cat19:" << "passed" <<std::endl;
    auto bottleneck_csp20 = bottleneckCSP(network, weightMap, *cat19->getOutput(0), 256, 256, 1, false, 1, 0.5, "model.20");
    std::cout << "bottleneck_csp20:" << "passed" <<std::endl;
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);

    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), 256, 3, 2, 1, "model.21");
    std::cout << "conv21:" << "passed" <<std::endl;
    ITensor* inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};
    std::cout << "inputTensors22:" << "passed" <<std::endl;
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    std::cout << "cat22:" << "passed" <<std::endl;
    auto bottleneck_csp23 = bottleneckCSP(network, weightMap, *cat22->getOutput(0), 512, 512, 1, false, 1, 0.5, "model.23");
    std::cout << "bottleneck_csp23:" << "passed" <<std::endl;
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);
    std::cout << "det2:" << "passed" <<std::endl;

    auto creator = getPluginRegistry()->getPluginCreator("YoloLayer_TRT", "1");
    const PluginFieldCollection* pluginData = creator->getFieldNames();
    IPluginV2 *pluginObj = creator->createPlugin("yololayer", pluginData);
    ITensor* inputTensors_yolo[] = {det2->getOutput(0), det1->getOutput(0), det0->getOutput(0)};
    auto yolo = network->addPluginV2(inputTensors_yolo, 3, *pluginObj);

    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#ifdef USE_FP16
    config->setFlag(BuilderFlag::kFP16);
#endif
#ifdef USE_DLA
    std::cout << "Set use DLA instead of GPU" << std::endl;
    config->setFlag(BuilderFlag::kGPU_FALLBACK);
    config->setDefaultDeviceType(DeviceType::kDLA);
    config->setDLACore(2);
    // builder->setDefaultDeviceType(DeviceType::kDLA);
#endif
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*) (mem.second.values));
    }

    return engine;
}
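A note for anyone following along: switching to kEXPLICIT_BATCH while hard-coding BATCH_SIZE into the input dims still produces a fixed-batch engine. For a genuinely dynamic batch, TensorRT 7 expects a wildcard (-1) batch dimension plus an optimization profile. A sketch only, untested against this code base (the min/opt/max shapes are illustrative):

```cpp
// Declare the batch dimension as dynamic (-1) instead of BATCH_SIZE.
ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims4{-1, 3, INPUT_H, INPUT_W});

// Tell the builder which batch sizes the engine must support.
IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions(INPUT_BLOB_NAME, OptProfileSelector::kMIN, Dims4{1, 3, INPUT_H, INPUT_W});
profile->setDimensions(INPUT_BLOB_NAME, OptProfileSelector::kOPT, Dims4{4, 3, INPUT_H, INPUT_W});
profile->setDimensions(INPUT_BLOB_NAME, OptProfileSelector::kMAX, Dims4{8, 3, INPUT_H, INPUT_W});
config->addOptimizationProfile(profile);
```

With explicit batch, builder->setMaxBatchSize() has no effect, and every layer, including the slices in focus() and the YoloLayer_TRT plugin, has to handle the leading batch axis itself.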

Currently I have changed these files, and the error shows this:

Loading weights: ../yolov5s.wts
BATCH_SIZE:4,INPUT_H:640,INPUT_W640
focus0:passed
conv1:passed
bottleneck_CSP2:passed
conv3:passed
bottleneck_csp4:passed
conv5:passed
bottleneck_csp6:passed
conv7:passed
spp8:passed
spp8:passed
conv10:passed
deval:passed
deconvwts11:passed
deconv11:passed
inputTensors12:passed
cat12:passed
bottleneck_csp13:passed
conv14:passed
deconv15:passed
inputTensors16:passed
cat16:passed
bottleneck_csp17:passed
det0:passed
conv18:passed
inputTensors19:passed
cat19:passed
bottleneck_csp20:passed
conv21:passed
inputTensors22:passed
cat22:passed
bottleneck_csp23:passed
det2:passed
Building engine, please wait for a while...
[04/02/2021-10:46:14] [E] [TRT] (Unnamed Layer* 0) [Slice]: out of bounds slice, input dimensions = [4,3,640,640], start = [4,0,0,0], size = [4,3,320,320], stride = [4,1,2,2].
[04/02/2021-10:46:14] [E] [TRT] Layer (Unnamed Layer* 0) [Slice] failed validation
[04/02/2021-10:46:14] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/administrator/deepstream_docker/deepstream_app/deepstream_yolov5/yolov5-tensorrt/yolov5.cpp:505: void APIToModel(unsigned int, nvinfer1::IHostMemory**): Assertion `engine != nullptr' failed.
Aborted (core dumped)