PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile, and 📹Edge, covering 20+ mainstream scenarios across image, video, text, and audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Memory/VRAM leak when using GPU inference #2200

Closed. YOU-007 closed this issue 2 weeks ago.

YOU-007 commented 1 year ago

Environment

#include "fastdeploy/vision.h"
#include <iostream>
#include <chrono>
#include <thread>

#ifdef _WIN32
const char sep = '\\';
#else
const char sep = '/';
#endif

void CpuInfer(const std::string& model_dir, const std::string& image_file) {
    std::cout << "Waiting for 5 seconds..." << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(5)); 
    std::cout << "Done!" << std::endl;
    std::cout << model_dir << typeid(model_dir).name() << std::endl;
    auto model_file = model_dir + sep + "model.pdmodel";
    auto params_file = model_dir + sep + "model.pdiparams";
    auto config_file = model_dir + sep + "infer_cfg.yml";
    auto option = fastdeploy::RuntimeOption();
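    // Device/backend selection: with UseCpu() the memory is released after the
    // model is destroyed; switching to UseGpu() reproduces the leak reported here.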
    option.UseCpu();
    //option.UseGpu();
    option.UseOpenVINOBackend();
    //option.UseOrtBackend();
    //option.UseTrtBackend();
    std::shared_ptr<fastdeploy::vision::detection::PPYOLOE> model =
        std::make_shared<fastdeploy::vision::detection::PPYOLOE>(
            model_file, params_file, config_file, option);
    /* auto model = fastdeploy::vision::detection::PPYOLOE(model_file, params_file, config_file, option);*/
    if (!model->Initialized()) {
        std::cerr << "Failed to initialize." << std::endl;
        return;
    }

    auto im = cv::imread(image_file);
    fastdeploy::vision::DetectionResult res;

    // Run several predictions before destroying the model.
    for (int i = 0; i < 30; i++) {
        if (!model->Predict(im, &res)) {
            std::cerr << "Failed to predict." << std::endl;
            return;
        }
    }
    std::cout << "delete!" << std::endl;
    model.reset();
    model = nullptr;
    std::cout << "delete Done!" << std::endl;

    std::cout << "Waiting for 5 seconds..." << std::endl;
    // At this point the memory used by CPU inference is released, but GPU inference memory is not fully released.
    std::this_thread::sleep_for(std::chrono::seconds(5)); 
    std::cout << "Done!" << std::endl;
}

int main(int argc, char* argv[]) {
    if (argc < 4) {
        std::cout
            << "Usage: infer_demo path/to/model_dir path/to/image run_option, "
            "e.g ./infer_model ./ppyoloe_model_dir ./test.jpeg 0"
            << std::endl;
        std::cout << "The data type of run_option is int, 0: run with cpu; 1: run "
            "with gpu; 2: run with gpu and use tensorrt backend; 3: run with kunlunxin."
            << std::endl;
        return -1;
    }
    CpuInfer(argv[1], argv[2]);
    return 0;
}
jiangjiajun commented 1 year ago

GPU memory indeed cannot be released at the moment. If you need to release it, the current approach is to load the model in a subprocess; when that process exits, the GPU memory is freed.
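
A minimal sketch of the subprocess workaround described above, assuming a POSIX system (fork/waitpid); GpuInferOnce is a hypothetical helper that mirrors CpuInfer from the report, but with option.UseGpu() enabled:

#include <sys/wait.h>
#include <unistd.h>

#include <iostream>
#include <string>

// Hypothetical helper: same body as CpuInfer above, but with option.UseGpu().
void GpuInferOnce(const std::string& model_dir, const std::string& image_file);

void InferInSubprocess(const std::string& model_dir, const std::string& image_file) {
    pid_t pid = fork();
    if (pid == 0) {
        // Child process: load the model and run inference, then exit.
        GpuInferOnce(model_dir, image_file);
        _exit(0);  // GPU memory is reclaimed by the driver when the child exits.
    } else if (pid > 0) {
        // Parent process: wait for the child; its GPU memory is freed once it exits.
        int status = 0;
        waitpid(pid, &status, 0);
    } else {
        std::cerr << "fork failed" << std::endl;
    }
}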

SchrodingerLLX commented 1 year ago

GPU memory indeed cannot be released at the moment. If you need to release it, the current approach is to load the model in a subprocess; when that process exits, the GPU memory is freed.

So is there a plan to support releasing GPU memory?