NagatoYuki0943 / anomalib-onnxruntime-cpp

How should multiple ONNX models be loaded? #6

Open jun20061588 opened 11 months ago

jun20061588 commented 11 months ago

I tried using this demo to load multiple models, but initialization always fails with an error. From a quick search, the cause seems to be that Ort::Env is meant to be a process-wide singleton. Do you have any suggestions? Here is a similar issue I found: https://github.com/DefTruth/lite.ai.toolkit/issues/8
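A minimal sketch of the pattern the reply below ends up using: one shared Ort::Env passed to every Ort::Session (Windows wide-string paths as in this repo; the model names are placeholders):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    // one process-wide environment, reused by every session
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared-env");
    Ort::SessionOptions options;

    // both sessions share the single env instead of each creating their own
    Ort::Session session1(env, L"model_a.onnx", options);  // placeholder path
    Ort::Session session2(env, L"model_b.onnx", options);  // placeholder path
    return 0;
}
```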

NagatoYuki0943 commented 11 months ago

I did a quick test on version 1.16.1, CPU only (my machine's CUDA was upgraded to 12.1, which onnxruntime does not support yet, so I could not test CUDA), and loading succeeded on CPU. The code is below: the class has one Ort::Env member and two Ort::Session members; the two sessions load two different models and share the single Ort::Env, and that seems to work.

> main.cpp

```cpp
#include "inference.hpp"
#include <opencv2/opencv.hpp>

int main() {
    // for patchcore models, set center_crop in the training config to `center_crop: null`
    std::vector<string> model_paths = {
        "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/model.onnx",
        "D:/ml/code/anomalib/results/fastflow/mvtec/bottle/run/weights/openvino/model.onnx"
    };
    string meta_path  = "D:/ml/code/anomalib/results/efficient_ad/mvtec/bottle/run/weights/openvino/metadata.json";
    string image_path = "D:/ml/code/anomalib/datasets/MVTec/bottle/test/broken_large/000.png";
    string save_dir   = "D:/ml/code/anomalib-onnxruntime-cpp-1/result"; // note: the directory is not created automatically; create it by hand or nothing is saved
    string device     = "cpu";
    int threads       = 4;     // Ort::SessionOptions SetIntraOpNumThreads & SetInterOpNumThreads
    int gpu_mem_limit = 4;     // onnxruntime gpu memory limit (GB)
    bool efficient_ad = true;  // whether the model is efficient_ad

    // create the inference engine
    auto inference = Inference(model_paths, meta_path, device, threads, gpu_mem_limit, efficient_ad);

    // single-image inference
    cv::Mat image = readImage(image_path);
    Result result = inference.single(image);
    saveScoreAndImages(result.score, result.anomaly_map, image_path, save_dir);
    cv::resize(result.anomaly_map, result.anomaly_map, { 1500, 500 });
    cv::imshow("result", result.anomaly_map);
    cv::waitKey(0);

    return 0;
}
```


> inference.hpp
```cpp
#pragma once

#include <opencv2/dnn.hpp>
#include <opencv2/opencv.hpp>
#include <onnxruntime_cxx_api.h>
#include <string>
#include <numeric>
#include <vector>
#include <Windows.h>
#include "utils.h"

using namespace std;

class Inference {
private:
    bool efficient_ad;                                      // whether the model is efficient_ad
    MetaData meta{};                                        // hyperparameters from metadata.json
    Ort::Env env{};                                         // ort environment, shared by both sessions
    Ort::AllocatorWithDefaultOptions allocator{};
    Ort::RunOptions runOptions{};
    Ort::Session session1 = Ort::Session(nullptr);          // onnxruntime session for model 1
    Ort::Session session2 = Ort::Session(nullptr);          // onnxruntime session for model 2
    size_t input_nums{};                                    // number of model inputs
    size_t output_nums{};                                   // number of model outputs
    vector<const char*> input_node_names;                   // input node names
    vector<Ort::AllocatedStringPtr> input_node_names_ptr;   // input node name pointers, kept so the names are not freed https://github.com/microsoft/onnxruntime/issues/13651
    vector<vector<int64_t>> input_dims;                     // input shapes
    vector<const char*> output_node_names;                  // output node names
    vector<Ort::AllocatedStringPtr> output_node_names_ptr;  // output node name pointers
    vector<vector<int64_t>> output_dims;                    // output shapes

public:
    Inference(std::vector<string> model_paths, string& meta_path, string& device, int threads = 0, int gpu_mem_limit = 2, bool efficient_ad = false) {
        this->efficient_ad = efficient_ad;
        // 1. read the metadata
        this->meta = getJson(meta_path);
        // 2. create the sessions
        this->get_model(model_paths, device, threads, gpu_mem_limit);
        // 3. query the model inputs and outputs
        this->get_onnx_info();
        // 4. warm up the model
        this->warm_up();
    }

    void get_model(std::vector<string> model_paths, string& device, int threads = 0, int gpu_mem_limit = 2) {
        // list the available providers
        auto availableProviders = Ort::GetAvailableProviders();
        for (const auto& provider : availableProviders) {
            cout << provider << " ";
        }
        cout << endl;

        Ort::SessionOptions sessionOptions;
        // threads = 0 lets onnxruntime choose; raise it to run ops faster
        sessionOptions.SetIntraOpNumThreads(threads);
        sessionOptions.SetInterOpNumThreads(threads);
        // ORT_ENABLE_ALL: enable all possible optimizations
        sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
        if (device == "cuda" || device == "tensorrt") {
            // https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html
            // https://onnxruntime.ai/docs/api/c/struct_ort_c_u_d_a_provider_options.html
            OrtCUDAProviderOptions cuda_options;
            cuda_options.device_id = 0;
            cuda_options.arena_extend_strategy = 0;
            cuda_options.gpu_mem_limit = (size_t)gpu_mem_limit * 1024 * 1024 * 1024; // gpu memory limit
            cuda_options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchExhaustive;
            cuda_options.do_copy_in_default_stream = 1;
            sessionOptions.AppendExecutionProvider_CUDA(cuda_options);
            if (device == "tensorrt") {
                // https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
                // https://onnxruntime.ai/docs/api/c/struct_ort_tensor_r_t_provider_options.html
                OrtTensorRTProviderOptions trt_options;
                trt_options.device_id = 0;
                trt_options.trt_max_workspace_size = (size_t)gpu_mem_limit * 1024 * 1024 * 1024; // gpu memory limit
                trt_options.trt_fp16_enable = 0;
                sessionOptions.AppendExecutionProvider_TensorRT(trt_options);
            }
        }
        // convert the narrow paths to wide strings for the Windows Ort::Session
        // overload (naive char-by-char widening; fine for ASCII paths)
        wstring model_path0(model_paths[0].begin(), model_paths[0].end());
        wstring model_path1(model_paths[1].begin(), model_paths[1].end());
        // create both sessions against the single shared env
        this->session1 = Ort::Session(this->env, model_path0.c_str(), sessionOptions);
        this->session2 = Ort::Session(this->env, model_path1.c_str(), sessionOptions);
    }

    // ... get_onnx_info(), warm_up(), single() omitted ...
};
```
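For what it's worth, the onnxruntime docs describe Ort::Env as holding process-wide state (logging, telemetry, the default thread pools) and recommend creating it once and reusing it, which is presumably why one Env shared by both sessions works while per-session Envs caused trouble in the issue linked above.
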
jun20061588 commented 11 months ago

OK, thanks.

jun20061588 commented 11 months ago

@NagatoYuki0943 I ran into another problem: onnxruntime's memory usage keeps growing. Do you see this on your side? Others seem to report the same thing: https://zhuanlan.zhihu.com/p/371426504?utm_id=0

NagatoYuki0943 commented 11 months ago

I added the following options and memory usage dropped noticeably, though I haven't run a long-term test; you can try whether they help:

```cpp
sessionOptions.DisableCpuMemArena();
sessionOptions.DisableMemPattern();
```
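Placed in context, both calls go on the Ort::SessionOptions before the session is created; a minimal sketch (the two Disable* calls are real onnxruntime APIs, everything else here is illustrative):

```cpp
#include <onnxruntime_cxx_api.h>

// sketch: build a session whose options favor steady memory over peak speed
Ort::Session make_session(Ort::Env& env, const wchar_t* model_path) {
    Ort::SessionOptions options;
    options.DisableCpuMemArena();  // do not keep a growable CPU memory arena
    options.DisableMemPattern();   // do not cache pre-planned allocation patterns
    return Ort::Session(env, model_path, options);
}
```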
jun20061588 commented 11 months ago

I tried it; memory still blows up, so it's probably an issue in the onnxruntime library itself. Running inference in a while(1) loop, usage grew by 20 GB in half an hour.
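
For reference, that kind of soak test presumably looks like the sketch below, reusing the demo's Inference class and helpers; all paths (and the choice of device) are placeholders:

```cpp
#include "inference.hpp"
#include <opencv2/opencv.hpp>

int main() {
    // placeholders for the real model/metadata/image paths
    std::vector<string> model_paths = { "model_a.onnx", "model_b.onnx" };
    string meta_path = "metadata.json";
    string device = "cuda";
    auto inference = Inference(model_paths, meta_path, device, 4, 4, true);

    cv::Mat image = readImage("test.png");
    while (true) {
        // repeated single-image inference; the reported leak shows up as
        // process memory growing steadily (~20 GB in half an hour here)
        Result result = inference.single(image);
    }
    return 0;
}
```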

jun20061588 commented 11 months ago

@NagatoYuki0943 To add to the test results: it seems only GPU inference leaks; switching to CPU behaves normally. I'm not sure whether it depends on the GPU/onnxruntime version.

NagatoYuki0943 commented 11 months ago

CPU is also fine on my side; I haven't tested GPU. When it leaks on GPU, is it GPU memory (VRAM) that grows?

jun20061588 commented 11 months ago

> CPU is also fine on my side; I haven't tested GPU. When it leaks on GPU, is it GPU memory (VRAM) that grows?

It's system RAM; VRAM stays flat.

jun20061588 commented 11 months ago

@NagatoYuki0943 Hi, would you mind sharing a way to contact you? I have some questions I'd like to ask.

NagatoYuki0943 commented 11 months ago

I don't think I can solve your problem. I mainly work in Python for training and deployment; my C++ is very basic, and I really don't understand memory leaks :(

WYX523 commented 2 months ago

@jun20061588 Has there been any progress on the GPU-inference memory leak? I'm hitting a similar problem.