Closed · wupuhuiqianren closed this 4 months ago
Hi, have you solved this? The ONNX model I exported also produces abnormal results.
Yes, I've solved this problem. You can refer to the code in my repository: https://github.com/wupuhuiqianren/IGEV-YOLO8
It includes a link to a correctly exported ONNX model as well as the export code.
Could you share what was causing the problem?
The parameters used when converting to ONNX must match those used during training. Since the author did not provide the training log or yaml file, a direct conversion produces an ONNX model with incorrect inference results. You need to first load the PyTorch model, run inference to obtain the parameters, and then export to ONNX. Also note that IGEV outputs a set of iterative results; you need to extract the final output and avoid taking the initial one.
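For reference, a minimal sketch of the "extract the final iteration" idea, assuming the model's forward returns a list of per-iteration disparity maps (the wrapper and the iters keyword mirror the thread's description, not a verified API):

import torch

class FinalDispWrapper(torch.nn.Module):
    """Expose only the last disparity map of an iterative stereo model."""
    def __init__(self, model, iters=12):
        super().__init__()
        self.model = model
        self.iters = iters

    def forward(self, left, right):
        # Assumption: the wrapped model returns a list/tuple of disparity
        # maps, one per refinement iteration; keep only the final one.
        disp_preds = self.model(left, right, iters=self.iters)
        return disp_preds[-1]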
Thanks. One more question: among small models (a few tens of GFLOPs), is there anything with stronger zero-shot generalization than IGEV++? In my tests IGEV++ performed best.
I also found IGEV++ to offer the best overall trade-off between speed and accuracy, which is why this project uses it. For anything better, we'll have to wait for someone to open-source it.
May I ask whether the tens-of-GFLOPs model you are using is RT-IGEV++?
Yes.
@wupuhuiqianren Have you run this on MNN? I'm hitting a problem converting the model on MNN: the consistency check testMNNFromOnnx.py reports an error. https://github.com/alibaba/MNN/issues/3259
@wupuhuiqianren When exporting the ONNX model, if output_names=["pred_disp"] lists only a single output, it defaults to the first output, i.e. init_disp, so the resulting disparity map differs from the final one. If the model iterates 12 times by default, output_names should contain the corresponding 13 entries: ["init_disp", "disp_preds1", "disp_preds2", "disp_preds3", "disp_preds4", "disp_preds5", "disp_preds6", "disp_preds7", "disp_preds8", "disp_preds9", "disp_preds10", "disp_preds11", "disp_preds12"]
@408550969 Yes, you need to take the result of the final iteration as the output.
@408550969 Have you run this on MNN? Converting the model to MNN fails for me: the consistency check testMNNFromOnnx.py reports an error. alibaba/MNN#3259
I haven't tried MNN; I've only tried converting to ONNX and then to TensorRT for acceleration.
Solved it later: when converting to MNN, the ONNX model must use fixed input sizes; dynamic inputs are not allowed. The remaining consistency issue is probably the output-name problem mentioned above; I'll test again when I have time.
A correction: when converting to ONNX, if the forward of rt_igev_stereo.py defaults to test_mode=False and iters is 12, the conversion needs 13 outputs. But we really only need one output: set test_mode=True, and then only one entry in output_names is needed.
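A minimal export sketch along these lines, assuming forward(left, right, iters=..., test_mode=True) returns a single disparity map; the input resolution, opset, and tensor names are assumptions. Passing dynamic=False drops dynamic_axes, which yields the fixed-shape model MNN requires:

import torch

class ExportWrapper(torch.nn.Module):
    """Hypothetical wrapper: force single-output inference for export."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, left, right):
        # test_mode=True makes the network return one disparity map
        # instead of the full list of iterative predictions.
        return self.model(left, right, iters=12, test_mode=True)

def export_onnx(model, path="rt_model.onnx", dynamic=True):
    model.eval()
    left = torch.randn(1, 3, 480, 640)    # assumed input resolution
    right = torch.randn(1, 3, 480, 640)
    axes = {"left": {2: "height", 3: "width"},
            "right": {2: "height", 3: "width"},
            "disp": {2: "height", 3: "width"}} if dynamic else None
    torch.onnx.export(
        ExportWrapper(model), (left, right), path,
        input_names=["left", "right"],
        output_names=["disp"],   # a single output name is enough now
        opset_version=16,        # assumed opset
        dynamic_axes=axes,       # dynamic=False gives the fixed shapes MNN needs
    )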
Why is your converted ONNX model so small? The same conversion gives me over 16 MB. Conversion with trtexec reports an error, and after fixing the input size the converted model's output is very poor.
Did you use the code from my repository? The precision loss should be very small. There is no need to fix the input size; dynamic input works.
Following your 12-iteration setup, the exported model comes out much larger than yours.
# -*- coding: utf-8 -*-
# @Time : 2025/3/20 15:47
# @Author : sjh
# @File : convert.py
# @Comment : Convert ONNX to TensorRT engine (FP16/FP32)
import os

import tensorrt as trt


def convert_onnx_to_trt(onnx_path, fp16=True):
    """
    Convert an ONNX model to a TensorRT engine.
    :param onnx_path: path to the ONNX file
    :param fp16: enable FP16 (default: enabled)
    """
    # TensorRT logger
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    # Create builder and network (explicit batch)
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    # Parse the ONNX model
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            print(f"❌ Failed to parse {onnx_path}! Errors:")
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return
    # Builder configuration
    config = builder.create_builder_config()
    # Enable FP16 if requested and supported
    if fp16 and builder.platform_has_fast_fp16:
        print("✅ FP16 mode enabled")
        config.set_flag(trt.BuilderFlag.FP16)
    else:
        print("⚠️ Device does not support FP16, using FP32")
    # Optimization profile for the inputs. Note: min == opt == max below,
    # so this profile is effectively static; widen min/max for truly dynamic input.
    profile = builder.create_optimization_profile()
    min_shape = (1, 3, 480, 640)  # minimum input shape
    opt_shape = (1, 3, 480, 640)  # optimal input shape
    max_shape = (1, 3, 480, 640)  # maximum input shape
    profile.set_shape("left", min_shape, opt_shape, max_shape)
    profile.set_shape("right", min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)
    # Build the engine (build_engine is deprecated in TensorRT 8.x and removed
    # in TensorRT 10; newer versions use builder.build_serialized_network)
    engine = builder.build_engine(network, config)
    if engine is None:
        print("❌ Failed to build the TensorRT engine!")
        return
    # Save the engine
    if fp16:
        engine_path = onnx_path.replace(".onnx", "_fp16.engine")
    else:
        engine_path = onnx_path.replace(".onnx", "_fp32.engine")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    print(f"✅ TensorRT engine generated: {engine_path} 🚀")


# Run the conversion
onnx_models = [r"pretrained/igev_ONNX/rt_model_simplified.onnx", r"pretrained/igev_ONNX/rt_model.onnx"]
for model in onnx_models:
    if os.path.exists(model):
        convert_onnx_to_trt(model, fp16=True)
    else:
        print(f"❌ ONNX model {model} does not exist, skipping!")
Also, after converting the TRT model this way, the output contains NaN, while ONNX model inference is normal.
The NaNs may be a precision issue, since the model was converted to FP16.
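For completeness, the script posted above can rebuild the engine in full FP32 by flipping its flag, e.g.:

convert_onnx_to_trt("pretrained/igev_ONNX/rt_model_simplified.onnx", fp16=False)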
I tried FP32 and the problem persists.
Then it may be caused by things in the original model that cannot be converted losslessly, such as boolean operations, which make the converted model's inference results differ somewhat. You should have seen some warnings during conversion.
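One way to quantify such a gap is to run the same input through ONNX Runtime and the TensorRT engine and diff the arrays. A minimal sketch, where trt_out stands for the numpy array your TensorRT runner produces on the same input (names and shapes are assumptions):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("rt_model.onnx", providers=["CPUExecutionProvider"])
left = np.random.rand(1, 3, 480, 640).astype(np.float32)
right = np.random.rand(1, 3, 480, 640).astype(np.float32)
onnx_out = sess.run(None, {"left": left, "right": right})[0]

def report_diff(onnx_out, trt_out):
    # trt_out: numpy array from your TensorRT runner on the same left/right input
    print("NaNs in TRT output:", int(np.isnan(trt_out).sum()))
    valid = ~np.isnan(trt_out)
    print("max abs diff on valid pixels:",
          float(np.abs(onnx_out[valid] - trt_out[valid]).max()))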
@wupuhuiqianren Hello, I used your conversion code and the conversion succeeded, but I get an error when running the ONNX model and don't know how to resolve it. Asking for help.
ONNX model exported to igevmodel.onnx
0%| | 0/1 [00:00<?, ?it/s]
2025-04-09 17:54:22.519170552 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/Reshape_320' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:28 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) i < input_shape.NumDimensions() was false. The dimension with value zero exceeds the dimension size of the input tensor.
0%| | 0/1 [00:17<?, ?it/s]
Traceback (most recent call last):
  File "/home/zjy/models/IGEV-plusplus-new/test.py", line 137, in <module>
    demo(args)
  File "/home/zjy/models/IGEV-plusplus-new/test.py", line 89, in demo
    onnx_outputs = ort_session.run(None, onnx_inputs)
  File "/home/zjy/anaconda3/envs/foundation_stereo/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/Reshape_320' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:28 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) i < input_shape.NumDimensions() was false. The dimension with value zero exceeds the dimension size of the input tensor.
Unsupported operator aten::div encountered 10 time(s)
Unsupported operator aten::mul encountered 82 time(s)
Unsupported operator aten::sub encountered 10 time(s)
Unsupported operator aten::hardtanh encountered 64 time(s)
Unsupported operator aten::add encountered 20 time(s)
Unsupported operator aten::leaky_relu encountered 43 time(s)
Unsupported operator aten::mean encountered 48 time(s)
Unsupported operator aten::sigmoid encountered 9 time(s)
Unsupported operator aten::softmax encountered 2 time(s)
Unsupported operator aten::sum encountered 2 time(s)
Unsupported operator aten::tanh encountered 3 time(s)
Unsupported operator aten::avg_pool2d encountered 1 time(s)
Unsupported operator aten::linspace encountered 2 time(s)
Unsupported operator aten::add encountered 14 time(s)
Unsupported operator aten::rsub encountered 2 time(s)
Unsupported operator aten::im2col encountered 1 time(s)
I got these warnings when converting the model to ONNX. They don't affect ONNX inference on the PC, but I'm worried they will cause errors when running on an embedded device. Has anyone run into this?
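For what it's worth, "Unsupported operator aten::... encountered N time(s)" matches the warning format of FLOP-counting tools (e.g. fvcore) rather than the ONNX exporter itself, though that is a guess from the message format alone. Independently of the warnings, the exported graph can be sanity-checked on its own, as in this minimal sketch (the filename and the 480x640 input size are assumptions):

import numpy as np
import onnx
import onnxruntime as ort

m = onnx.load("igevmodel.onnx")
onnx.checker.check_model(m)   # raises if the exported graph is malformed

sess = ort.InferenceSession("igevmodel.onnx", providers=["CPUExecutionProvider"])
feeds = {i.name: np.random.rand(1, 3, 480, 640).astype(np.float32)
         for i in sess.get_inputs()}   # assumed two image inputs at 480x640
print([o.shape for o in sess.run(None, feeds)])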
It looks like the input image size may not match; you can reshape the image first before feeding it to the model.
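A minimal sketch of that preprocessing step, assuming the network expects dimensions divisible by 32 (IGEV-style models downsample internally; the exact factor here is an assumption):

import numpy as np

def pad_to_multiple(img, mult=32):
    """Zero-pad an HxWxC image on the bottom/right to a multiple of `mult`."""
    h, w = img.shape[:2]
    ph, pw = (-h) % mult, (-w) % mult
    return np.pad(img, ((0, ph), (0, pw), (0, 0)))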
Bro, did you solve the problem of the engine model outputting NaN? I'm running into it too.
import torch
import argparse
from IGEVplusplus.core.igev_stereo import IGEVStereo
import os

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--restore_ckpt', help="restore checkpoint", default='C:/Users/Tianle Zhu/PycharmProjects/openStereo/OpenStereo/IGEVplusplus/pretrained_models/igev_plusplus/sceneflow.pth')
    parser.add_argument('--output_onnx', help="path to save the ONNX model", default='igevmodel.onnx')
    parser.add_argument('--hidden_dims', nargs='+', type=int, default=[128]*3, help="hidden state and context dimensions")
    parser.add_argument('--corr_levels', type=int, default=2, help="number of levels in the correlation pyramid")
    parser.add_argument('--corr_radius', type=int, default=4, help="width of the correlation pyramid")
    parser.add_argument('--n_downsample', type=int, default=2, help="resolution of the disparity field (1/2^K)")
    parser.add_argument('--n_gru_layers', type=int, default=3, help="number of hidden GRU levels")
    parser.add_argument('--max_disp', type=int, default=768, help="max disp range")
    parser.add_argument('--s_disp_range', type=int, default=48, help="max disp of small disparity-range geometry encoding volume")
    parser.add_argument('--m_disp_range', type=int, default=96, help="max disp of medium disparity-range geometry encoding volume")
    parser.add_argument('--l_disp_range', type=int, default=192, help="max disp of large disparity-range geometry encoding volume")
    parser.add_argument('--s_disp_interval', type=int, default=1, help="disp interval of small disparity-range geometry encoding volume")
    parser.add_argument('--m_disp_interval', type=int, default=2, help="disp interval of medium disparity-range geometry encoding volume")
    parser.add_argument('--l_disp_interval', type=int, default=4, help="disp interval of large disparity-range geometry encoding volume")
    parser.add_argument('--mixed_precision', action='store_true', default=True, help='use mixed precision')
    parser.add_argument('--precision_dtype', default='float32', choices=['float16', 'bfloat16', 'float32'], help='Choose precision type: float16 or bfloat16 or float32')
    args = parser.parse_args()

if __name__ == '__main__':
    main()
This is the output from running sceneflow.pth:
PyTorch output: a tuple of two lists of tensors on cuda:0, the first containing 3 tensors and the second containing 12 tensors (apparently one disparity map per refinement iteration); the full printed values are omitted here.
This is the output shape from running the ONNX model:
ONNX output shape: (1, 1, 512, 768)
Right now I can't match the outputs correctly.
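Given the earlier advice in this thread, the tensor to compare against the single ONNX output should be the last element of the second list, i.e. the final refinement iteration. A minimal sketch, where pytorch_output and onnx_output are placeholders for the values printed above:

import numpy as np

# pytorch_output is the tuple printed above; disp_preds holds the 12
# iterative predictions, and the last one is the final disparity map.
init_list, disp_preds = pytorch_output
final_disp = disp_preds[-1].detach().cpu().numpy()
print("max abs diff vs ONNX:", float(np.abs(final_disp - onnx_output).max()))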