Closed YFforever2022 closed 2 years ago
Do you know why inference takes only 9 ms with the pt model but 20 ms with the TRT model? Both were warmed up 10 times. If this is right, TensorRT does not seem to give any speed-up. Maybe there is a configuration error.
which model?
your device ?
It is the same model, trained with the official yolov7-tiny training code.
GTX 1080
python export.py -o xxx.onnx -e xxx.trt -p fp32
Try FP32 precision.
This is the inference speed of the FP32 model: it takes 19 ms. The command: python export.py -o best.onnx -e best.trt -p fp32 --end2end
Maybe you should delete the image-saving section.
I think saving the image is slow.
Could you provide more details of your test script?
your trt version?
TensorRT-8.4.1.5
show me the pytorch code?
Could you run this experiment in the Colab environment (using the T4)?
Maybe the 1080 is too old.
I'll try
Thanks, looking forward to your report!
In my test it really is much faster, but the confidence scores are wrong. Are yours correct?
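If you use pred.get_fps() to get the FPS, you get around 180-200, which corresponds to roughly 5 ms, but that is not the time taken by the whole detection pipeline.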
Yes, most FPS figures reported at the moment refer to inference time only.
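Yes, but what I measure is the time from the moment the image is passed in for inference until the result is returned. With both the pt model and the trt model the results are correct; it is just that this whole process takes longer with the trt model than with the pt model.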
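The difference between an engine-only figure and a request-to-result figure can be made visible by timing each stage separately. The following is only a minimal sketch, not the repository's benchmark code: `time_stages` and the three `fake_*` functions are hypothetical stand-ins for the real preprocessing, engine call and post-processing.

```python
import time


def time_stages(stages, payload, warmup=10, runs=50):
    """Return the mean time in milliseconds for each stage and for the whole pipeline."""
    # Warm-up iterations are executed but not recorded.
    for _ in range(warmup):
        data = payload
        for _, fn in stages:
            data = fn(data)

    totals = {name: 0.0 for name, _ in stages}
    for _ in range(runs):
        data = payload
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start

    per_stage = {name: 1000.0 * t / runs for name, t in totals.items()}
    # The end-to-end figure is the sum of the per-stage means.
    per_stage["end_to_end"] = sum(per_stage.values())
    return per_stage


# Hypothetical stand-ins for the real pipeline stages.
def fake_preprocess(x):
    return [v / 255.0 for v in x]


def fake_infer(x):
    return sum(x)


def fake_postprocess(x):
    return round(x, 3)


report = time_stages(
    [("preprocess", fake_preprocess), ("infer", fake_infer), ("postprocess", fake_postprocess)],
    payload=list(range(1000)),
)
for name, ms in report.items():
    print(f"{name}: {ms:.4f} ms")
```

When a stage launches GPU work asynchronously, the clock should only be read after the device has finished (for PyTorch, after `torch.cuda.synchronize()`); otherwise the stage appears faster than it is.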
Is it the same on the T4?
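This is the inference speed in the Colab environment. I will run inference with the pt model directly later.
From the tests above, I think the official yolov7 pt model and your trt model have similar inference times.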
The v7 preprocessing pipeline differs from the one in this repository. I suggest unifying the preprocessing and testing again: https://github.com/WongKinYiu/yolov7/blob/064c71e7c261172dd8d7250444c4f5375bebdc66/utils/datasets.py#L984
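import cv2
import numpy as np

def preproc(image, input_size=(640, 640), mean=None, std=None, swap=(2, 0, 1)):
    # Convert to float32 and reverse the channel order (BGR <-> RGB).
    image = np.array(image, np.float32)
    image = image[:, :, ::-1]
    oh, ow = image.shape[:2]
    dh, dw = input_size
    # One scale factor for both axes, so the aspect ratio is preserved.
    scale = min(dw / ow, dh / oh)
    # Scaling-only affine matrix: the image stays anchored at the top-left corner.
    M = np.array([
        [scale, 0, 0],
        [0, scale, 0]
    ])
    # warpAffine takes dsize as (width, height); uncovered pixels are zero-filled.
    padded_img = cv2.warpAffine(image, M, (dw, dh))
    padded_img /= 255.
    if mean is not None:
        padded_img -= mean
    if std is not None:
        padded_img /= std
    # HWC -> CHW by default, then make the buffer contiguous for the engine.
    padded_img = padded_img.transpose(swap)
    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
    return padded_img, scale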
Looking forward to your test. I suggest re-testing with this preprocessing method; if it helps, we will use it in future versions.
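Because the preprocessing above scales the image without any offset (it stays anchored at the top-left corner), boxes predicted in network-input coordinates can be mapped back to the original image by dividing by the returned `scale`. A minimal sketch, assuming the boxes are in `x1, y1, x2, y2` pixel format (the thread does not confirm the output layout); `boxes_to_original` is a hypothetical helper, not part of the repository:

```python
import numpy as np


def boxes_to_original(boxes_xyxy, scale, orig_shape):
    """Undo a top-left anchored resize: divide by scale, then clip to the image."""
    boxes = np.asarray(boxes_xyxy, dtype=np.float32) / scale
    oh, ow = orig_shape[:2]
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, ow)  # x coordinates
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, oh)  # y coordinates
    return boxes


# Example: a 1280x720 frame resized into a 640x640 network input.
oh, ow = 720, 1280
scale = min(640 / ow, 640 / oh)
net_boxes = [[100.0, 50.0, 300.0, 200.0]]
orig_boxes = boxes_to_original(net_boxes, scale, (oh, ow))
print(orig_boxes)
```

If the preprocessing centred the image instead (letterbox padding on both sides), the padding offset would have to be subtracted before dividing.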
Thank you, Linaom1214, for your patient answers. The results with this new code look even worse:
trt best.trt
[08/22/2022-09:30:49] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
[08/22/2022-09:30:49] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0
loading model...ok
202.96470545799218 FPS
4.929918999550864 ms
22.16935157775879 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
22.135257720947266 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
20.000696182250977 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
19.888877868652344 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
20.226716995239258 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
20.84493637084961 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
20.862102508544922 ms 0,293,320,154,115,0.9926884174346924 1,446,381,175,95,0.9879250526428223 1,460,296,166,82,0.9829330444335938
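A quick calculation on the numbers in this log shows how much of each request is spent outside the engine. This sketch assumes that the `FPS` / `ms` pair printed after `loading model...ok` is the engine-only figure (as with `pred.get_fps()` mentioned earlier in the thread) and that the per-line `ms` values are request-to-result times.

```python
# Request-to-result timings (ms) copied from the log above.
end_to_end_ms = [
    22.16935157775879, 22.135257720947266, 20.000696182250977,
    19.888877868652344, 20.226716995239258, 20.84493637084961,
    20.862102508544922,
]
# Engine-only time (ms) as printed by the log; assumed to exclude pre/post-processing.
engine_ms = 4.929918999550864

mean_total_ms = sum(end_to_end_ms) / len(end_to_end_ms)
outside_engine_ms = mean_total_ms - engine_ms

print(f"engine only    : {engine_ms:.2f} ms")
print(f"end to end     : {mean_total_ms:.2f} ms")
print(f"outside engine : {outside_engine_ms:.2f} ms "
      f"({outside_engine_ms / mean_total_ms:.0%} of the total)")
```

Under those assumptions, roughly three quarters of each request is spent outside the TensorRT engine, which is consistent with looking at preprocessing and post-processing first.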
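Haha, let us try some more stable approaches. For now the time difference still seems to come from data preprocessing: PyTorch's data loading and some of its post-processing run on the GPU, while our data processing is entirely CPU-based, and it will take some time to optimize. Thanks for your test data.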
I just realized something: V7 is RepVGG.
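:) I don't follow; I'm a beginner and don't even know the frameworks well.
By the way, yesterday I tried a C++ deployment and the latency was frighteningly long. Using the pt model directly, detection on screenshots passed between processes takes 15-31 ms; with the trt model it takes 60-120 ms. I don't know yet what went wrong. Both figures come from a zero-delay loop issuing detection requests at high speed.
Which code exactly? The end-to-end code?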
You could try the more stable V5.
At the moment v7 also loses accuracy when converted ONNX -> TRT.
OK, I'll find time to try it. Today was my first time using Colab, and it was a good experience.
Could you go into more detail about the C++ part?
The end2end one.
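It reads from shared memory. The C++ code compiles into a communication program: it receives a WM_COPYDATA message carrying the data length, reads the image from shared memory, runs detection, writes the detection result back to shared memory, closes the shared-memory mapping, and returns the length of the result data. The other program can then read the shared memory to get the result.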
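The request/response flow described here can be sketched in Python. This is only an illustration under stated assumptions, not the poster's C++ program: it uses `multiprocessing.shared_memory` instead of a Windows file mapping, stores the length in a 4-byte header inside the block rather than sending it through WM_COPYDATA, and runs both sides in one process. `write_message`, `read_message` and `fake_detect` are hypothetical names.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def write_message(shm, payload: bytes) -> int:
    """Store length-prefixed bytes in the shared block and return the length."""
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("payload does not fit in the shared block")
    HEADER.pack_into(shm.buf, 0, len(payload))
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    return len(payload)


def read_message(shm) -> bytes:
    """Read the length prefix, then copy out exactly that many bytes."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return bytes(shm.buf[HEADER.size:HEADER.size + length])


def fake_detect(image_bytes: bytes) -> bytes:
    # Hypothetical stand-in for the detector: it only reports the input size.
    return f"bytes={len(image_bytes)}".encode()


shm = shared_memory.SharedMemory(create=True, size=1 << 20)
try:
    write_message(shm, b"\x00" * 1024)       # client: publish the image
    result = fake_detect(read_message(shm))  # server: read the image and detect
    write_message(shm, result)               # server: publish the result
    reply = read_message(shm).decode()       # client: read the result
    print(reply)
finally:
    shm.close()
    shm.unlink()
```

With a layout like this, each request copies the image at least twice (into and out of the shared block), so timing the copy separately from the detection call helps to show where a 60-120 ms round trip is actually spent.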
Could you test it without NMS included? The code provided in the repository is fairly simple; if you find any bugs, feedback is welcome.
171ms
1, 458, 248, 156, 82, 0.919060, None
1, 477, 336, 140, 61, 0.809349, None
0, 309, 319, 154, 104, 0.643329, None
172ms
1, 458, 248, 156, 82, 0.918909, None
1, 477, 336, 140, 61, 0.808587, None
0, 309, 319, 154, 104, 0.643854, None
172ms
1, 458, 248, 156, 82, 0.918983, None
1, 477, 336, 140, 61, 0.808988, None
0, 309, 319, 154, 104, 0.644179, None
171ms
1, 458, 248, 156, 82, 0.919059, None
1, 477, 336, 140, 61, 0.809353, None
0, 309, 319, 154, 104, 0.643329, None
172ms
1, 458, 248, 156, 82, 0.919059, None
1, 477, 336, 140, 61, 0.809353, None
0, 309, 319, 154, 104, 0.643329, None
172ms
1, 458, 248, 156, 82, 0.919059, None
1, 477, 336, 140, 61, 0.809353, None
0, 309, 319, 154, 104, 0.643329, None
This is the time taken when the C++ program runs the trt model with NMS. The same code running the pt model takes 10+ ms.
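OK, this will take some time: the dirent.h file is missing and part of the configuration is not finished.
That is quite extreme. Is the pt model called through libtorch?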
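The C++ program does not use the pt model. That path is a large bundle built with pyinstaller that uses PyTorch, which is inconvenient to move between computers: it takes 4 GB+ of disk space.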
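To add: when I say "the same C++ code", I mean that I previously compiled yolov4 in C++ to call a weights model using that same code, and yolov4 took under 50 ms.
Single-image inference, C++ end2end, yolov6s: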
trtuser@0dee88e59c94:/workspace/TensorRT/TensorRT-For-YOLO-Series/TensorRT-For-YOLO-Series/end2end/build$ ./yolo -model_path ../../yolov6s.trt -image_path ../../src/1.jpg
model size: 173714332
Registered plugin creator - ::GridAnchor_TRT version 1
Registered plugin creator - ::GridAnchorRect_TRT version 1
Registered plugin creator - ::NMS_TRT version 1
Registered plugin creator - ::Reorg_TRT version 1
Registered plugin creator - ::Region_TRT version 1
Registered plugin creator - ::Clip_TRT version 1
Registered plugin creator - ::LReLU_TRT version 1
Registered plugin creator - ::PriorBox_TRT version 1
Registered plugin creator - ::Normalize_TRT version 1
Registered plugin creator - ::ScatterND version 1
Registered plugin creator - ::RPROI_TRT version 1
Registered plugin creator - ::BatchedNMS_TRT version 1
Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
Registered plugin creator - ::FlattenConcat_TRT version 1
Registered plugin creator - ::CropAndResize version 1
Registered plugin creator - ::DetectionLayer_TRT version 1
Registered plugin creator - ::EfficientNMS_TRT version 1
Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
Registered plugin creator - ::EfficientNMS_TFTRT_TRT version 1
Registered plugin creator - ::Proposal version 1
Registered plugin creator - ::ProposalLayer_TRT version 1
Registered plugin creator - ::PyramidROIAlign_TRT version 1
Registered plugin creator - ::ResizeNearest_TRT version 1
Registered plugin creator - ::Split version 1
Registered plugin creator - ::SpecialSlice_TRT version 1
Registered plugin creator - ::InstanceNormalization_TRT version 1
Using cublas as a tactic source
TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.6.1
Using cuDNN as a tactic source
Deserialization required 505462 microseconds.
Using cublas as a tactic source
TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.6.1
Using cuDNN as a tactic source
Total per-runner device persistent memory is 169579008
Total per-runner host persistent memory is 64960
Allocated activation device memory of size 26913280
7ms
Your v6 is really fast. My yolov7-tiny.trt end2end, built directly from your C++ files, takes 20 ms.
I actually have not tried the v7 end2end C++ path. My TRT version is 8.2, and it fails to recognize one node in the ONNX model.
engine init finished
blob image
[08/22/2022-20:08:39] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/22/2022-20:08:39] [W] [TRT] The enqueue() method has been deprecated when used with engines built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. Please use enqueueV2() instead.
[08/22/2022-20:08:39] [W] [TRT] Also, the batchSize argument passed into this function has no effect on changing the input shapes. Please use setBindingDimensions() function to change input shapes instead.
15ms
num of boxes before nms: 62
num of boxes: 6
0 = 0.90573 at 53.16 398.86 189.65 x 500.30
5 = 0.90219 at 13.93 234.68 770.86 x 508.74
0 = 0.89119 at 220.63 412.31 128.91 x 446.84
0 = 0.88738 at 666.77 394.24 142.23 x 481.20
0 = 0.61789 at 0.00 558.58 75.63 x 327.18
11 = 0.23620 at 0.41 252.30 33.84 x 71.59
save vis file
yolo destroy
Surprisingly, yolov7-tiny.trt normal is faster than end2end.
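At the top of yolo.hpp you need to add #define NOMINMAX, and lines 363-364 of the code need to be changed to the following:
const char* INPUT_BLOB_NAME = "images"; // image_arrays
const char* OUTPUT_BLOB_NAME = "output";
You also need to create a dirent.h file yourself.
Contents of dirent.h: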
/*
* Dirent interface for Microsoft Visual Studio
*
* Copyright (C) 1998-2019 Toni Ronkko
* This file is part of dirent. Dirent may be freely distributed
* under the MIT license. For all details and documentation, see
* https://github.com/tronkko/dirent
*/
/* Guard against multiple inclusion (the pasted copy has no #ifndef/#endif pair) */
#pragma once
#define DIRENT_H
/* Hide warnings about unreferenced local functions */
#if defined(__clang__)
# pragma clang diagnostic ignored "-Wunused-function"
#elif defined(_MSC_VER)
# pragma warning(disable:4505)
#elif defined(__GNUC__)
# pragma GCC diagnostic ignored "-Wunused-function"
#endif
/*
* Include windows.h without Windows Sockets 1.1 to prevent conflicts with
* Windows Sockets 2.0.
*/
#ifndef WIN32_LEAN_AND_MEAN
# define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>
#include <stdio.h>
#include <stdarg.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
#include <malloc.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
/* Indicates that d_type field is available in dirent structure */
#define _DIRENT_HAVE_D_TYPE
#define _DIRENT_HAVE_D_NAMLEN
/* Entries missing from MSVC 6.0 */
#if !defined(FILE_ATTRIBUTE_DEVICE)
# define FILE_ATTRIBUTE_DEVICE 0x40
#endif
/* File type and permission flags for stat(), general mask */
#if !defined(S_IFMT)
# define S_IFMT _S_IFMT
#endif
/* Directory bit */
#if !defined(S_IFDIR)
# define S_IFDIR _S_IFDIR
#endif
/* Character device bit */
#if !defined(S_IFCHR)
# define S_IFCHR _S_IFCHR
#endif
/* Pipe bit */
#if !defined(S_IFFIFO)
# define S_IFFIFO _S_IFFIFO
#endif
/* Regular file bit */
#if !defined(S_IFREG)
# define S_IFREG _S_IFREG
#endif
/* Read permission */
#if !defined(S_IREAD)
# define S_IREAD _S_IREAD
#endif
/* Write permission */
#if !defined(S_IWRITE)
# define S_IWRITE _S_IWRITE
#endif
/* Execute permission */
#if !defined(S_IEXEC)
# define S_IEXEC _S_IEXEC
#endif
/* Pipe */
#if !defined(S_IFIFO)
# define S_IFIFO _S_IFIFO
#endif
/* Block device */
#if !defined(S_IFBLK)
# define S_IFBLK 0
#endif
/* Link */
#if !defined(S_IFLNK)
# define S_IFLNK 0
#endif
/* Socket */
#if !defined(S_IFSOCK)
# define S_IFSOCK 0
#endif
/* Read user permission */
#if !defined(S_IRUSR)
# define S_IRUSR S_IREAD
#endif
/* Write user permission */
#if !defined(S_IWUSR)
# define S_IWUSR S_IWRITE
#endif
/* Execute user permission */
#if !defined(S_IXUSR)
# define S_IXUSR 0
#endif
/* Read group permission */
#if !defined(S_IRGRP)
# define S_IRGRP 0
#endif
/* Write group permission */
#if !defined(S_IWGRP)
# define S_IWGRP 0
#endif
/* Execute group permission */
#if !defined(S_IXGRP)
# define S_IXGRP 0
#endif
/* Read others permission */
#if !defined(S_IROTH)
# define S_IROTH 0
#endif
/* Write others permission */
#if !defined(S_IWOTH)
# define S_IWOTH 0
#endif
/* Execute others permission */
#if !defined(S_IXOTH)
# define S_IXOTH 0
#endif
/* Maximum length of file name */
#if !defined(PATH_MAX)
# define PATH_MAX MAX_PATH
#endif
#if !defined(FILENAME_MAX)
# define FILENAME_MAX MAX_PATH
#endif
#if !defined(NAME_MAX)
# define NAME_MAX FILENAME_MAX
#endif
/* File type flags for d_type */
#define DT_UNKNOWN 0
#define DT_REG S_IFREG
#define DT_DIR S_IFDIR
#define DT_FIFO S_IFIFO
#define DT_SOCK S_IFSOCK
#define DT_CHR S_IFCHR
#define DT_BLK S_IFBLK
#define DT_LNK S_IFLNK
/* Macros for converting between st_mode and d_type */
#define IFTODT(mode) ((mode) & S_IFMT)
#define DTTOIF(type) (type)
/*
* File type macros. Note that block devices, sockets and links cannot be
* distinguished on Windows and the macros S_ISBLK, S_ISSOCK and S_ISLNK are
* only defined for compatibility. These macros should always return false
* on Windows.
*/
#if !defined(S_ISFIFO)
# define S_ISFIFO(mode) (((mode) & S_IFMT) == S_IFIFO)
#endif
#if !defined(S_ISDIR)
# define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR)
#endif
#if !defined(S_ISREG)
# define S_ISREG(mode) (((mode) & S_IFMT) == S_IFREG)
#endif
#if !defined(S_ISLNK)
# define S_ISLNK(mode) (((mode) & S_IFMT) == S_IFLNK)
#endif
#if !defined(S_ISSOCK)
# define S_ISSOCK(mode) (((mode) & S_IFMT) == S_IFSOCK)
#endif
#if !defined(S_ISCHR)
# define S_ISCHR(mode) (((mode) & S_IFMT) == S_IFCHR)
#endif
#if !defined(S_ISBLK)
# define S_ISBLK(mode) (((mode) & S_IFMT) == S_IFBLK)
#endif
/* Return the exact length of the file name without zero terminator */
#define _D_EXACT_NAMLEN(p) ((p)->d_namlen)
/* Return the maximum size of a file name */
#define _D_ALLOC_NAMLEN(p) ((PATH_MAX)+1)
#ifdef __cplusplus
extern "C" {
#endif
/* Wide-character version */
struct _wdirent {
/* Always zero */
long d_ino;
/* File position within stream */
long d_off;
/* Structure size */
unsigned short d_reclen;
/* Length of name without \0 */
size_t d_namlen;
/* File type */
int d_type;
/* File name */
wchar_t d_name[PATH_MAX+1];
};
typedef struct _wdirent _wdirent;
struct _WDIR {
/* Current directory entry */
struct _wdirent ent;
/* Private file data */
WIN32_FIND_DATAW data;
/* True if data is valid */
int cached;
/* Win32 search handle */
HANDLE handle;
/* Initial directory name */
wchar_t *patt;
};
typedef struct _WDIR _WDIR;
/* Multi-byte character version */
struct dirent {
/* Always zero */
long d_ino;
/* File position within stream */
long d_off;
/* Structure size */
unsigned short d_reclen;
/* Length of name without \0 */
size_t d_namlen;
/* File type */
int d_type;
/* File name */
char d_name[PATH_MAX+1];
};
typedef struct dirent dirent;
struct DIR {
struct dirent ent;
struct _WDIR *wdirp;
};
typedef struct DIR DIR;
/* Dirent functions */
static DIR *opendir (const char *dirname);
static _WDIR *_wopendir (const wchar_t *dirname);
static struct dirent *readdir (DIR *dirp);
static struct _wdirent *_wreaddir (_WDIR *dirp);
static int readdir_r(
DIR *dirp, struct dirent *entry, struct dirent **result);
static int _wreaddir_r(
_WDIR *dirp, struct _wdirent *entry, struct _wdirent **result);
static int closedir (DIR *dirp);
static int _wclosedir (_WDIR *dirp);
static void rewinddir (DIR* dirp);
static void _wrewinddir (_WDIR* dirp);
static int scandir (const char *dirname, struct dirent ***namelist,
int (*filter)(const struct dirent*),
int (*compare)(const struct dirent**, const struct dirent**));
static int alphasort (const struct dirent **a, const struct dirent **b);
static int versionsort (const struct dirent **a, const struct dirent **b);
/* For compatibility with Symbian */
#define wdirent _wdirent
#define WDIR _WDIR
#define wopendir _wopendir
#define wreaddir _wreaddir
#define wclosedir _wclosedir
#define wrewinddir _wrewinddir
/* Internal utility functions */
static WIN32_FIND_DATAW *dirent_first (_WDIR *dirp);
static WIN32_FIND_DATAW *dirent_next (_WDIR *dirp);
static int dirent_mbstowcs_s(
size_t *pReturnValue,
wchar_t *wcstr,
size_t sizeInWords,
const char *mbstr,
size_t count);
static int dirent_wcstombs_s(
size_t *pReturnValue,
char *mbstr,
size_t sizeInBytes,
const wchar_t *wcstr,
size_t count);
static void dirent_set_errno (int error);
/*
* Open directory stream DIRNAME for read and return a pointer to the
* internal working area that is used to retrieve individual directory
* entries.
*/
static _WDIR*
_wopendir(
const wchar_t *dirname)
{
_WDIR *dirp;
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
/* Desktop */
DWORD n;
#else
/* WinRT */
size_t n;
#endif
wchar_t *p;
/* Must have directory name */
if (dirname == NULL || dirname[0] == '\0') {
dirent_set_errno (ENOENT);
return NULL;
}
/* Allocate new _WDIR structure */
dirp = (_WDIR*) malloc (sizeof (struct _WDIR));
if (!dirp) {
return NULL;
}
/* Reset _WDIR structure */
dirp->handle = INVALID_HANDLE_VALUE;
dirp->patt = NULL;
dirp->cached = 0;
/*
* Compute the length of full path plus zero terminator
*
* Note that on WinRT there's no way to convert relative paths
* into absolute paths, so just assume it is an absolute path.
*/
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
/* Desktop */
n = GetFullPathNameW (dirname, 0, NULL, NULL);
#else
/* WinRT */
n = wcslen (dirname);
#endif
/* Allocate room for absolute directory name and search pattern */
dirp->patt = (wchar_t*) malloc (sizeof (wchar_t) * n + 16);
if (dirp->patt == NULL) {
goto exit_closedir;
}
/*
* Convert relative directory name to an absolute one. This
* allows rewinddir() to function correctly even when current
* working directory is changed between opendir() and rewinddir().
*
* Note that on WinRT there's no way to convert relative paths
* into absolute paths, so just assume it is an absolute path.
*/
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
/* Desktop */
n = GetFullPathNameW (dirname, n, dirp->patt, NULL);
if (n <= 0) {
goto exit_closedir;
}
#else
/* WinRT */
wcsncpy_s (dirp->patt, n+1, dirname, n);
#endif
/* Append search pattern \* to the directory name */
p = dirp->patt + n;
switch (p[-1]) {
case '\\':
case '/':
case ':':
/* Directory ends in path separator, e.g. c:\temp\ */
/*NOP*/;
break;
default:
/* Directory name doesn't end in path separator */
*p++ = '\\';
}
*p++ = '*';
*p = '\0';
/* Open directory stream and retrieve the first entry */
if (!dirent_first (dirp)) {
goto exit_closedir;
}
/* Success */
return dirp;
/* Failure */
exit_closedir:
_wclosedir (dirp);
return NULL;
}
/*
* Read next directory entry.
*
* Returns pointer to static directory entry which may be overwritten by
* subsequent calls to _wreaddir().
*/
static struct _wdirent*
_wreaddir(
_WDIR *dirp)
{
struct _wdirent *entry;
/*
* Read directory entry to buffer. We can safely ignore the return value
* as entry will be set to NULL in case of error.
*/
(void) _wreaddir_r (dirp, &dirp->ent, &entry);
/* Return pointer to statically allocated directory entry */
return entry;
}
/*
* Read next directory entry.
*
* Returns zero on success. If end of directory stream is reached, then sets
* result to NULL and returns zero.
*/
static int
_wreaddir_r(
_WDIR *dirp,
struct _wdirent *entry,
struct _wdirent **result)
{
WIN32_FIND_DATAW *datap;
/* Read next directory entry */
datap = dirent_next (dirp);
if (datap) {
size_t n;
DWORD attr;
/*
* Copy file name as wide-character string. If the file name is too
* long to fit in to the destination buffer, then truncate file name
* to PATH_MAX characters and zero-terminate the buffer.
*/
n = 0;
while (n < PATH_MAX && datap->cFileName[n] != 0) {
entry->d_name[n] = datap->cFileName[n];
n++;
}
entry->d_name[n] = 0;
/* Length of file name excluding zero terminator */
entry->d_namlen = n;
/* File type */
attr = datap->dwFileAttributes;
if ((attr & FILE_ATTRIBUTE_DEVICE) != 0) {
entry->d_type = DT_CHR;
} else if ((attr & FILE_ATTRIBUTE_DIRECTORY) != 0) {
entry->d_type = DT_DIR;
} else {
entry->d_type = DT_REG;
}
/* Reset dummy fields */
entry->d_ino = 0;
entry->d_off = 0;
entry->d_reclen = sizeof (struct _wdirent);
/* Set result address */
*result = entry;
} else {
/* Return NULL to indicate end of directory */
*result = NULL;
}
return /*OK*/0;
}
/*
* Close directory stream opened by opendir() function. This invalidates the
* DIR structure as well as any directory entry read previously by
* _wreaddir().
*/
static int
_wclosedir(
_WDIR *dirp)
{
int ok;
if (dirp) {
/* Release search handle */
if (dirp->handle != INVALID_HANDLE_VALUE) {
FindClose (dirp->handle);
}
/* Release search pattern */
free (dirp->patt);
/* Release directory structure */
free (dirp);
ok = /*success*/0;
} else {
/* Invalid directory stream */
dirent_set_errno (EBADF);
ok = /*failure*/-1;
}
return ok;
}
/*
* Rewind directory stream such that _wreaddir() returns the very first
* file name again.
*/
static void
_wrewinddir(
_WDIR* dirp)
{
if (dirp) {
/* Release existing search handle */
if (dirp->handle != INVALID_HANDLE_VALUE) {
FindClose (dirp->handle);
}
/* Open new search handle */
dirent_first (dirp);
}
}
/* Get first directory entry (internal) */
static WIN32_FIND_DATAW*
dirent_first(
_WDIR *dirp)
{
WIN32_FIND_DATAW *datap;
DWORD error;
/* Open directory and retrieve the first entry */
dirp->handle = FindFirstFileExW(
dirp->patt, FindExInfoStandard, &dirp->data,
FindExSearchNameMatch, NULL, 0);
if (dirp->handle != INVALID_HANDLE_VALUE) {
/* a directory entry is now waiting in memory */
datap = &dirp->data;
dirp->cached = 1;
} else {
/* Failed to open directory: no directory entry in memory */
dirp->cached = 0;
datap = NULL;
/* Set error code */
error = GetLastError ();
switch (error) {
case ERROR_ACCESS_DENIED:
/* No read access to directory */
dirent_set_errno (EACCES);
break;
case ERROR_DIRECTORY:
/* Directory name is invalid */
dirent_set_errno (ENOTDIR);
break;
case ERROR_PATH_NOT_FOUND:
default:
/* Cannot find the file */
dirent_set_errno (ENOENT);
}
}
return datap;
}
/*
* Get next directory entry (internal).
*
* Returns a pointer to the entry data, or NULL if no further entry is available.
*/
static WIN32_FIND_DATAW*
dirent_next(
_WDIR *dirp)
{
WIN32_FIND_DATAW *p;
/* Get next directory entry */
if (dirp->cached != 0) {
/* A valid directory entry already in memory */
p = &dirp->data;
dirp->cached = 0;
} else if (dirp->handle != INVALID_HANDLE_VALUE) {
/* Get the next directory entry from stream */
if (FindNextFileW (dirp->handle, &dirp->data) != FALSE) {
/* Got a file */
p = &dirp->data;
} else {
/* The very last entry has been processed or an error occurred */
FindClose (dirp->handle);
dirp->handle = INVALID_HANDLE_VALUE;
p = NULL;
}
} else {
/* End of directory stream reached */
p = NULL;
}
return p;
}
/*
* Open directory stream using plain old C-string.
*/
static DIR*
opendir(
const char *dirname)
{
struct DIR *dirp;
/* Must have directory name */
if (dirname == NULL || dirname[0] == '\0') {
dirent_set_errno (ENOENT);
return NULL;
}
/* Allocate memory for DIR structure */
dirp = (DIR*) malloc (sizeof (struct DIR));
if (!dirp) {
return NULL;
}
{
int error;
wchar_t wname[PATH_MAX + 1];
size_t n;
/* Convert directory name to wide-character string */
error = dirent_mbstowcs_s(
&n, wname, PATH_MAX + 1, dirname, PATH_MAX + 1);
if (error) {
/*
* Cannot convert file name to wide-character string. This
* occurs if the string contains invalid multi-byte sequences or
* the output buffer is too small to contain the resulting
* string.
*/
goto exit_free;
}
/* Open directory stream using wide-character name */
dirp->wdirp = _wopendir (wname);
if (!dirp->wdirp) {
goto exit_free;
}
}
/* Success */
return dirp;
/* Failure */
exit_free:
free (dirp);
return NULL;
}
/*
* Read next directory entry.
*/
static struct dirent*
readdir(
DIR *dirp)
{
struct dirent *entry;
/*
* Read directory entry to buffer. We can safely ignore the return value
* as entry will be set to NULL in case of error.
*/
(void) readdir_r (dirp, &dirp->ent, &entry);
/* Return pointer to statically allocated directory entry */
return entry;
}
/*
* Read next directory entry into caller-allocated buffer.
*
* Returns zero on success. If the end of directory stream is reached, then
* sets result to NULL and returns zero.
*/
static int
readdir_r(
DIR *dirp,
struct dirent *entry,
struct dirent **result)
{
WIN32_FIND_DATAW *datap;
/* Read next directory entry */
datap = dirent_next (dirp->wdirp);
if (datap) {
size_t n;
int error;
/* Attempt to convert file name to multi-byte string */
error = dirent_wcstombs_s(
&n, entry->d_name, PATH_MAX + 1, datap->cFileName, PATH_MAX + 1);
/*
* If the file name cannot be represented by a multi-byte string,
* then attempt to use old 8+3 file name. This allows traditional
* Unix-code to access some file names despite of unicode
* characters, although file names may seem unfamiliar to the user.
*
* Beware that the code below cannot come up with a short file
* name unless the file system provides one. At least
* VirtualBox shared folders fail to do this.
*/
if (error && datap->cAlternateFileName[0] != '\0') {
error = dirent_wcstombs_s(
&n, entry->d_name, PATH_MAX + 1,
datap->cAlternateFileName, PATH_MAX + 1);
}
if (!error) {
DWORD attr;
/* Length of file name excluding zero terminator */
entry->d_namlen = n - 1;
/* File attributes */
attr = datap->dwFileAttributes;
if ((attr & FILE_ATTRIBUTE_DEVICE) != 0) {
entry->d_type = DT_CHR;
} else if ((attr & FILE_ATTRIBUTE_DIRECTORY) != 0) {
entry->d_type = DT_DIR;
} else {
entry->d_type = DT_REG;
}
/* Reset dummy fields */
entry->d_ino = 0;
entry->d_off = 0;
entry->d_reclen = sizeof (struct dirent);
} else {
/*
* Cannot convert file name to multi-byte string so construct
* an erroneous directory entry and return that. Note that
* we cannot return NULL as that would stop the processing
* of directory entries completely.
*/
entry->d_name[0] = '?';
entry->d_name[1] = '\0';
entry->d_namlen = 1;
entry->d_type = DT_UNKNOWN;
entry->d_ino = 0;
entry->d_off = -1;
entry->d_reclen = 0;
}
/* Return pointer to directory entry */
*result = entry;
} else {
/* No more directory entries */
*result = NULL;
}
return /*OK*/0;
}
/*
* Close directory stream.
*/
static int
closedir(
DIR *dirp)
{
int ok;
if (dirp) {
/* Close wide-character directory stream */
ok = _wclosedir (dirp->wdirp);
dirp->wdirp = NULL;
/* Release multi-byte character version */
free (dirp);
} else {
/* Invalid directory stream */
dirent_set_errno (EBADF);
ok = /*failure*/-1;
}
return ok;
}
/*
* Rewind directory stream to beginning.
*/
static void
rewinddir(
DIR* dirp)
{
/* Rewind wide-character string directory stream */
_wrewinddir (dirp->wdirp);
}
/*
* Scan directory for entries.
*/
static int
scandir(
const char *dirname,
struct dirent ***namelist,
int (*filter)(const struct dirent*),
int (*compare)(const struct dirent**, const struct dirent**))
{
struct dirent **files = NULL;
size_t size = 0;
size_t allocated = 0;
const size_t init_size = 1;
DIR *dir = NULL;
struct dirent *entry;
struct dirent *tmp = NULL;
size_t i;
int result = 0;
/* Open directory stream */
dir = opendir (dirname);
if (dir) {
/* Read directory entries to memory */
while (1) {
/* Enlarge pointer table to make room for another pointer */
if (size >= allocated) {
void *p;
size_t num_entries;
/* Compute number of entries in the enlarged pointer table */
if (size < init_size) {
/* Allocate initial pointer table */
num_entries = init_size;
} else {
/* Double the size */
num_entries = size * 2;
}
/* Allocate first pointer table or enlarge existing table */
p = realloc (files, sizeof (void*) * num_entries);
if (p != NULL) {
/* Got the memory */
files = (dirent**) p;
allocated = num_entries;
} else {
/* Out of memory */
result = -1;
break;
}
}
/* Allocate room for temporary directory entry */
if (tmp == NULL) {
tmp = (struct dirent*) malloc (sizeof (struct dirent));
if (tmp == NULL) {
/* Cannot allocate temporary directory entry */
result = -1;
break;
}
}
/* Read directory entry to temporary area */
if (readdir_r (dir, tmp, &entry) == /*OK*/0) {
/* Did we get an entry? */
if (entry != NULL) {
int pass;
/* Determine whether to include the entry in result */
if (filter) {
/* Let the filter function decide */
pass = filter (tmp);
} else {
/* No filter function, include everything */
pass = 1;
}
if (pass) {
/* Store the temporary entry to pointer table */
files[size++] = tmp;
tmp = NULL;
/* Keep up with the number of files */
result++;
}
} else {
/*
* End of directory stream reached => sort entries and
* exit.
*/
qsort (files, size, sizeof (void*),
(int (*) (const void*, const void*)) compare);
break;
}
} else {
/* Error reading directory entry */
result = /*Error*/ -1;
break;
}
}
} else {
/* Cannot open directory */
result = /*Error*/ -1;
}
/* Release temporary directory entry */
free (tmp);
/* Release allocated memory on error */
if (result < 0) {
for (i = 0; i < size; i++) {
free (files[i]);
}
free (files);
files = NULL;
}
/* Close directory stream */
if (dir) {
closedir (dir);
}
/* Pass pointer table to caller */
if (namelist) {
*namelist = files;
}
return result;
}
/* Alphabetical sorting */
static int
alphasort(
const struct dirent **a, const struct dirent **b)
{
return strcoll ((*a)->d_name, (*b)->d_name);
}
/* Sort versions */
static int
versionsort(
const struct dirent **a, const struct dirent **b)
{
/* FIXME: implement strverscmp and use that */
return alphasort (a, b);
}
/* Convert multi-byte string to wide character string */
static int
dirent_mbstowcs_s(
size_t *pReturnValue,
wchar_t *wcstr,
size_t sizeInWords,
const char *mbstr,
size_t count)
{
int error;
#if defined(_MSC_VER) && _MSC_VER >= 1400
/* Microsoft Visual Studio 2005 or later */
error = mbstowcs_s (pReturnValue, wcstr, sizeInWords, mbstr, count);
#else
/* Older Visual Studio or non-Microsoft compiler */
size_t n;
/* Convert to wide-character string (or count characters) */
n = mbstowcs (wcstr, mbstr, sizeInWords);
if (!wcstr || n < count) {
/* Zero-terminate output buffer */
if (wcstr && sizeInWords) {
if (n >= sizeInWords) {
n = sizeInWords - 1;
}
wcstr[n] = 0;
}
/* Length of resulting multi-byte string WITH zero terminator */
if (pReturnValue) {
*pReturnValue = n + 1;
}
/* Success */
error = 0;
} else {
/* Could not convert string */
error = 1;
}
#endif
return error;
}
/* Convert wide-character string to multi-byte string */
static int
dirent_wcstombs_s(
size_t *pReturnValue,
char *mbstr,
size_t sizeInBytes, /* max size of mbstr */
const wchar_t *wcstr,
size_t count)
{
int error;
#if defined(_MSC_VER) && _MSC_VER >= 1400
/* Microsoft Visual Studio 2005 or later */
error = wcstombs_s (pReturnValue, mbstr, sizeInBytes, wcstr, count);
#else
/* Older Visual Studio or non-Microsoft compiler */
size_t n;
/* Convert to multi-byte string (or count the number of bytes needed) */
n = wcstombs (mbstr, wcstr, sizeInBytes);
if (!mbstr || n < count) {
/* Zero-terminate output buffer */
if (mbstr && sizeInBytes) {
if (n >= sizeInBytes) {
n = sizeInBytes - 1;
}
mbstr[n] = '\0';
}
/* Length of resulting multi-bytes string WITH zero-terminator */
if (pReturnValue) {
*pReturnValue = n + 1;
}
/* Success */
error = 0;
} else {
/* Cannot convert string */
error = 1;
}
#endif
return error;
}
/* Set errno variable */
static void
dirent_set_errno(
int error)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
/* Microsoft Visual Studio 2005 and later */
_set_errno (error);
#else
/* Non-Microsoft compiler or older Microsoft compiler */
errno = error;
#endif
}
#ifdef __cplusplus
}
#endif