enazoe / yolo-tensorrt

TensorRT8. Supports Yolov5n,s,m,l,x. darknet -> tensorrt. Yolov4 and Yolov3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
MIT License
1.18k stars 313 forks

Code for the chunk part of yolov4-tiny #51

Closed beizhengren closed 3 years ago

beizhengren commented 3 years ago

@enazoe Hi, regarding the yolov4-tiny part: I would like to replace IPluginV2IOExt with IPluginV2 in order to support TensorRT 5, as follows. trt7:

    class Chunk : public IPluginV2IOExt{...}

trt5:

    class Chunk : public IPluginV2{...}

After making this change (let's call the result "trt5" for now), do I need to generate a new engine with trt5 before I can run inference? Would there be any problem if trt5 directly loads the model that was already converted with the original trt7 and runs inference on it? Thanks!

enazoe commented 3 years ago

@beizhengren There would probably be a problem. After making your changes, just rebuild the engine and it should be fine.

beizhengren commented 3 years ago

@enazoe Got it, I'll give it a try.

beizhengren commented 3 years ago

@enazoe Done, thanks!

beizhengren commented 3 years ago

@enazoe Hi, when creating a half-precision engine, is there anything special to do compared with creating a float engine? After changing the interface to IPluginV2, I can successfully create a float engine, but when creating a half engine, the following line throws an error: https://github.com/enazoe/yolo-tensorrt/blob/cc405b07af4351334e7b5f47dc70bec73ab25a06/modules/chunk.cu#L68 The biggest change I made was moving the code from void Chunk::configurePlugin into configureWithFormat. The complete chunk_V2.cpp is below:

Click to expand

```cpp
// Note: the angle-bracket contents of the includes, casts, and template
// arguments below were eaten by the page extraction; they have been
// reconstructed to match the surrounding code and the repo's Chunk plugin.
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>
#include <cuda_runtime_api.h>
#include "chunk_V2.h"

#define ASSERT(assertion)                          \
{                                                  \
    if (!(assertion))                              \
    {                                              \
        std::cout << "ASSERTION FAILED in "        \
                  << __FILE__ << ":" << __LINE__   \
                  << std::endl;                    \
        abort();                                   \
    }                                              \
}

namespace nvinfer1
{
ChunkV2::ChunkV2()
{
}

ChunkV2::ChunkV2(const void* buffer, size_t size)
{
    assert(size == sizeof(_n_size_split));
    _n_size_split = *reinterpret_cast<const int*>(buffer);
}

ChunkV2::~ChunkV2()
{
}

int ChunkV2::getNbOutputs() const
{
    return 2;
}

Dims ChunkV2::getOutputDimensions(int index, const Dims* inputs, int nbInputDims)
{
    assert(nbInputDims == 1);
    assert(index == 0 || index == 1);
    return Dims3(inputs[0].d[0] / 2, inputs[0].d[1], inputs[0].d[2]);
}

int ChunkV2::initialize()
{
    return 0;
}

void ChunkV2::terminate()
{
}

size_t ChunkV2::getWorkspaceSize(int maxBatchSize) const
{
    return 0;
}

size_t ChunkV2::getSerializationSize() const
{
    return sizeof(_n_size_split);
}

void ChunkV2::serialize(void* buffer) const
{
    *reinterpret_cast<int*>(buffer) = _n_size_split;
}

bool ChunkV2::supportsFormat(DataType type, PluginFormat format) const
{
    return ((type == DataType::kFLOAT || type == DataType::kHALF || type == DataType::kINT8)
        && (format == PluginFormat::kNCHW));
}

// Set plugin namespace
void ChunkV2::setPluginNamespace(const char* pluginNamespace)
{
    _s_plugin_namespace = pluginNamespace;
}

const char* ChunkV2::getPluginNamespace() const
{
    return _s_plugin_namespace.c_str();
}

// Configure the layer with input and output data types.
void ChunkV2::configureWithFormat(
    const Dims* inputDims, int nbInputs,
    const Dims* outputDims, int nbOutputs,
    DataType type, PluginFormat format, int maxBatchSize)
{
    _n_size_split = inputDims->d[0] / 2 * inputDims->d[1] * inputDims->d[2] * sizeof(float);
    std::cerr << _n_size_split << std::endl;
    ASSERT(format == PluginFormat::kNCHW);
    ASSERT(type == DataType::kFLOAT || type == DataType::kHALF);
    //mDataType = type;
    ASSERT(inputDims[0].nbDims >= 1); // number of dimensions of the input tensor must be >= 1
}

const char* ChunkV2::getPluginType() const
{
    return "CHUNK_TRT";
}

const char* ChunkV2::getPluginVersion() const
{
    return "1.0";
}

void ChunkV2::destroy()
{
    delete this;
}

// Clone the plugin
IPluginV2* ChunkV2::clone() const
{
    ChunkV2* p = new ChunkV2();
    p->_n_size_split = _n_size_split;
    p->setPluginNamespace(_s_plugin_namespace.c_str());
    return p;
}

int ChunkV2::enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream)
{
    for (int b = 0; b < batchSize; ++b)
    {
        NV_CUDA_CHECK(cudaMemcpy((char*)outputs[0] + b * _n_size_split,
            (char*)inputs[0] + b * 2 * _n_size_split,
            _n_size_split,
            cudaMemcpyDeviceToDevice));
        NV_CUDA_CHECK(cudaMemcpy((char*)outputs[1] + b * _n_size_split,
            (char*)inputs[0] + b * 2 * _n_size_split + _n_size_split,
            _n_size_split,
            cudaMemcpyDeviceToDevice));
    }
    return 0;
}

PluginFieldCollection ChunkV2PluginCreator::_fc{};
std::vector<PluginField> ChunkV2PluginCreator::_vec_plugin_attributes;

ChunkV2PluginCreator::ChunkV2PluginCreator()
{
    _vec_plugin_attributes.clear();
    _fc.nbFields = _vec_plugin_attributes.size();
    _fc.fields = _vec_plugin_attributes.data();
}

const char* ChunkV2PluginCreator::getPluginName() const
{
    return "CHUNK_TRT";
}

const char* ChunkV2PluginCreator::getPluginVersion() const
{
    return "1.0";
}

const PluginFieldCollection* ChunkV2PluginCreator::getFieldNames()
{
    return &_fc;
}

IPluginV2* ChunkV2PluginCreator::createPlugin(const char* name, const PluginFieldCollection* fc)
{
    ChunkV2* obj = new ChunkV2();
    obj->setPluginNamespace(_s_name_space.c_str());
    return obj;
}

IPluginV2* ChunkV2PluginCreator::deserializePlugin(const char* name, const void* serialData, size_t serialLength)
{
    ChunkV2* obj = new ChunkV2(serialData, serialLength);
    obj->setPluginNamespace(_s_name_space.c_str());
    return obj;
}

void ChunkV2PluginCreator::setPluginNamespace(const char* libNamespace)
{
    _s_name_space = libNamespace;
}

const char* ChunkV2PluginCreator::getPluginNamespace() const
{
    return _s_name_space.c_str();
}

REGISTER_TENSORRT_PLUGIN(ChunkV2PluginCreator);
} // namespace nvinfer1
```

Thank you very much!

enazoe commented 3 years ago

@beizhengren Hmm, I'm not sure about that. You shouldn't need to implement fp16 precision specially: implement the plugin in fp32, and it will automatically be set to fp16 when the engine is serialized.

beizhengren commented 3 years ago

@enazoe I see. Then the int8 process should be about the same as fp16, just with an extra setInt8Calibrator step, right?

enazoe commented 3 years ago

@beizhengren Yes, as far as I can tell that's the case.
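For readers following along, the TRT5-era builder configuration being discussed looks roughly like this (a sketch only, not code from this repo; `logger` and `calibrator` are hypothetical objects you would supply yourself, and these setters were replaced by `IBuilderConfig` flags in later TensorRT versions):

```cpp
// Sketch of the pre-IBuilderConfig (TensorRT 5 style) precision setup.
// "logger" is a hypothetical nvinfer1::ILogger; "calibrator" is a
// hypothetical IInt8Calibrator implementation for activation ranges.
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);

// fp16: no plugin changes needed -- the fp32 implementation is reused
// and weights are converted while the engine is built.
builder->setFp16Mode(true);

// int8: same idea, plus the extra calibrator step mentioned above.
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);
```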

beizhengren commented 3 years ago

@enazoe Thank you very much!

beizhengren commented 3 years ago

@enazoe One more question, if you don't mind~ To stay compatible with trt5, I used setFp16Mode(true) here: https://github.com/enazoe/yolo-tensorrt/blob/1110190047815826381c088ba1c79a7b034acbdd/modules/yolo.cpp#L462

When creating the half engine, the following line throws an error: https://github.com/enazoe/yolo-tensorrt/blob/cc405b07af4351334e7b5f47dc70bec73ab25a06/modules/chunk.cu#L68 Could you try it when you have time? I have tried many approaches without success. Thank you very much!

seungtaek94 commented 2 years ago

@beizhengren @enazoe

Hi. Can anyone explain this issue in English?

Actually, I have the same problem at the line below :(

https://github.com/enazoe/yolo-tensorrt/blob/cc405b07af4351334e7b5f47dc70bec73ab25a06/modules/chunk.cu#L68

My env: