NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter
MIT License

Loading and running float32 inference with the C++ API works, but loading the serialized int8/float16 engine with the C++ API fails #615

Open joberzheng opened 3 years ago

joberzheng commented 3 years ago

First of all, thanks for this high-quality project.

I converted my model with torch2trt as follows:

```python
model_trt_float32 = torch2trt(my_model, [ims], max_batch_size=32)
model_trt_float16 = torch2trt(my_model, [ims], fp16_mode=True, max_batch_size=32)
model_trt_int8 = torch2trt(my_model, [ims], int8_mode=True, max_batch_size=32,
                           int8_calib_batch_size=32)

with open("model_float32.trt", "wb") as f:
    f.write(model_trt_float32.engine.serialize())
with open("model_float16.trt", "wb") as f:
    f.write(model_trt_float16.engine.serialize())
with open("model_int8.trt", "wb") as f:
    f.write(model_trt_int8.engine.serialize())
```

Running inference with model_trt_float32, model_trt_float16, and model_trt_int8 in Python works fine. Loading model_float32.trt with the C++ API and running inference also works. However, when I load model_float16.trt or model_int8.trt with the C++ API, deserialization fails with the following error:

```
[TensorRT] ../rtSafe/cuda/caskConvolutionRunner.cpp (89) - Cask Error in caskConvolutionRunner: 0 (findByHandle)
[TensorRT] INVALID_CONFIG: Deserialize the cuda engine failed.
```
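For context, my C++ loading path looks roughly like the following (a minimal sketch assuming TensorRT 7.x; the logger and file-reading boilerplate here are illustrative, not my exact project code):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT C++ API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    // Read the engine file serialized by torch2trt in Python.
    std::ifstream file("model_float16.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    // This is the call that reports "Deserialize the cuda engine failed".
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    if (!engine) {
        std::cerr << "deserializeCudaEngine failed" << std::endl;
        return 1;
    }
    // ... create an execution context and run inference ...
    engine->destroy();
    runtime->destroy();
    return 0;
}
```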

Thanks, and I look forward to your support.

github2016-yuan commented 3 years ago

@xinxiangzheng Hello, I want to ask you something (though I know little about how to solve your problem, sorry...). I also convert my model to a .trt file with this repo, torch2trt. But when I load it with C++, I get this error:

```
[TensorRT] ERROR: ../rtSafe/coreReadArchive.cpp (31) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
[TensorRT] ERROR: INVALID_STATE: std::exception
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.
```

After some searching, the usual cause seems to be a mismatch between the Python TensorRT version and the C++ TensorRT version. But I checked: my C++ TensorRT version is 7.1.3 and my Python tensorrt version is also 7.1.3.0.
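For reference, this is roughly how I double-check the C++ side (a small sketch; the macros report the header version this program was built against, while `getInferLibVersion()` reports the linked libnvinfer). The Python side can be checked with `import tensorrt; print(tensorrt.__version__)`.

```cpp
#include <NvInfer.h>
#include <iostream>

int main() {
    // Compile-time version from the TensorRT headers.
    std::cout << "headers: " << NV_TENSORRT_MAJOR << "."
              << NV_TENSORRT_MINOR << "." << NV_TENSORRT_PATCH << std::endl;
    // Run-time version of the linked libnvinfer,
    // encoded as major*1000 + minor*100 + patch.
    std::cout << "library: " << getInferLibVersion() << std::endl;
    return 0;
}
```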

So, have you run into this problem, and would you mind giving me some advice? Best regards.

joberzheng commented 3 years ago

@github2016-yuan Thank you for your reply!

I didn't encounter your problem, but it is usually caused by one of two issues:

1. Wrong input size passed to the model.
2. Loading multiple runtimes into memory at once. You should create the runtime only once and deserialize all models with that single runtime, instead of creating a separate runtime for every model (see the sketch below).

You can try these. As for my own problem, how can I solve it? I'd really appreciate any help! Thanks!
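A rough sketch of that single-runtime pattern (assuming TensorRT 7.x; `readFile` is a hypothetical helper, and the engine file names are just the ones from this thread):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: slurp a serialized engine file into memory.
static std::vector<char> readFile(const char* path) {
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

void loadAllEngines(nvinfer1::ILogger& logger) {
    // Create the runtime once...
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);

    // ...and deserialize every engine with that same runtime.
    for (const char* path : {"model_float32.trt", "model_float16.trt",
                             "model_int8.trt"}) {
        std::vector<char> blob = readFile(path);
        nvinfer1::ICudaEngine* engine =
            runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
        // ... store each engine and create one IExecutionContext per engine ...
        (void)engine;
    }
}
```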