Closed doramasma closed 3 years ago
Yes, you can use the refit API for that.
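Roughly, the refit flow in the Python API looks like the sketch below; the engine path, layer name, and weight shape are placeholders for illustration, not taken from a real model. The engine must have been built with the REFIT builder flag.

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine that was built with trt.BuilderFlag.REFIT
with open("model.trt", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

refitter = trt.Refitter(engine, TRT_LOGGER)

# Overwrite the kernel weights of one layer; layer name and shape are illustrative
new_kernel = np.random.rand(32, 3, 3, 3).astype(np.float32)
refitter.set_weights("conv_1", trt.WeightsRole.KERNEL, new_kernel)

# Any weights that depend on the updated ones must also be supplied before refitting
missing_layers, roles = refitter.get_missing()
assert len(missing_layers) == 0

# Apply the new weights to the engine in place
assert refitter.refit_cuda_engine()

Once refit_cuda_engine() returns True, subsequent inferences on that engine use the new weights without rebuilding it.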
Thanks for replying so fast. I will look into the documentation in more depth, but at first sight it looks like exactly what I was looking for. Thanks again!
Doramas
Greetings, I am reopening the post due to a problem that I found when I was using the refit API in Python.
Based on what I have been looking at, it seems that the problem is with the installed version of CUDA.
Currently, I am using Docker containers, in particular the following (where I installed TensorRT with pip): FROM nvcr.io/partners/gridai/pytorch-lightning:v1.3.7
And my CUDA version is the following:
Are you able to run anything else with TensorRT or is it just the refitter API that crashes? Which version are you using?
Yes, I can perform inference with my model (I convert my ONNX model to a TensorRT engine ".trt" with the trtexec tool). However, I want to modify the weights of this model, because I need to run multiple inferences in a row, each with different weights.
So, I modified my script to add the Refitter API, and that is the specific line that causes the crash. I'm using the following version: tensorrt 7.2.3.4
Sorry for the inconvenience and thank you for your quick response.
Did you build your engine with trt.BuilderFlag.REFIT? Could you also try enabling verbose logging (change the severity in your logger to trt.Logger.VERBOSE) and posting the output?
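For reference, setting both in a build script looks roughly like this (illustrative snippet, not your exact code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # verbose logging
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.REFIT)       # mark the engine's weights as refittable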
I build the engine as follows:
engine = get_engine("mode.onnx", "model.trt")
refitter = trt.Refitter(engine, trt.Logger())
Where get_engine is the following function (it is based on this example https://github.com/NVIDIA/TensorRT/blob/master/samples/python/engine_refit_onnx_bidaf/build_and_refit_engine.py):
def get_engine(onnx_file_path, engine_file_path):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""

    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(common.EXPLICIT_BATCH)
        parser = trt.OnnxParser(network, TRT_LOGGER)
        runtime = trt.Runtime(TRT_LOGGER)

        # Parse model file
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print('Completed parsing of ONNX file')

        # Print input info
        print('Network inputs:')
        for i in range(network.num_inputs):
            tensor = network.get_input(i)
            print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

        network.get_input(0).shape = [10, 1]
        network.get_input(1).shape = [10, 1, 1, 16]
        network.get_input(2).shape = [6, 1]
        network.get_input(3).shape = [6, 1, 1, 16]

        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.REFIT)
        config.max_workspace_size = 1 << 28  # 256MiB

        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")
        with open(engine_file_path, "wb") as f:
            f.write(plan)
        return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()
On the other hand, I tried enabling verbose logging like this: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE), but the output was the same.
That looks right to me. Would it be possible for you to try this with a newer version of TensorRT?
Should I use "pip install nvidia-tensorrt" to get the latest version?
Yeah, that should work. Alternatively you can try the latest TensorRT container.
I tried the latest TensorRT container and I got another error:
It seems like I should change my implementation, but I'm not sure.
You'll need to rebuild the engine. Looks like you can just delete/rename/move it and your code should rebuild it. The reason is that engines are not compatible across different versions of TRT.
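For example, something along these lines would force a rebuild (paths illustrative):

import os

# Remove the stale engine so get_engine() falls through to build_engine()
if os.path.exists("model.trt"):
    os.remove("model.trt")

engine = get_engine("model.onnx", "model.trt")  # rebuilt with the current TensorRT version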
I rebuilt my model, and now I was able to create the engine. However, we are back to the starting point.
Maybe the following error tells you something? [TensorRT] INTERNAL ERROR: [refit.cpp::createInferRefitter_INTERNAL::1807] Error Code 3: Internal Error (Parameter check failed at: optimizer/std/refit.cpp::createInferRefitter_INTERNAL::1807, condition: e.isRefittable() )
Yeah, looks like somehow the engine is not being marked refittable despite the config flag. As a sanity check, could you see what the value of engine.refittable is?
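For example (paths illustrative):

engine = get_engine("model.onnx", "model.trt")
print(engine.refittable)  # expected to be True for an engine built with trt.BuilderFlag.REFIT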
You are right. I noticed that in the example I followed, they only activate the flag when building the model from the ONNX format; if you load a .trt file instead, the flag is never set. I will modify the code to try to introduce the flag when using deserialize_cuda_engine; I hope not to have more problems.
I really appreciate the help you have given me and the speed of your replies. I hope I won't bother you anymore!
@doramasma You won't be able to change the engine to be refittable after it's already built. If you're using trtexec to build, you can add the --refit flag.
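For example (file names illustrative):

trtexec --onnx=model.onnx --refit --saveEngine=model.trt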
I was now able to create a refittable engine, so I will try to modify the weights of my model in the coming days. In principle, I hope not to run into any further problems.
Thank you very much for your support!
Description
Currently, I am using TensorRT to perform multiple inferences with the same model but with different weights. For now, I need to load the model multiple times (once for each set of weights). So, my question is:
Is it possible to modify only the weights, since the model is the same? P.S.: I have not found relevant information on this subject.
Thank you in advance