NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Apache License 2.0

Is it possible to modify only the weights? #1468

Closed: doramasma closed this issue 3 years ago

doramasma commented 3 years ago


I am using TensorRT to run multiple inferences with the same model but different weights, so for now I have to load the model once for each set of weights. My question is:

Is it possible to modify only the weights, since the model itself is the same? P.S.: I have not found any relevant information on this subject.

Thank you in advance

pranavm-nvidia commented 3 years ago

Yes, you can use the refit API for that.
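A minimal sketch of one refit pass with the TensorRT 8.x Python API, assuming the engine was built as refittable and that `new_weights` maps weight names (as listed by `Refitter.get_all_weights()`) to host arrays with the original shapes and dtypes. The helper name `apply_named_weights` is hypothetical; it receives the refitter as an argument so the caller creates it with `trt.Refitter(engine, TRT_LOGGER)`.

```python
from typing import Any, Mapping


def apply_named_weights(refitter: Any, new_weights: Mapping[str, Any]) -> None:
    """Push new weights into an engine through a tensorrt.Refitter-like object.

    `new_weights` maps weight names to host arrays (typically NumPy). Keep
    those arrays alive until this returns, since the refitter may read them
    only when refit_cuda_engine() runs.
    """
    for name, array in new_weights.items():
        # set_named_weights returns False if the name is unknown or the
        # weights do not match what the engine expects.
        if not refitter.set_named_weights(name, array):
            raise ValueError(f"TensorRT rejected weights for {name!r}")

    # Every weight affected by the update must be supplied before refitting.
    missing = refitter.get_missing_weights()
    if missing:
        raise RuntimeError(f"Weights still missing before refit: {list(missing)}")

    # Updates the engine in place; returns False on failure.
    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit_cuda_engine() failed")
```

The same engine (and its execution context) can then be reused for the next inference, so switching weight sets no longer requires loading the model again.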

doramasma commented 3 years ago

Thanks for replying so fast. I will look into the documentation in depth, but at first sight it looks like what I was looking for. Thanks again!


doramasma commented 3 years ago

Greetings, I am reopening this issue because of a problem I found when using the refit API in Python.


Based on what I have found so far, the problem seems to be related to the installed CUDA version.

I am using Docker containers, specifically the following image (into which I installed TensorRT with pip): FROM nvcr.io/partners/gridai/pytorch-lightning:v1.3.7

And my CUDA version is the following: [image]

pranavm-nvidia commented 3 years ago

Are you able to run anything else with TensorRT or is it just the refitter API that crashes? Which version are you using?

doramasma commented 3 years ago

Yes, I can run inference on my model (I convert my ONNX model to a TensorRT engine, ".trt", with the trtexec tool). However, I want to modify the weights of this model, because I need to run multiple inferences in a row with different weights.

So I modified my script to add the Refitter API, and that causes the crash (at that specific line). I'm using the following version: tensorrt

Sorry for the inconvenience and thank you for your quick response.

pranavm-nvidia commented 3 years ago

Did you build your engine with trt.BuilderFlag.REFIT? Could you also try enabling verbose logging (change the severity in your logger to trt.Logger.VERBOSE) and posting the output?
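A sketch of the build step being asked about, assuming the TensorRT 8.x Python API: set `trt.BuilderFlag.REFIT` on the builder config, then confirm via `engine.refittable` that the flag took effect. The helper `build_refittable_engine` is hypothetical, and it takes the builder, parsed network, runtime, and flag as arguments so nothing TensorRT-specific is hard-coded.

```python
from typing import Any


def build_refittable_engine(builder: Any, network: Any, runtime: Any, refit_flag: Any) -> Any:
    """Build an engine with the REFIT flag set and verify it is refittable.

    With TensorRT, pass trt.BuilderFlag.REFIT as `refit_flag`.
    """
    config = builder.create_builder_config()
    # Without this flag the weights are frozen into the engine and
    # creating a trt.Refitter for it fails.
    config.set_flag(refit_flag)

    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("Engine build failed; check the logger output")

    engine = runtime.deserialize_cuda_engine(plan)
    if engine is None or not engine.refittable:
        raise RuntimeError("Engine is not refittable")
    return engine
```

For the verbose output, create the logger as `TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)` and use it for the `trt.Builder`, `trt.OnnxParser`, and `trt.Runtime`, then call `build_refittable_engine(builder, network, trt.Runtime(TRT_LOGGER), trt.BuilderFlag.REFIT)`.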

doramasma commented 3 years ago

I built the engine as follows:

engine = get_engine("model.onnx", "model.trt")

refitter = trt.Refitter(engine, trt.Logger())

Where get_engine is the following function (it is based on this example https://github.com/NVIDIA/TensorRT/blob/master/samples/python/engine_refit_onnx_bidaf/build_and_refit_engine.py):

import os

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
# Same value as common.EXPLICIT_BATCH in the TensorRT Python samples
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def get_engine(onnx_file_path, engine_file_path):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(EXPLICIT_BATCH)
        parser = trt.OnnxParser(network, TRT_LOGGER)
        runtime = trt.Runtime(TRT_LOGGER)

        # Parse model file
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print('Completed parsing of ONNX file')

        # Print input info
        print('Network inputs:')
        for i in range(network.num_inputs):
            tensor = network.get_input(i)
            print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

        # Fix the input shapes (values specific to this model)
        network.get_input(0).shape = [10, 1]
        network.get_input(1).shape = [10, 1, 1, 16]
        network.get_input(2).shape = [6, 1]
        network.get_input(3).shape = [6, 1, 1, 16]

        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 28  # 256MiB; deprecated in newer TensorRT releases
        # Mark the engine as refittable so trt.Refitter can update its weights
        config.set_flag(trt.BuilderFlag.REFIT)

        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")

        with open(engine_file_path, "wb") as f:
            f.write(plan)
        return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        # An engine built elsewhere (e.g. by trtexec) is only refittable if it
        # was built with the REFIT flag.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()

I also tried verbose logging by setting TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE), but the output was the same.

pranavm-nvidia commented 3 years ago

That looks right to me. Would it be possible for you to try this with a newer version of TensorRT?

doramasma commented 3 years ago

Should I use "pip install nvidia-tensorrt" to get the latest version?

pranavm-nvidia commented 3 years ago

Yeah, that should work. Alternatively you can try the latest TensorRT container.

doramasma commented 3 years ago

I tried the latest TensorRT container and got a different error:


It seems like I should change my implementation, but I'm not sure.

pranavm-nvidia commented 3 years ago

You'll need to rebuild the engine. Looks like you can just delete/rename/move it and your code should rebuild it. The reason is that engines are not compatible across different versions of TRT.
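Since a serialized engine only loads in the TensorRT version that built it, one option is to record that version next to the engine file and discard stale engines automatically. This is a hypothetical convention (a `.version` sidecar file and the helper names below), not something TensorRT provides; pass `trt.__version__` as `current_version` and the result of `build_serialized_network` as `plan`.

```python
import os
from typing import Optional


def _stamp_path(engine_path: str) -> str:
    # Hypothetical sidecar file recording which TensorRT version built the engine.
    return engine_path + ".version"


def save_engine(engine_path: str, plan: bytes, current_version: str) -> None:
    """Write the serialized engine plus the version that produced it."""
    with open(engine_path, "wb") as f:
        f.write(plan)
    with open(_stamp_path(engine_path), "w") as f:
        f.write(current_version)


def load_engine_bytes(engine_path: str, current_version: str) -> Optional[bytes]:
    """Return the cached plan, or None after deleting it if it may be stale."""
    stamp = _stamp_path(engine_path)
    if not os.path.exists(engine_path):
        return None
    built_with = None
    if os.path.exists(stamp):
        with open(stamp) as f:
            built_with = f.read().strip()
    if built_with != current_version:
        # Unknown or different version: remove the files so the caller rebuilds.
        os.remove(engine_path)
        if os.path.exists(stamp):
            os.remove(stamp)
        return None
    with open(engine_path, "rb") as f:
        return f.read()
```

When `load_engine_bytes` returns `None`, the caller builds a fresh engine and calls `save_engine`; otherwise it passes the bytes to `runtime.deserialize_cuda_engine`.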

doramasma commented 3 years ago

I rebuilt my model and was able to create the engine. However, we are back to the starting point.


Does the following error tell you anything? [TensorRT] INTERNAL ERROR: [refit.cpp::createInferRefitter_INTERNAL::1807] Error Code 3: Internal Error (Parameter check failed at: optimizer/std/refit.cpp::createInferRefitter_INTERNAL::1807, condition: e.isRefittable() )

pranavm-nvidia commented 3 years ago

Yeah, looks like somehow the engine is not being marked refittable despite the config flag. As a sanity check, could you see what the value of engine.refittable is?
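The sanity check can be made explicit before the refitter is created, so a non-refittable engine produces a clear message instead of the internal error above. A sketch; the helper `create_refitter` is hypothetical and takes the refitter constructor as an argument (with TensorRT, `trt.Refitter`).

```python
from typing import Any, Callable


def create_refitter(engine: Any, logger: Any, refitter_factory: Callable[[Any, Any], Any]) -> Any:
    """Create a refitter only after confirming the engine allows refitting.

    With TensorRT: create_refitter(engine, TRT_LOGGER, trt.Refitter).
    """
    # engine.refittable is fixed when the engine is built; it cannot be
    # switched on afterwards, so the only fix is to rebuild the engine.
    if not getattr(engine, "refittable", False):
        raise RuntimeError(
            "Engine is not refittable: rebuild it with trt.BuilderFlag.REFIT "
            "or trtexec --refit"
        )
    return refitter_factory(engine, logger)
```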

doramasma commented 3 years ago

You are right. I noticed that in the example I followed, the flag is only set when the engine is built from the ONNX model; when an existing .trt file is loaded instead, the flag is never applied. I will modify the code to try to set the flag when calling deserialize_cuda_engine; I hope I won't run into more problems.

I really appreciate the help you have given me and the speed of your replies. I hope I won't bother you anymore!

pranavm-nvidia commented 3 years ago

@doramasma You won't be able to change the engine to be refittable after it's already built. If you're using trtexec to build, you can add the --refit flag.
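For the trtexec route, the engine file can be regenerated with `--refit` so that the `.trt` file loaded by `get_engine` is refittable. A small sketch that assembles the command line; the helper name is hypothetical, `--onnx`, `--saveEngine`, `--refit`, and `--verbose` are trtexec options, and `trtexec` is assumed to be on `PATH`.

```python
from typing import List


def trtexec_refit_command(onnx_path: str, engine_path: str, verbose: bool = False) -> List[str]:
    """Return a trtexec invocation that builds a refittable engine from ONNX."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--refit",  # mark the engine as refittable at build time
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

Running it with `subprocess.run(trtexec_refit_command("model.onnx", "model.trt"), check=True)` overwrites the old, non-refittable engine.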

doramasma commented 3 years ago

I have now created a refittable engine, so I will try modifying the weights of my model over the next few days. Hopefully I won't run into any further problems.

Thank you very much for your support!