NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Is it possible to modify only the weights? #1468

Closed · doramasma closed this issue 3 years ago

doramasma commented 3 years ago

Description

Currently, I am using TensorRT to perform multiple inferences with the same model but with different weights. For now, I need to load the model multiple times (once per set of weights). So, my question is:

Is it possible to modify only the weights, since the model is the same? PS: I have not found relevant information on this subject.

Thank you in advance

pranavm-nvidia commented 3 years ago

Yes, you can use the refit API for that.
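
In rough terms, refitting looks like the sketch below (a minimal example, not code from this thread: the pre-built `engine` variable, the layer name "conv1", and the weight shape are all placeholders).

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger()

# `engine` is assumed to be an ICudaEngine built with trt.BuilderFlag.REFIT enabled.
refitter = trt.Refitter(engine, TRT_LOGGER)

# Supply new weights for a layer by name and role; the name and shape are placeholders.
new_kernel = np.random.rand(64, 3, 3, 3).astype(np.float32)
refitter.set_weights("conv1", trt.WeightsRole.KERNEL, new_kernel)

# Weights that must also be supplied before refitting (e.g. a bias tied to the
# updated kernel) are reported here.
print(refitter.get_missing())

# Apply the new weights to the engine in place; returns True on success.
assert refitter.refit_cuda_engine()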

doramasma commented 3 years ago

Thanks for replying so fast. I will look into the documentation in depth, but at first sight, it looks like exactly what I was looking for. Thanks again!

Doramas

doramasma commented 3 years ago

Greetings, I am reopening this issue due to a problem I found when using the refit API in Python.

[screenshot of the error]

Based on what I have been looking at, it seems that the problem is with the installed version of CUDA.


Currently, I am using Docker containers, in particular the following (where I installed TensorRT with pip): FROM nvcr.io/partners/gridai/pytorch-lightning:v1.3.7

And my CUDA version is the following: [screenshot showing the CUDA version]

pranavm-nvidia commented 3 years ago

Are you able to run anything else with TensorRT or is it just the refitter API that crashes? Which version are you using?

doramasma commented 3 years ago

Yes, I can perform inference on my model (I convert my ONNX model to a TensorRT engine ".trt" with the trtexec tool). However, I want to modify the weights of this model, because I need to run multiple inferences in a row, but with different weights.

So, I modified my script to add the refitter API, and that causes the crash (on that specific line). I'm using the following version: tensorrt 7.2.3.4

Sorry for the inconvenience and thank you for your quick response.

pranavm-nvidia commented 3 years ago

Did you build your engine with trt.BuilderFlag.REFIT? Could you also try enabling verbose logging (change the severity in your logger to trt.Logger.VERBOSE) and posting the output?
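
For reference, a minimal sketch of those two changes (generic names, not taken from your script):

import tensorrt as trt

# Verbose logging: pass the severity when constructing the logger.
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# The REFIT flag has to be set on the builder config before the engine is built;
# it cannot be added to an already-built engine.
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.REFIT)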

doramasma commented 3 years ago

I build the engine as follows:

engine = get_engine("model.onnx", "model.trt")

refitter = trt.Refitter(engine, trt.Logger())

Where get_engine is the following function (it is based on this example https://github.com/NVIDIA/TensorRT/blob/master/samples/python/engine_refit_onnx_bidaf/build_and_refit_engine.py):

import os

import tensorrt as trt
import common  # helper module shipped with the TensorRT Python samples (provides EXPLICIT_BATCH, etc.)

TRT_LOGGER = trt.Logger()  # later switched to trt.Logger(trt.Logger.VERBOSE) for debugging

def get_engine(onnx_file_path, engine_file_path):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(common.EXPLICIT_BATCH)
        parser = trt.OnnxParser(network, TRT_LOGGER)
        runtime = trt.Runtime(TRT_LOGGER)

        # Parse model file
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print('Completed parsing of ONNX file')

        # Print input info
        print('Network inputs:')
        for i in range(network.num_inputs):
            tensor = network.get_input(i)
            print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

        network.get_input(0).shape = [10, 1]
        network.get_input(1).shape = [10, 1, 1, 16]
        network.get_input(2).shape = [6, 1]
        network.get_input(3).shape = [6, 1, 1, 16]

        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.REFIT)
        config.max_workspace_size = 1 << 28  # 256MiB

        print('Building an engine from file {}; this may take a while...'.format(
            onnx_file_path))
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")

        with open(engine_file_path, "wb") as f:
            f.write(plan)
        return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()

On the other hand, I tried to use trt.Logger.VERBOSE, like this: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE), but the output was the same.

pranavm-nvidia commented 3 years ago

That looks right to me. Would it be possible for you to try this with a newer version of TensorRT?

doramasma commented 3 years ago

Should I use "pip install nvidia-tensorrt" to get the latest version?

pranavm-nvidia commented 3 years ago

Yeah, that should work. Alternatively you can try the latest TensorRT container.
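
For example (a hypothetical invocation; replace xx.xx with the desired monthly tag from NGC):

docker pull nvcr.io/nvidia/tensorrt:xx.xx-py3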

doramasma commented 3 years ago

I tried the latest TensorRT container and I got another error:

[screenshot of the new error]

It seems like I should change my implementation, but I'm not sure.

pranavm-nvidia commented 3 years ago

You'll need to rebuild the engine. Looks like you can just delete/rename/move it and your code should rebuild it. The reason is that engines are not compatible across different versions of TRT.

doramasma commented 3 years ago

I rebuilt my model, and now I am able to create the engine. However, we're back to the starting point:

[screenshot of the same refit error]

Maybe the following error tells you something?

[TensorRT] INTERNAL ERROR: [refit.cpp::createInferRefitter_INTERNAL::1807] Error Code 3: Internal Error (Parameter check failed at: optimizer/std/refit.cpp::createInferRefitter_INTERNAL::1807, condition: e.isRefittable() )

pranavm-nvidia commented 3 years ago

Yeah, looks like somehow the engine is not being marked refittable despite the config flag. As a sanity check, could you see what the value of engine.refittable is?
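
A minimal check, assuming `engine` is the deserialized engine from your snippet above:

# True only if the engine was built with trt.BuilderFlag.REFIT
print(engine.refittable)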

doramasma commented 3 years ago

You are right. I noticed that in the example I followed, they only activate the flag when building the model from the ONNX format; if you load an existing .trt file, the flag is never set. I will modify the code to try to introduce the flag when using deserialize_cuda_engine; I hope not to have more problems.

I really appreciate the help you have given me and the speed of your replies. I hope I won't bother you anymore!

pranavm-nvidia commented 3 years ago

@doramasma You won't be able to change the engine to be refittable after it's already built. If you're using trtexec to build, you can add the --refit flag.
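
For example (a sketch reusing the file names from this thread; any other trtexec options you normally pass are omitted):

trtexec --onnx=model.onnx --refit --saveEngine=model.trt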

doramasma commented 3 years ago

I was now able to create the refittable engine, so I will try to modify the weights of my model in the following days. In principle, I hope not to encounter any further problems.

Thank you very much for your support!