Open varunnathan opened 1 year ago
@nvpohanh ^ ^
We follow the Hugging Face config (https://huggingface.co/t5-large/blob/main/config.json#L15). Maybe create a new model instead of modifying the existing one?
Thanks for your reply @zerollzeng. The model I am trying to convert is fine-tuned from https://huggingface.co/t5-large with a max_seq_length of 2048, and it runs inference correctly with a sequence length of 2048. However, model.config.n_positions = 512.
Hello @varunnathan, so far we are not accepting customized models if they are not from HuggingFace. In our code, https://github.com/NVIDIA/TensorRT/blob/release/8.6/demo/HuggingFace/T5/trt.py#L133, we are using HuggingFace model config. The issue is here: https://huggingface.co/t5-large/blob/main/config.json#L7. d_model = 1024, so our TRT profile cannot be extended to 2048 unless you change that field. As @zerollzeng said, you may need to create a new model and use an updated HF config instead of using existing ones.
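To make the shape constraint concrete, here is a minimal sketch in plain Python (not the demo's actual code). It assumes encoder_hidden_states is laid out as (batch, sequence_length, d_model), which matches the (-1, -1, 1024) network shape reported in this issue. The helper name and the choice of opt as half of max are hypothetical.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]


def encoder_hidden_states_profile(
    batch_size: int, max_seq_len: int, d_model: int
) -> Dict[str, Shape]:
    """Build hypothetical min/opt/max shapes for encoder_hidden_states.

    Assumed layout: (batch, sequence_length, d_model). Only the sequence
    axis varies across min/opt/max; the last axis is pinned to d_model.
    """
    return {
        "min": (batch_size, 1, d_model),
        # opt = half of max is an arbitrary choice for this sketch.
        "opt": (batch_size, max_seq_len // 2, d_model),
        "max": (batch_size, max_seq_len, d_model),
    }


# t5-large has d_model = 1024; the longer sequence goes on axis 1 only.
profile = encoder_hidden_states_profile(batch_size=1, max_seq_len=2048, d_model=1024)
print(profile["max"])  # (1, 2048, 1024)
```

With this layout, a longer sequence length changes only the second axis. A profile that puts 2048 on the last axis conflicts with a network whose hidden size is 1024.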
Thanks for your suggestion @nvluxiaoz. The issue is that when the HF model is fine-tuned, its config isn't updated. Let me see if I can make the conversion work for this model by updating the value of the d_model key in its config.
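The "create a new config" suggestion can be sketched with a plain dict instead of the transformers API. The helper derive_config is hypothetical, and the dict holds only the two values quoted in this thread. The sketch assumes the field to override is the sequence-length one (n_positions) while d_model stays at 1024, since d_model is the hidden size of the trained weights. Whether the demo scripts actually read n_positions is not confirmed here.

```python
import copy
import json

# Only the two t5-large values quoted in this thread; a real config has more keys.
base_config = {"d_model": 1024, "n_positions": 512}


def derive_config(base: dict, **overrides) -> dict:
    """Return a copy of `base` with selected keys overridden.

    Hypothetical helper: the base config is left untouched, and unknown
    keys are rejected so that a typo does not silently add a new field.
    """
    derived = copy.deepcopy(base)
    for key, value in overrides.items():
        if key not in derived:
            raise KeyError(f"unknown config key: {key}")
        derived[key] = value
    return derived


# Assumption: only the sequence-length field changes for the fine-tuned model.
custom_config = derive_config(base_config, n_positions=2048)
print(json.dumps(custom_config, sort_keys=True))
```

Keeping the original dict unchanged mirrors the advice to create a new model config rather than modify the existing one.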
Description
I followed https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb to convert a custom fine-tuned T5-large model into a TensorRT engine.
Points to note about T5-large model: -> Fine-tuned with a max_sequence_length of 2048
Error obtained while running the conversion script with a max_sequence_length of 2048: "Error Code 4: Internal Error (encoder_hidden_states: for dimension number 2 in profile 0 does not match network definition (got min=2048, opt=2048, max=2048), expected min=opt=max=1024).)"
Points to note about the conversion script: -> Works as expected when I use a max_sequence_length of 1024
Environment
TensorRT Version: 8.6.0.12
NVIDIA GPU: T4-16GB
NVIDIA Driver Version: 515.65.01
CUDA Version: 12.0
CUDNN Version: 8.08
Operating System: Ubuntu 20.04.5 LTS
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10.2+cu113
Baremetal or Container (if so, version):
Relevant Files
Attached is the script I am using to convert my custom fine-tuned t5-large model into TensorRT format. It is based on https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb.
Attachment: t5_inference_with_tensorrt.py.zip
Steps To Reproduce
Error Message:
[E] 4: [network.cpp::validate::3084] Error Code 4: Internal Error (encoder_hidden_states: for dimension number 2 in profile 0 does not match network definition (got min=2048, opt=2048, max=2048), expected min=opt=max=1024).)
[!] Invalid Engine. Please ensure the engine was built correctly
PolygraphyException                       Traceback (most recent call last)
Cell In[157], line 1
----> 1 trt_engine = engine_from_network(network_definition, config=trt_inference_config)

File <string>:3, in engine_from_network(network, config, save_timing_cache)

File /usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py:42, in BaseLoader.__call__(self, *args, **kwargs)
     36 """
     37 Invokes the loader by forwarding arguments to ``call_impl``.
     38
     39 Note: ``call_impl`` should not be called directly - use this function instead.
     40 """
     41 __doc__ = self.call_impl.__doc__
---> 42 return self.call_impl(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py:530, in EngineFromNetwork.call_impl(self)
    524 """
    525 Returns:
    526     trt.ICudaEngine: The engine that was created.
    527 """
    528 # We do not invoke super().call_impl here because we would otherwise be responsible
    529 # for freeing it's return values.
--> 530 return engine_from_bytes(super().call_impl)

File <string>:3, in engine_from_bytes(serialized_engine)

File /usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py:42, in BaseLoader.__call__(self, *args, **kwargs)
     36 """
     37 Invokes the loader by forwarding arguments to ``call_impl``.
     38
     39 Note: ``call_impl`` should not be called directly - use this function instead.
     40 """
     41 __doc__ = self.call_impl.__doc__
---> 42 return self.call_impl(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py:554, in EngineFromBytes.call_impl(self)
    549 def call_impl(self):
    550     """
    551     Returns:
    552         trt.ICudaEngine: The deserialized engine.
    553     """
--> 554     buffer, owns_buffer = util.invoke_if_callable(self._serialized_engine)
    556     trt.init_libnvinfer_plugins(trt_util.get_trt_logger(), "")
    557     with contextlib.ExitStack() as stack, trt.Runtime(trt_util.get_trt_logger()) as runtime:

File /usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py:661, in invoke_if_callable(func, *args, **kwargs)
    656 """
    657 Attempts to invoke a function with arguments. If ``func`` is not callable, then returns ``func``
    658 The second return value of this function indicates whether the argument was a callable.
    659 """
    660 if callable(func):
--> 661     ret = func(*args, **kwargs)
    662     return ret, True
    663 return func, False

File /usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py:488, in EngineBytesFromNetwork.call_impl(self)
    485 end_time = time.time()
    487 if not engine_bytes:
--> 488     G_LOGGER.critical("Invalid Engine. Please ensure the engine was built correctly")
    490 G_LOGGER.finish(f"Finished engine building in {end_time - start_time:.3f} seconds")
    492 if self.timing_cache_path:

File /usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py:597, in Logger.critical(self, message)
    594 self.log(message, Logger.CRITICAL, stack_depth=3)
    595 from polygraphy.exception import PolygraphyException
--> 597 raise PolygraphyException(message) from None

PolygraphyException: Invalid Engine. Please ensure the engine was built correctly
My understanding of the error: the network definition is built from the ONNX model, whose third dimension is fixed at 1024 (shape (-1, -1, 1024)), whereas I specify 2048 as the sequence length during profile creation, and that profile is passed as input to the ONNX-to-TensorRT engine creation step. What I don't understand is why the ONNX conversion step does not take this into account. Do I need to change anything in the HF model's config?
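The mismatch described above can be reproduced in plain Python. This is a sketch that only mimics the kind of check the error message reports. It is not TensorRT's implementation, and the function name is hypothetical. A network dimension of -1 is treated as dynamic, and any other value must equal the profile value. The batch size of 1 in the example shapes is an assumption.

```python
from typing import List, Sequence


def static_dim_mismatches(
    network_shape: Sequence[int], profile_shape: Sequence[int]
) -> List[int]:
    """Return indices where a static network dim differs from the profile dim.

    A network dim of -1 is dynamic and accepts any profile value.
    """
    if len(network_shape) != len(profile_shape):
        raise ValueError("network and profile shapes must have the same rank")
    return [
        index
        for index, (net_dim, prof_dim) in enumerate(zip(network_shape, profile_shape))
        if net_dim != -1 and net_dim != prof_dim
    ]


network_shape = (-1, -1, 1024)   # shape reported for the ONNX-derived network
bad_profile = (1, 2048, 2048)    # 2048 also placed on the hidden axis
good_profile = (1, 2048, 1024)   # 2048 only on the sequence axis

print(static_dim_mismatches(network_shape, bad_profile))   # [2]
print(static_dim_mismatches(network_shape, good_profile))  # []
```

Index 2 in the first result corresponds to "dimension number 2" in the error message. The sequence axis (index 1) is dynamic and accepts 2048 without complaint.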