I modified the inputs and outputs of the foreignnode in replay.json from DataType.HALF to DataType.FLOAT, as they have a significant impact on precision.
Modifying the inner node input/output types may break the fusion tactic.
To make this process easier, Polygraphy provides two built-in algorithm selectors: TacticRecorder and TacticReplayer. The former can be used to record tactics selected during an engine build, and the latter to play them back during a subsequent build. The CLI tools include --save-tactics and --load-tactics options corresponding to these.
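For reference, the same record/replay flow can also be driven from Polygraphy's Python API. The following is only a rough sketch, assuming the loaders exposed by polygraphy.backend.trt (TacticRecorder, TacticReplayer, CreateConfig, etc.); exact argument names may differ between Polygraphy versions:

# Minimal sketch, assuming the polygraphy.backend.trt loader API; not tested against
# this issue's model. File names reuse the ones from this thread.
from polygraphy.backend.trt import (
    CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, SaveEngine,
    TacticRecorder, TacticReplayer,
)

# First build: record the tactic chosen for every layer into replay.json.
record_engine = SaveEngine(
    EngineFromNetwork(
        NetworkFromOnnxPath("identity.onnx"),
        config=CreateConfig(fp16=True, algorithm_selector=TacticRecorder(record="replay.json")),
    ),
    path="0.engine",
)
record_engine()  # builds 0.engine and writes replay.json

# Second build: replay the recorded tactics. The replayed entries must match the
# choices TensorRT actually offers for each layer, which is why hand-editing the
# DataType fields in replay.json makes the replay fail.
replay_engine = SaveEngine(
    EngineFromNetwork(
        NetworkFromOnnxPath("identity.onnx"),
        config=CreateConfig(fp16=True, algorithm_selector=TacticReplayer(replay="replay.json")),
    ),
    path="1.engine",
)
replay_engine()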
Thank you for your response. Does this mean there is no way I can use a customized replay.json file to generate the engine file? Is there any other way to change the input/output type of the foreignnode node?
Find the node's inputs/outputs, then use the following:
trtexec --layerPrecisions=spec --layerOutputTypes=spec --layerDeviceTypes=spec
I tried that, but the nodes are wrapped inside the foreignnode, so it doesn't change the input/output type of the individual nodes.
You can mark the node's output as a graph output with the ONNX API, then rebuild with trtexec.
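A minimal sketch of doing that with the onnx Python package is below; the model path and the intermediate tensor name are placeholders, not names taken from this issue's model:

import onnx
from onnx import TensorProto, helper

# Sketch: expose an intermediate tensor as an extra graph output so that the ONNX
# parser / trtexec treats it as a network output. The tensor must already exist
# in the graph; "model.onnx" and "Add_1546_output" are placeholders.
model = onnx.load("model.onnx")

extra_output = helper.make_tensor_value_info(
    "Add_1546_output",   # name of the intermediate tensor to expose (placeholder)
    TensorProto.FLOAT,   # its element type
    None,                # shape can be left unspecified
)
model.graph.output.append(extra_output)

onnx.checker.check_model(model)
onnx.save(model, "model_marked.onnx")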
I tried that too, but it gives me an error.
That looks like OOM / insufficient workspace. Can you upload the full log from trtexec --verbose?
Here's my code.
import tensorrt as trt

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger()

def get_engine(onnx_file_path, engine_file_path=""):
    """Takes an ONNX file and creates a TensorRT engine to run inference with."""
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
        EXPLICIT_BATCH
    ) as network, builder.create_builder_config() as config, trt.OnnxParser(
        network, TRT_LOGGER
    ) as parser, trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << 32
        config.set_flag(trt.BuilderFlag.FP16)
        config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
        builder.max_batch_size = 1

        # Parse the ONNX model file
        print("Loading ONNX file from path {}...".format(onnx_file_path))
        with open(onnx_file_path, "rb") as model:
            print("Beginning ONNX file parsing")
            if not parser.parse(model.read()):
                print("ERROR: Failed to parse the ONNX file.")
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print("Completed parsing of ONNX file")

        print("Building an engine from file {}; this may take a while...".format(onnx_file_path))
        # Initial build, before any per-layer precision constraints are applied
        plan = builder.build_serialized_network(network, config)

        # Force specific layers in the network to FP32
        fp32_layer = [
            'Reshape_3209', '(Unnamed Layer* 1501) [Shuffle]', '(Unnamed Layer* 2290) [Constant]',
            '(Unnamed Layer* 2292) [Shuffle]', '(Unnamed Layer* 3701) [Constant]', '(Unnamed Layer* 3703) [Shuffle]',
            '(Unnamed Layer* 5112) [Constant]', '(Unnamed Layer* 5114) [Shuffle]', '(Unnamed Layer* 6523) [Constant]',
            '(Unnamed Layer* 6525) [Shuffle]', '(Unnamed Layer* 7934) [Constant]', '(Unnamed Layer* 7936) [Shuffle]',
        ]
        for i, layer in enumerate(network):
            if layer.name in fp32_layer:
                layer.precision = trt.float32
                layer.set_output_type(0, trt.float32)
                print(f"fp32 index: {i}; name: {layer.name}")
        assert network[1501].precision == trt.float32
        assert network[2176].precision == trt.float32
        for layer in network:
            print(f"Layer name: {layer.name}, Precision: {layer.precision}")

        # Rebuild with the precision constraints enforced
        config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")
        with open(engine_file_path, "wb") as f:
            f.write(plan)
        return engine
This is the error log:
[08/01/2024-16:20:11] [TRT] [W] TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.1.1
[08/01/2024-16:23:41] [TRT] [W] Skipping tactic 0x0000000000000000 due to exception [verify_outputtype] Mismatched type for tensor (Unnamed Layer 2292) [Shuffle]_output', f32 vs. expected type:f16.
[08/01/2024-16:23:41] [TRT] [E] 4: [optimizer.cpp::computeCosts::3726] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[Abs_507...Cast_3382]} due to insufficient workspace. See verbose log for requested sizes.)
[08/01/2024-16:23:41] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
File "onnx_to_engine_layer.py", line 86, in
Invoked with: <tensorrt.tensorrt.Runtime object at 0x7fbca3784d70>, None
And this is the trtexec command I tried earlier:
$trt_bin_path/trtexec \
    --onnx=$onnx_path \
    --saveEngine=$engine_path \
    --plugins=$plugins_path \
    --verbose --workspace=2048 \
    --exportProfile=${engine_path}.profile.json \
    --exportLayerInfo=${engine_path}.graph.json \
    --profilingVerbosity=detailed \
    --fp16 \
    --precisionConstraints=obey \
    --layerPrecisions=Add_1546:fp32,Slice_1501:fp32 --layerOutputTypes=Add_1546:fp32,Slice_1501:fp32
Even like this, the internal type cannot be changed.
Try increasing or decreasing the workspace size: config.max_workspace_size = 1 << 32.
BTW, try changing config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS) to config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS).
I tried config.max_workspace_size = 1 << 28; it doesn't work.
I tried PREFER_PRECISION_CONSTRAINTS too; it doesn't work.
Or increase the size: config.max_workspace_size = 1 << 36.
That doesn't work either. I'm now skeptical that there is really a way to change the internal node types inside a Myelin foreignnode.
The Myelin compiler usually optimizes sub-graphs with specific patterns (e.g. multi-head attention in Transformer-based networks, or some point-wise operations such as slice and gather). Myelin is a TRT component whose behavior is not documented.
If you want to break the Myelin graph-node optimization, you can split the model/ONNX or replace some ops with a plugin. You can also restructure the forward code to avoid the Myelin case; see the sketch below for one way to split the ONNX graph.
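For example, splitting the ONNX graph at a chosen tensor can be done with onnx.utils.extract_model; the file and tensor names below are placeholders for illustration only:

import onnx.utils

# Sketch: split the model at an intermediate tensor so that the two halves are
# built as separate engines and the large Myelin foreignnode is never formed.
# "model.onnx", "input", "output", and "split_tensor" are placeholder names.
onnx.utils.extract_model(
    "model.onnx", "model_part1.onnx",
    input_names=["input"], output_names=["split_tensor"],
)
onnx.utils.extract_model(
    "model.onnx", "model_part2.onnx",
    input_names=["split_tensor"], output_names=["output"],
)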
Thanks for your reply, I will try that. I have no more questions; the issue can be closed.
The command I use is:
polygraphy convert identity.onnx --fp16 --save-tactics replay.json -o 0.engine
I modified the inputs and outputs of the foreignnode in replay.json from DataType.HALF to DataType.FLOAT, as they have a significant impact on precision. Then I regenerated the engine from the modified replay.json:
polygraphy convert identity.onnx --fp16 --load-tactics replay.json -o 1.engine
But an error was reported:
[ERROR] Exception caught in select_algorithms(): PolygraphyException: Layer: {ForeignNode[Abs_507.... .Cast_3382]} | Tactic in replay was not provided by TensorRT as a choice for this layer. Has the network or builder configuration changed since the replay file was generated?