I think I am having the same problem converting a simple MNIST CNN model. Here is my code to reproduce it (you need to install `tensorflow` and `tf2onnx` to run this):
```python
from multiprocessing import Pool

import numpy
import tensorflow as tf
from tensorrt import Builder, BuilderFlag, IInt8EntropyCalibrator2, Logger, NetworkDefinitionCreationFlag, OnnxParser

(_TRAIN_IMAGES, _TRAIN_LABELS), _ = tf.keras.datasets.mnist.load_data()
_TRAIN_IMAGES = numpy.expand_dims(a=_TRAIN_IMAGES, axis=-1).astype(numpy.float32)
_TRAIN_LABELS = tf.keras.utils.to_categorical(y=_TRAIN_LABELS, num_classes=10)


class _Calibrator(IInt8EntropyCalibrator2):
    def __init__(self):
        super().__init__()

        self._batch_size = 1
        self._cache = None

    def get_batch(self, names, p_str=None):
        raise NotImplementedError

    def get_batch_size(self):
        return self._batch_size

    def read_calibration_cache(self):
        return self._cache

    def write_calibration_cache(self, cache):
        self._cache = cache


def _assert(value):
    if not value:
        raise AssertionError


def _get_frozen_graph_model():
    with tf.compat.v1.Session(graph=tf.Graph()) as session:
        model = tf.keras.Sequential(layers=[
            tf.keras.layers.Conv2D(filters=32, kernel_size=[3, 3], activation='relu', input_shape=[28, 28, 1]),
            tf.keras.layers.Conv2D(filters=64, kernel_size=[3, 3], activation='relu'),
            tf.keras.layers.MaxPooling2D(pool_size=[2, 2]),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(units=128, activation='relu'),
            tf.keras.layers.Dense(units=10, activation='softmax'),
        ])

        model.compile(optimizer=tf.keras.optimizers.SGD(), loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(x=_TRAIN_IMAGES, y=_TRAIN_LABELS)

        return (tf.compat.v1.graph_util.convert_variables_to_constants(sess=session,
                                                                       input_graph_def=session.graph_def,
                                                                       output_node_names=[model.output.op.name]),
                model.input.name,
                model.output.name)


def _tf_to_onnx(graph_def, input_name, output_name):
    from onnx import defs
    from tf2onnx import tfonnx

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def=graph_def, name='')

        onnx_model = tfonnx.process_tf_graph(tf_graph=graph,
                                             opset=defs.onnx_opset_version(),
                                             input_names=[input_name],
                                             output_names=[output_name])

        return onnx_model.make_model('').SerializeToString()


def _onnx_to_tensorrt(onnx_model):
    batch_size = 1

    with Logger() as logger, \
            Builder(logger) as builder, \
            builder.create_network(1 << int(NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network, \
            OnnxParser(network, logger) as onnx_parser:
        _assert(onnx_parser.parse(onnx_model))

        builder.max_batch_size = batch_size
        builder_config = builder.create_builder_config()
        optimization_profile = builder.create_optimization_profile()

        for i in range(network.num_inputs):
            input_tensor = network.get_input(i)
            shape = (batch_size,) + input_tensor.shape[1:]
            optimization_profile.set_shape(input=input_tensor.name, min=shape, opt=shape, max=shape)

        builder_config.add_optimization_profile(optimization_profile)
        builder_config.set_flag(BuilderFlag.INT8)
        builder_config.int8_calibrator = _Calibrator()

        cuda_engine = builder.build_engine(network, builder_config)
        _assert(cuda_engine)

        return cuda_engine


def _get_onnx_model():
    graph_def, input_name, output_name = _get_frozen_graph_model()

    return _tf_to_onnx(graph_def=graph_def, input_name=input_name, output_name=output_name)


def main():
    with Pool(processes=1) as pool:
        # Run in another process to make sure GPU memory used by TensorFlow gets freed.
        onnx_model = pool.apply(_get_onnx_model)

    _onnx_to_tensorrt(onnx_model=onnx_model)


if __name__ == '__main__':
    main()
```
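Note that `get_batch` in the calibrator above is just a stub that raises `NotImplementedError`. A working implementation would have to copy a calibration batch into device memory and return a list of device pointers; a rough sketch of that (using pycuda and random data, purely illustrative and not part of the repro above) would be:

```python
# Purely illustrative sketch: random data via pycuda, names are made up.
import numpy as np
import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
import pycuda.driver as cuda

_batch = np.random.random((1, 28, 28, 1)).astype(np.float32)
_device_input = cuda.mem_alloc(_batch.nbytes)


def get_batch(names, p_str=None):
    # Copy the calibration batch to the GPU and return its device pointer as an int.
    cuda.memcpy_htod(_device_input, np.ascontiguousarray(_batch))
    return [int(_device_input)]
```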
Hi @wyp19960713 @EFanZh ,
TensorRT 7.0 had known issues with INT8 calibration on models with dynamic shape. Please upgrade to TensorRT 7.1; the issue should be fixed.
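If you're not sure which TensorRT build your Python environment is actually picking up, a quick check is:

```python
import tensorrt as trt

print(trt.__version__)  # should report 7.1.x or newer
```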
@rmccorm4
Actually, I am using TensorRT 7.1.3.4, and the problem still exists.
@EFanZh ~~if your model has dynamic shape (-1/None in any dimension), have you defined an optimization profile?~~
~~I believe it's required for the INT8 calibration, and it will use the kOPT shape for calibration: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#int8-calib-dynamic-shapes~~
Edit: sorry I see the optimization profile in your code snippet now. I'll try to take a look tomorrow.
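For anyone else hitting this with dynamic shapes, here is a minimal sketch of pinning a fixed shape for calibration. It assumes the TensorRT 7.1 Python API (`set_calibration_profile` may not exist on older releases) and reuses the input name from the model in this thread:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger()
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()

# Use identical min/opt/max so the calibrator always sees a fixed shape.
profile = builder.create_optimization_profile()
profile.set_shape("conv2d_input:0", (1, 28, 28, 1), (1, 28, 28, 1), (1, 28, 28, 1))
config.add_optimization_profile(profile)

# TensorRT 7.1+ can be told explicitly which profile to use during INT8 calibration.
config.set_calibration_profile(profile)
config.set_flag(trt.BuilderFlag.INT8)
```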
Hello @EFanZh, I have asked NVIDIA about the same issue, and they suggested increasing the workspace size. But when I used an RTX 2080 GPU (10989 MB of GPU memory) and set the workspace size to the maximum value, another error occurred:

```
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 11 (invalid argument)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 11 (invalid argument)
```

Is the reason for this error that the workspace size still isn't big enough? Can you help me solve the problem? Thank you very much! The following is my complete code; I don't know whether it is wrong:
```python
import os

import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)


class MNISTEntropyCalibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, cache_file, batch_size=1):
        # Whenever you specify a custom constructor for a TensorRT class,
        # you MUST call the constructor of the parent explicitly.
        trt.IInt8EntropyCalibrator.__init__(self)

        self.cache_file = cache_file

        # Every time get_batch is called, the next batch of size batch_size will be copied to the device and returned.
        self.data = load_data(data_list)  # load_data / data_list are defined elsewhere in this script
        self.batch_size = batch_size
        self.current_index = 0

        # Allocate enough memory for a whole batch.
        print(self.data[0].nbytes * self.batch_size)
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)
        # self.device_input = cuda.mem_alloc(2 << 30)
        print(self.device_input)

    # TensorRT passes along the names of the engine bindings to the get_batch function.
    # You don't necessarily have to use them, but they can be useful to understand the order of
    # the inputs. The bindings list is expected to have the same ordering as 'names'.
    def get_batch(self, names):
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None

        current_batch = int(self.current_index / self.batch_size)
        if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [self.device_input]

    def get_batch_size(self):
        return self.batch_size

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

# Building engine
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 33
    builder.int8_mode = True

    calibration_cache = "./mnist_calibration.cache"
    calib = MNISTEntropyCalibrator(cache_file=calibration_cache, batch_size=1)
    config_flags = 1 << int(trt.BuilderFlag.INT8)
    config.flags = config_flags
    config.int8_calibrator = calib

    with open("/home/dm/ATP-Audio-classification-training-pipeline/voice_recognition/checkpoints/mobilenetV2-gvlad28/mobilenetV2.onnx", 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))

    last_layer = network.get_layer(network.num_layers - 1)
    if not last_layer.get_output(0):
        network.mark_output(last_layer.get_output(0))

    print("network layers", network.num_layers)

    inputs = [network.get_input(i) for i in range(network.num_inputs)]
    outputs = [network.get_output(i) for i in range(network.num_outputs)]
    for inp in inputs:
        print(inp.shape[0])
    for oup in outputs:
        print(oup.shape[0])

    profile_intput = builder.create_optimization_profile()
    profile_intput.set_shape("input", (1, 257, 200, 1), (1, 257, 200, 1), (1, 257, 200, 1))
    config.add_optimization_profile(profile_intput)
    config.max_workspace_size = 1 << 33

    engine = builder.build_engine(network, config)
    with open("/home/dm/ATP-Audio-classification-training-pipeline/voice_recognition/checkpoints/mobilenetV2-gvlad28/mobilenetV2_int8.trt", "wb") as f:
        f.write(engine.serialize())
```
I don't think this is because of workspace size. My model is a very simple network; it shouldn't cost much GPU memory. And yes, I have tried setting `max_workspace_size`, and the problem has not gone away.
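For what it's worth, in the snippet above both `builder.max_workspace_size` and `config.max_workspace_size` are set to `1 << 33` (8 GiB), which is a large fraction of the total memory of the GPUs mentioned in this thread. I believe that when `build_engine(network, config)` is used it is the config value that is honored, so a smaller explicit value should be enough for a model like this, for example:

```python
# Sketch: a 1 GiB workspace should be plenty for a small CNN; builder.max_workspace_size
# belongs to the older build_cuda_engine() path and is ignored here (my understanding).
config.max_workspace_size = 1 << 30
```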
I met the same issue... when I run INT8 ONNX models (ResNet50 and MobileNet), it prints:

```
[08/14/2020-05:01:56] [V] [TRT] Engine generation completed in 4.22749 seconds.
[08/14/2020-05:01:56] [V] [TRT] Calculating Maxima
[08/14/2020-05:01:56] [E] [TRT] ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[08/14/2020-05:01:56] [E] [TRT] ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
```
Hi @rmccorm4, are there any updates on this one?
Hi @EFanZh ,
I'm not sure what errors you're experiencing, but I had to edit your code a lot:

- `get_batch` needs to be implemented.
- `RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)`, which I believe also comes from `get_batch` not being implemented correctly.

I made a sample calibrator class from yours here, and building the engine from your model works fine for me:
```python
# Calibrator.py
import os
import logging

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S")
logger = logging.getLogger(__name__)


class _Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, opt_shape=(1, 28, 28, 1)):
        super().__init__()
        self._batch_size = opt_shape[0]

        num_samples = 1000
        self.batches = (np.random.random(opt_shape[1:]).astype(np.float32) for i in range(num_samples))
        self.device_input = cuda.mem_alloc(np.zeros(opt_shape, dtype=np.float32).nbytes)

        self.cache_file = "calibration.cache"

    def get_batch(self, names, p_str=None):
        try:
            batch = next(self.batches)
            cuda.memcpy_htod(self.device_input, batch)
            return [int(self.device_input)]
        except StopIteration:
            return None

    def get_batch_size(self):
        return self._batch_size

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                logger.info("Using calibration cache to save time: {:}".format(self.cache_file))
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            logger.info("Caching calibration data for future use: {:}".format(self.cache_file))
            f.write(cache)
```
And I was able to do INT8 calibration using the above calibrator class on your model with dynamic shape:
```
root@35e49d2833e6:/mnt/tensorrt-utils/int8/calibration# python3 onnx_to_tensorrt.py --explicit-batch --onnx=../../../tf_gpu.onnx --int8
2020-09-04 04:00:14 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
2020-09-04 04:00:27 - __main__ - INFO - Setting BuilderFlag.INT8
2020-09-04 04:00:27 - __main__ - DEBUG - === Network Description ===
2020-09-04 04:00:27 - __main__ - DEBUG - Input 0 | Name: conv2d_input:0 | Shape: (-1, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - DEBUG - Output 0 | Name: dense_1/Softmax:0 | Shape: (-1, -1)
2020-09-04 04:00:27 - __main__ - DEBUG - === Optimization Profiles ===
2020-09-04 04:00:27 - __main__ - DEBUG - conv2d_input:0 - OptProfile 0 - Min (1, 28, 28, 1) Opt (1, 28, 28, 1) Max (1, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - DEBUG - conv2d_input:0 - OptProfile 1 - Min (8, 28, 28, 1) Opt (8, 28, 28, 1) Max (8, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - DEBUG - conv2d_input:0 - OptProfile 2 - Min (16, 28, 28, 1) Opt (16, 28, 28, 1) Max (16, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - DEBUG - conv2d_input:0 - OptProfile 3 - Min (32, 28, 28, 1) Opt (32, 28, 28, 1) Max (32, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - DEBUG - conv2d_input:0 - OptProfile 4 - Min (64, 28, 28, 1) Opt (64, 28, 28, 1) Max (64, 28, 28, 1)
2020-09-04 04:00:27 - __main__ - INFO - Building Engine...
2020-09-04 04:00:32 - Calibrator - INFO - Caching calibration data for future use: calibration.cache
2020-09-04 04:00:38 - __main__ - INFO - Serializing engine to file: model.engine
```
The above script is from here, and I edited it to use the class above as the calibrator instead of `ImagenetCalibrator`:
```python
# onnx_to_tensorrt.py
# ...
if args.int8:
    from Calibrator import _Calibrator
    config.int8_calibrator = _Calibrator()
```
Hopefully this helps as a reference for your issue as well, @wyp19960713.
@rmccorm4 Thank you for your response. I have run my script again, and the out-of-memory error is gone. Maybe some other program was occupying my GPU memory at the time, which caused the error.
No problem. I'm going to close this issue for now. Feel free to open a new issue if the solutions above don't work for you with the latest TensorRT version.
Description
```
[TensorRT] VERBOSE: Engine generation completed in 3.27319 seconds.
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
```

When the calibrator is activated, these errors occur. My calibrator code is the MNISTEntropyCalibrator snippet shown earlier in this thread.
Environment
TensorRT Version: 7.0.0.11
GPU Type: RTX 2070
Nvidia Driver Version: 440.82
CUDA Version: 10.0
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.6
TensorFlow Version (if applicable): 1.14
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce