Closed: cyrusbehr closed this issue 5 years ago.
I think this may have to do with auto-tuning the model. Are you able to provide the script used for auto-tuning the ResNet-100 model?
Even after tuning the model with the following script, I still see no improvement in performance.
import os
import numpy as np
import nnvm.testing
import nnvm.compiler
import tvm
import mxnet as mx
from tvm import autotvm
import tvm.relay as relay
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
import tvm.contrib.graph_runtime as runtime
def get_network(name, batch_size):
    prefix, epoch = "/home/models/faceDetection/model", 0
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
    opt_level = 3
    shape_dict = {'data': (1, 3, 112, 112)}
    nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)
    input_shape = (batch_size, 3, 112, 112)
    output_shape = (batch_size, 512)
    return nnvm_sym, nnvm_params, input_shape, output_shape
target = "llvm -mcpu=skylake"
batch_size = 1
dtype = "float32"
model_name = "resnet-18"
log_file = "%s.log" % model_name
num_threads = 1
os.environ["TVM_NUM_THREADS"] = str(num_threads)
tuning_option = {
    'log_filename': log_file,
    'tuner': 'random',
    'early_stopping': None,
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1,
                                   min_repeat_ms=1000),
    ),
}
# You can skip the implementation of this function for this tutorial.
def tune_kernels(tasks,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 log_filename='tuning.log'):
    for i, tsk in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))

        # converting conv2d tasks to conv2d_NCHWc tasks
        op_name = tsk.workload[0]
        if op_name == 'conv2d':
            func_create = 'topi_x86_conv2d_NCHWc'
        elif op_name == 'depthwise_conv2d_nchw':
            func_create = 'topi_x86_depthwise_conv2d_NCHWc_from_nchw'
        else:
            raise ValueError("Tuning {} is not supported on x86".format(op_name))

        task = autotvm.task.create(func_create, args=tsk.args,
                                   target=target, template_key='direct')
        task.workload = tsk.workload

        # create tuner
        if tuner == 'xgb' or tuner == 'xgb-rank':
            tuner_obj = XGBTuner(task, loss_type='rank')
        elif tuner == 'ga':
            tuner_obj = GATuner(task, pop_size=50)
        elif tuner == 'random':
            tuner_obj = RandomTuner(task)
        elif tuner == 'gridsearch':
            tuner_obj = GridSearchTuner(task)
        else:
            raise ValueError("Invalid tuner: " + tuner)

        # do tuning
        n_trial = 50  # len(task.config_space)
        tuner_obj.tune(n_trial=n_trial,
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
                           autotvm.callback.progress_bar(n_trial, prefix=prefix),
                           autotvm.callback.log_to_file(log_filename)])
########################################################################
# Finally, we launch tuning jobs and evaluate the end-to-end performance.
def tune_and_evaluate(tuning_opt):
    # extract workloads from nnvm graph
    print("Extract tasks...")
    net, params, data_shape, out_shape = get_network(model_name, batch_size)
    tasks = autotvm.task.extract_from_graph(net, target=target,
                                            shape={'data': data_shape}, dtype=dtype,
                                            symbols=(nnvm.sym.conv2d,))

    # run tuning tasks
    print("Tuning...")
    tune_kernels(tasks, **tuning_opt)

    # compile kernels with history best records
    with autotvm.apply_history_best(log_file):
        print("Compile...")
        with nnvm.compiler.build_config(opt_level=3):
            graph, lib, params = nnvm.compiler.build(
                net, target=target, shape={'data': data_shape}, params=params, dtype=dtype)

        # upload parameters to device
        ctx = tvm.cpu()
        data_tvm = tvm.nd.array((np.random.uniform(size=data_shape)).astype(dtype))
        module = runtime.create(graph, lib, ctx)
        module.set_input('data', data_tvm)
        module.set_input(**params)

        # evaluate
        print("Evaluate inference time cost...")
        ftimer = module.module.time_evaluator("run", ctx, number=100, repeat=3)
        prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
        print("Mean inference time (std dev): %.2f ms (%.2f ms)" %
              (np.mean(prof_res), np.std(prof_res)))

        lib.export_library("./deploy_tuned_lib.so")
        print('lib exported successfully')
with open("./deploy_tuned_graph.json", "w") as fo:
fo.write(graph.json())
with open("./deploy_tuned_param.params", "wb") as fo:
fo.write(nnvm.compiler.save_param_dict(params))
# We do not run the tuning in our webpage server since it takes too long.
# Uncomment the following line to run it by yourself.
tune_and_evaluate(tuning_option)
######################################################################
# Sample Output
# -------------
# The tuning needs to compile many programs and extract features from them.
# So a high-performance CPU is recommended.
# One sample output is listed below.
#
# .. code-block:: bash
#
# Extract tasks...
# Tuning...
# [Task 1/12] Current/Best: 598.05/2497.63 GFLOPS | Progress: (252/252) | 1357.95 s Done.
# [Task 2/12] Current/Best: 522.63/2279.24 GFLOPS | Progress: (784/784) | 3989.60 s Done.
# [Task 3/12] Current/Best: 447.33/1927.69 GFLOPS | Progress: (784/784) | 3869.14 s Done.
# [Task 4/12] Current/Best: 481.11/1912.34 GFLOPS | Progress: (672/672) | 3274.25 s Done.
# [Task 5/12] Current/Best: 414.09/1598.45 GFLOPS | Progress: (672/672) | 2720.78 s Done.
# [Task 6/12] Current/Best: 508.96/2273.20 GFLOPS | Progress: (768/768) | 3718.75 s Done.
# [Task 7/12] Current/Best: 469.14/1955.79 GFLOPS | Progress: (576/576) | 2665.67 s Done.
# [Task 8/12] Current/Best: 230.91/1658.97 GFLOPS | Progress: (576/576) | 2435.01 s Done.
# [Task 9/12] Current/Best: 487.75/2295.19 GFLOPS | Progress: (648/648) | 3009.95 s Done.
# [Task 10/12] Current/Best: 182.33/1734.45 GFLOPS | Progress: (360/360) | 1755.06 s Done.
# [Task 11/12] Current/Best: 372.18/1745.15 GFLOPS | Progress: (360/360) | 1684.50 s Done.
# [Task 12/12] Current/Best: 215.34/2271.11 GFLOPS | Progress: (400/400) | 2128.74 s Done.
# Compile...
# Evaluate inference time cost...
# Mean inference time (std dev): 3.16 ms (0.03 ms)
Use the GA tuner (see the sketch below).
This might be wrong => model_name = "resnet-18" (the model being tuned here is ResNet-100, not ResNet-18).
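For reference, switching from the random tuner to the GA tuner only requires changing tuning_option in the script above; tune_kernels already handles tuner == 'ga' via GATuner. A minimal sketch of the relevant lines (everything else stays the same):

tuning_option = {
    'log_filename': log_file,
    'tuner': 'ga',            # GA tuner instead of 'random'
    'early_stopping': None,
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, repeat=1,
                                   min_repeat_ms=1000),
    ),
}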
@cyrusbehr how did you solve the problem? Please help me out.
@dexception what do you mean by "this might be wrong"? And why use the GA tuner?
@cyrusbehr I have tested and got the same results. Have you found a solution to this problem?
I am trying to convert an MXNet model to TVM in order to improve the inference speed. I am able to convert it successfully, but I do not see the speed improvements advertised on this page.
I have followed the tutorial here, but I will go through the steps I took.
I first downloaded the InsightFace model
LResNet100E-IR,ArcFace@ms1m-refine-v2
which can be found here. Note that I am using the same model as in the TVM benchmark.
Next, I use the following Python script to convert the model to the TVM-compatible artifacts. Note that when I run the command
llc --version
I get the following output (which is why I set the target to skylake).
Python conversion script
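The conversion script itself is collapsed above. Roughly, it follows the standard NNVM from_mxnet flow; the following is a minimal sketch, not the exact script used, and the checkpoint prefix and the graph/params output names are assumptions (only deploy_lib.so is named in the issue):

# Minimal conversion sketch (assumed paths and output names)
import nnvm
import nnvm.compiler
import tvm
import mxnet as mx

prefix, epoch = "./model", 0  # assumed location of the MXNet checkpoint
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

# convert the MXNet symbol and weights to NNVM
nnvm_sym, nnvm_params = nnvm.frontend.from_mxnet(sym, arg_params, aux_params)

target = "llvm -mcpu=skylake"
shape_dict = {'data': (1, 3, 112, 112)}

with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        nnvm_sym, target=target, shape=shape_dict, params=nnvm_params)

# export the three artifacts that are later loaded from C++
lib.export_library("./deploy_lib.so")
with open("./deploy_graph.json", "w") as fo:
    fo.write(graph.json())
with open("./deploy_param.params", "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))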
When I run the script, I get a number of warning messages (not reproduced here), but it ultimately exports the models successfully.
Next, I import the converted models and
deploy_lib.so
into my C++ project, using the following code. The majority of the code is taken from the example on this page. Note that the images I provide are pre-aligned and cropped to 112x112.
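The C++ code is not reproduced here. For reference, the same load-and-run path expressed in Python with the TVM graph runtime looks roughly like the sketch below; the graph and params file names are assumptions matching the conversion sketch above (only deploy_lib.so is named in the issue):

# Sketch: load the exported artifacts and run one inference with the graph runtime
import numpy as np
import tvm
import tvm.contrib.graph_runtime as runtime

loaded_lib = tvm.module.load("./deploy_lib.so")
loaded_graph = open("./deploy_graph.json").read()
loaded_params = bytearray(open("./deploy_param.params", "rb").read())

ctx = tvm.cpu()
module = runtime.create(loaded_graph, loaded_lib, ctx)
module.load_params(loaded_params)

# a 112x112 aligned face crop in NCHW float32 layout
data = np.random.uniform(size=(1, 3, 112, 112)).astype("float32")
module.set_input('data', tvm.nd.array(data))
module.run()
embedding = module.get_output(0).asnumpy()  # (1, 512) face embedding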
On average, inference takes 360 ms, which is roughly the same time it takes to perform inference with MXNet (C++, MKL-DNN). I was expecting a significant decrease in inference time.
I am not sure whether the issue has to do with the warnings during conversion. I followed the conversion tutorial exactly, and it did not mention needing to auto-tune the model or anything.
Here is the output of
cat /proc/cpuinfo
so you can see what hardware I am running the benchmark on: