changtimwu opened 4 years ago
bin/trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useSpinWait
bin/trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useSpinWait
Not sure if qps is equivalent to fps. The internal variable is latencyThroughtput; the q means query.
https://github.com/NVIDIA/TensorRT/blob/master/samples/common/sampleReporting.cpp#L153
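If qps counts queries (i.e., batches) per second, then fps would simply be qps multiplied by the batch size. A minimal sketch of that relationship, assuming per-batch timing; the inference call here is a stub, not trtexec's actual code:

```python
import time

def run_inference(batch_size):
    # stand-in for a real TensorRT execution call (assumption, for illustration)
    time.sleep(0.001)

batch_size = 8
iterations = 100

start = time.perf_counter()
for _ in range(iterations):
    run_inference(batch_size)
elapsed = time.perf_counter() - start

qps = iterations / elapsed   # queries (batches) per second
fps = qps * batch_size       # frames (images) per second
print(f"qps={qps:.1f} fps={fps:.1f}")
```

Under this reading, the two numbers differ by exactly a factor of the batch size.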
The How Do I Measure Performance? section is worth reading: "The overall system performance can be measured by the latency and throughput of the entire processing pipeline. Because the pre- and post-processing steps depend so strongly on the particular application, in this section, we will mostly consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead."
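Following that advice, a common pattern is to warm up first, then time each query and report mean/percentile latency plus throughput, leaving pre/post-processing out of the timed region. A hedged pure-Python sketch (the inference function is a placeholder):

```python
import statistics
import time

def infer(batch_size):
    # placeholder: replace with the real network inference call
    time.sleep(0.002)

def benchmark(batch_size=8, warmup=10, runs=100):
    # warm-up iterations are excluded from the statistics
    for _ in range(warmup):
        infer(batch_size)
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(batch_size)
        latencies.append(time.perf_counter() - t0)
    mean = statistics.mean(latencies)
    p99 = sorted(latencies)[max(int(0.99 * len(latencies)) - 1, 0)]
    throughput = batch_size / mean  # images per second
    return mean, p99, throughput
```

This mirrors what trtexec's --avgRuns/--iterations flags control: how many timed runs feed the averages.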
Check the TensorRT examples:
dpkg -L libnvinfer-samples
automl/efficientdet
pip install -U Cython 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# Step 1: export model
!python model_inspect.py --runmode=saved_model \
--model_name=efficientdet-d0 --ckpt_path=efficientdet-d0 \
--saved_model_dir=/tmp/saved_model
root@55b6f8ad558e:~/workspace/automl/efficientdet# ls -ahR /tmp/saved_model/
/tmp/saved_model/:
. .. saved_model.pb variables
/tmp/saved_model/variables:
. .. variables.data-00000-of-00001 variables.index
Let's evaluate fps in this way:
official resnet50
yolov3
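One way to evaluate fps uniformly across these models (resnet50, yolov3, the exported efficientdet saved_model) is a tiny harness that feeds N frames and divides by wall time. A sketch under that assumption; `predict` is a hypothetical per-model callable, not an API from any of these repos:

```python
import time

def measure_fps(predict, frames, warmup=5):
    # warm-up frames are run but not counted in the timing
    for f in frames[:warmup]:
        predict(f)
    start = time.perf_counter()
    for f in frames[warmup:]:
        predict(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# usage: fps = measure_fps(model_predict, list_of_images)
```

Using the same harness for every model keeps the comparison apples-to-apples, since the warm-up policy and timed region are identical.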