Megvii-BaseDetection / YOLOX

YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3–v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO support. Documentation: https://yolox.readthedocs.io/
Apache License 2.0

TensorRT-int8 model is NOT faster than TensorRT-float16 model #1001

Open KtechB opened 2 years ago

KtechB commented 2 years ago

I trained a yolox-m model and converted it to a TensorRT int8 model with demo/trt.py. The only differences from the original trt.py are the torch2trt arguments fp16_mode=False and int8_mode=True, plus the calibration data [data]:

    import tensorrt as trt
    from torch2trt import torch2trt

    # Same call as in demo/trt.py, but building an int8 engine instead of fp16.
    # Note: torch2trt's keyword is fp16_mode (float16_mode is not accepted).
    model_trt = torch2trt(
        model,
        [data],
        fp16_mode=False,
        int8_mode=True,
        log_level=trt.Logger.INFO,
        max_workspace_size=(1 << args.workspace),
        max_batch_size=32,
        int8_calib_batch_size=32,
    )
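
By default, torch2trt calibrates int8 scales on the example inputs alone; a larger calibration set can be supplied through int8_calib_dataset. A minimal sketch, assuming calib_tensors is a hypothetical list of preprocessed CUDA image tensors matching the model's input shape:

    import torch
    from torch2trt import torch2trt

    class CalibDataset:
        """Indexable dataset: each item is the list of inputs for one sample,
        which is the shape torch2trt's int8 calibrator expects."""
        def __init__(self, tensors):
            self.tensors = tensors

        def __len__(self):
            return len(self.tensors)

        def __getitem__(self, idx):
            return [self.tensors[idx].cuda()]

    # calib_tensors: hypothetical list of preprocessed (3, H, W) float tensors
    model_trt = torch2trt(
        model,
        [data],
        int8_mode=True,
        int8_calib_dataset=CalibDataset(calib_tensors),
        int8_calib_batch_size=32,
    )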

Inference times measured with tools/demo.py are as below (AWS EC2 p3.2xlarge with an NVIDIA Tesla V100).

Other models (yolox-s, yolox-l) seem to behave the same.
With YOLOX, is the int8 model simply not faster than the float16 model, or am I making a mistake somewhere?
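
Before concluding that int8 buys nothing, it may be worth double-checking the measurement itself: CUDA launches are asynchronous, so timings taken without a warm-up phase and without torch.cuda.synchronize() can hide real differences. A minimal timing sketch, assuming model_trt and data from the snippet above:

    import time
    import torch

    def benchmark(model, x, warmup=20, iters=100):
        with torch.no_grad():
            # Warm-up: first runs include kernel/engine initialization overhead
            for _ in range(warmup):
                model(x)
            torch.cuda.synchronize()  # flush pending async CUDA work
            start = time.perf_counter()
            for _ in range(iters):
                model(x)
            torch.cuda.synchronize()  # wait for the timed work to finish
        return (time.perf_counter() - start) / iters * 1000.0  # ms per iter

    print(f"{benchmark(model_trt, data):.2f} ms per inference")

Separately, note that V100 Tensor Cores accelerate fp16 but not int8 (int8 Tensor Core support arrived with Turing), so on a p3.2xlarge the int8 path may genuinely offer little speedup over fp16.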

sahamitul commented 1 year ago

Hi @KtechB, did you solve this problem? How?