Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Quantized model returns different results compared to float32 model #3610

Closed jinkilee closed 2 years ago

jinkilee commented 2 years ago

I am using faceboxes.param and faceboxes.bin for my face recognition model, and I want to quantize them.

To do this, I ran the following steps:

  1. ncnnoptimize faceboxes.param faceboxes.bin faceboxes-opt.param faceboxes-opt.bin 0
  2. ncnn2table faceboxes-opt.param faceboxes-opt.bin imagelist-widerface.txt faceboxes.table mean=[104,117,123] norm=0 shape=[300,300,3] pixel=BGR thread=8 method=kl (my model takes 300x300x3 inputs; the mean and norm values were double-checked many times; inference-time preprocessing matching these settings is sketched after this list)
  3. ncnn2int8 faceboxes-opt.param faceboxes-opt.bin faceboxes-int8.param faceboxes-int8.bin faceboxes.table
  4. This produced faceboxes-int8.param and faceboxes-int8.bin.
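
For reference, a minimal sketch of inference-time preprocessing and int8 loading that mirrors the calibration settings above (300x300 BGR input, mean=[104,117,123], no norm scaling). The blob names "input" and "detection_out" are placeholders and would need to match the actual param file:

```cpp
// Sketch: preprocess the same way the calibration step was configured
// (shape=[300,300,3], pixel=BGR, mean=[104,117,123], norm=0 i.e. no scaling),
// then run the int8 model with the int8 path enabled.
#include "net.h"
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat bgr = cv::imread("test.jpg", 1);

    // Resize to 300x300, keep BGR channel order (matches pixel=BGR, shape=[300,300,3])
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR,
                                                 bgr.cols, bgr.rows, 300, 300);

    // Subtract the same mean used for ncnn2table; norm=0 means no scaling,
    // so pass 0 for norm_vals.
    const float mean_vals[3] = {104.f, 117.f, 123.f};
    in.substract_mean_normalize(mean_vals, 0);

    ncnn::Net net;
    net.opt.use_int8_inference = true;            // enable the int8 path
    net.load_param("faceboxes-int8.param");
    net.load_model("faceboxes-int8.bin");

    ncnn::Extractor ex = net.create_extractor();
    ex.input("input", in);                        // placeholder blob name
    ncnn::Mat out;
    ex.extract("detection_out", out);             // placeholder blob name
    return 0;
}
```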

When I run inference with the int8 model, it gives different results from the float32 one. For example, the float32 model detects a person in an image, but the int8 model detects nothing in the same image.

I also set net.opt.use_int8_inference = true when running the int8 model.

Is there anything I am missing?
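
To make the comparison concrete, a rough sketch that feeds the same preprocessed input through both the fp32 and int8 models and prints how many detections each returns (model paths taken from the steps above; blob names are again placeholders):

```cpp
// Sketch: run the same input through the fp32 and int8 models and compare outputs.
#include "net.h"
#include <cstdio>

static ncnn::Mat run_model(const char* param, const char* bin,
                           const ncnn::Mat& in, bool int8)
{
    ncnn::Net net;
    net.opt.use_int8_inference = int8;
    net.load_param(param);
    net.load_model(bin);
    ncnn::Extractor ex = net.create_extractor();
    ex.input("input", in);            // placeholder blob name
    ncnn::Mat out;
    ex.extract("detection_out", out); // placeholder blob name
    return out;
}

int main()
{
    // Stand-in input; in practice build it with from_pixels_resize and
    // substract_mean_normalize exactly as in the preprocessing sketch above.
    ncnn::Mat in(300, 300, 3);
    in.fill(0.f);

    ncnn::Mat out_fp32 = run_model("faceboxes-opt.param",  "faceboxes-opt.bin",  in, false);
    ncnn::Mat out_int8 = run_model("faceboxes-int8.param", "faceboxes-int8.bin", in, true);

    // For an SSD-style DetectionOutput, each row is typically [label, score, x1, y1, x2, y2]
    printf("fp32 detections: %d, int8 detections: %d\n", out_fp32.h, out_int8.h);
    return 0;
}
```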

I also did the same thing with mnet.25-opt.bin (downloaded from the link) and got mnet.25-int.param and .bin. However, this one runs slower than the float32 model, even though its file size is roughly half. Why would the quantized model be slower than the float32 model?
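
For the speed question, a simple check is to time one forward pass on each net with the same thread settings. A minimal sketch (single run only; in practice warm up first and average over many runs, since the first pass includes setup cost):

```cpp
// Sketch: time a single forward pass of an already-loaded ncnn::Net.
#include "net.h"
#include <chrono>

static double time_forward_ms(ncnn::Net& net, const ncnn::Mat& in)
{
    auto t0 = std::chrono::steady_clock::now();
    ncnn::Extractor ex = net.create_extractor();
    ex.input("input", in);            // placeholder blob name
    ncnn::Mat out;
    ex.extract("detection_out", out); // placeholder blob name
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```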

weilanShi commented 2 years ago

I have encountered the same problem. Have you solved it?