Nerdyvedi opened this issue 3 years ago
Yes, I would like it, especially when it increases performance.
You can also have a look at https://github.com/allo-/virtual_webcam_background/issues/40#issuecomment-706603497 and https://github.com/de-code/python-tf-bodypix. I think they also appreciate help with the quantized models and I may migrate to use this library sometime (when I have the time to test it and see how easy it is to integrate).
With my python-tf-bodypix project, one could load the quant model like this:
```shell
python -m tf_bodypix \
    draw-mask \
    --source webcam:0 \
    --show-output \
    --threshold=0.75 \
    --add-overlay-alpha=0.5 \
    --colored \
    --model-path=https://storage.googleapis.com/tfjs-models/savedmodel/bodypix/mobilenet/quant1/075/model-stride16.json
```
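For reference, the `--threshold` flag above is applied to sigmoid-activated segmentation scores. A minimal plain-Python sketch of that step (not the library's actual code; the logit values below are made up for illustration):

```python
import math

def sigmoid(x):
    """Standard logistic function mapping a raw logit to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_mask(logits, threshold=0.75):
    """Turn raw segmentation logits into a 0/1 foreground mask."""
    return [1 if sigmoid(v) >= threshold else 0 for v in logits]

# Made-up logits: strongly negative -> background, strongly positive -> person
print(binary_mask([-3.0, 0.0, 1.2, 4.0], threshold=0.75))  # → [0, 0, 1, 1]
```

A higher threshold shrinks the mask towards the most confident pixels; a lower one grows it.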
Although I am not seeing any speedup that way. It still seems to be using floats in the model.
I have now pushed a PR from my WIP tflite support branch. It had kind of stalled because I didn't have a suitable tflite model at hand.
Perhaps you could share your tflite model?
Hi, I have uploaded the resnet50 model with float16 quantization. Please test it. I tested it on a CPU, and inference time decreased by around 25%. Model
Thank you. I changed my branch to make it work with that model.
This could be tested via:
```shell
python -m tf_bodypix \
    draw-mask \
    --source webcam:0 \
    --show-output \
    --threshold=0.75 \
    --add-overlay-alpha=0.5 \
    --colored \
    --output-stride=16 \
    --model-path=/path/to/resnet_float_16.tflite
```
vs.
```shell
python -m tf_bodypix \
    draw-mask \
    --source webcam:0 \
    --show-output \
    --threshold=0.75 \
    --add-overlay-alpha=0.5 \
    --colored \
    --output-stride=16 \
    --model-path=https://storage.googleapis.com/tfjs-models/savedmodel/bodypix/resnet50/float/model-stride16.json
```
In my brief, non-scientific test it seemed to be slower on my CPU. It might very well be my tflite implementation (it's a bit hacked together). There are probably a few possible optimisations, like not calling get_tensor for every output tensor.
The above commands will print out timings every second. For the model part it seems to hover around 430ms (though I have also seen it lower for a few iterations), whereas the tensorflow js model appears to be around 250ms. (That is then also reflected in the overall fps.)
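The per-second timing printout mentioned above can be reproduced with a small stdlib-only accumulator. This is just a sketch of the idea, not the project's actual logging code; the fake clock in the usage example only exists to make the output deterministic:

```python
import time

class TimingLogger:
    """Accumulates per-frame durations and reports the average once per interval."""

    def __init__(self, interval_seconds=1.0, clock=time.perf_counter):
        self.interval = interval_seconds
        self.clock = clock
        self.durations = []
        self.last_report = clock()

    def add(self, duration_seconds):
        """Record one frame's model time; return avg ms if an interval elapsed, else None."""
        self.durations.append(duration_seconds)
        now = self.clock()
        if now - self.last_report >= self.interval:
            avg_ms = 1000.0 * sum(self.durations) / len(self.durations)
            self.durations.clear()
            self.last_report = now
            return avg_ms
        return None

# Usage with a fake clock so the example is deterministic:
ticks = iter([0.0, 0.5, 1.5])
logger = TimingLogger(interval_seconds=1.0, clock=lambda: next(ticks))
print(logger.add(0.43))  # → None (interval not elapsed yet)
print(logger.add(0.25))  # → 340.0 (average of 430ms and 250ms)
```

In a real loop, `duration_seconds` would come from wrapping the model call in `time.perf_counter()` readings.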
Would be interested to see what timings you are getting or what tflite integration you are using.
I need to take some time to test your library and tool with working CUDA. I am not sure, but maybe quantized models are optimized for less accurate but faster GPU pipelines?
> I need to take some time to test your library and tool with working CUDA.
Yes, please do and let me know if there are any issues. There is no developer documentation at the moment, but it should be fairly straightforward (with make targets).
> I am not sure, but maybe quantized models are optimized for less accurate GPU pipelines?
That's quite possible. Although @Nerdyvedi mentioned having tested it on a CPU as well. So it's probably something to do with the tflite integration as well.
I used the following code to test tflite.

Initialising the interpreter:

```python
interpreter = tf.lite.Interpreter(model_path="resnet_float_16.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
```

Then, inside the loop, getting the prediction:

```python
interpreter.set_tensor(input_details[0]['index'], sample_image)
interpreter.invoke()
segment_logits = interpreter.get_tensor(211)
```
I just used it to get the mask.
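One way to avoid the hard-coded output index (211 above) is to look the tensor up by name in the list returned by `get_output_details()`. Here is a stdlib-only sketch against mocked detail dicts shaped like the ones the TFLite interpreter returns; the names and indices below are hypothetical, except `float_segments`, which is the output mentioned later in this thread:

```python
def find_output_index(output_details, name_fragment):
    """Return the tensor index of the first output whose name contains name_fragment."""
    for detail in output_details:
        if name_fragment in detail['name']:
            return detail['index']
    raise KeyError('no output matching %r' % name_fragment)

# Mocked entries mirroring the shape of interpreter.get_output_details():
mock_details = [
    {'name': 'float_short_offsets', 'index': 205},
    {'name': 'float_segments', 'index': 211},
]
print(find_output_index(mock_details, 'float_segments'))  # → 211
```

With the real interpreter, `find_output_index(interpreter.get_output_details(), 'float_segments')` would replace the magic number, and only that one tensor needs to be fetched with `get_tensor`.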
I had one slight performance issue in that I was resizing the tensors. Getting just the float_segments vs. getting all tensors doesn't seem to make a noticeable difference. I am still getting much higher timings.
Are you not resizing the input tensor as well? By default it seems to have the resolution of 769x433 (width x height). Or what internal resolution are you using? (I am using 417x241).
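On the internal resolution point: BodyPix-style models expect input sizes of the form n * stride + 1, which is consistent with both 769x433 and 417x241 above (768, 432, 416 and 240 are all multiples of 16). A small sketch of that rounding rule, assuming this is how those sizes were derived:

```python
def to_valid_input_size(size, output_stride=16):
    """Round size to the nearest value of the form n * output_stride + 1."""
    return int(round((size - 1) / output_stride)) * output_stride + 1

print(to_valid_input_size(769))  # 769: already valid (48 * 16 + 1)
print(to_valid_input_size(417))  # 417: already valid (26 * 16 + 1)
print(to_valid_input_size(640))  # → 641 (nearest valid size)
```

A smaller internal resolution like 417x241 means far fewer pixels through the network than 769x433, which alone can explain a large timing difference between two setups.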
What TensorFlow version are you using? (I am using 2.3.1.)
Hi, for some unknown reason the quantised weights do not work. But I was able to find a workaround: by first converting the weights from json to SavedModel format, then converting that to tflite and applying post-training quantization, I was able to get faster results from the quantised model.
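For anyone following along, the SavedModel-to-tflite step with float16 post-training quantization would look roughly like this with the standard `tf.lite.TFLiteConverter` API. This is a sketch, not the exact commands used in the workaround, and the paths are placeholders:

```python
import tensorflow as tf

# Convert an already-exported SavedModel (placeholder path) to tflite,
# asking the converter to store weights as float16.
converter = tf.lite.TFLiteConverter.from_saved_model('/path/to/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk for use with tf.lite.Interpreter.
with open('resnet_float_16.tflite', 'wb') as f:
    f.write(tflite_model)
```

The json-to-SavedModel step itself is a separate conversion (e.g. via the tfjs converter tooling) and is not shown here.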
Let me know if I should open a pull request for this.