Qengineering / TensorFlow_Lite_SSD_RPi_64-bits

TensorFlow Lite SSD on bare Raspberry Pi 4 with 64-bit OS at 24 FPS
https://qengineering.eu/install-ubuntu-18.04-on-raspberry-pi-4.html
BSD 3-Clause "New" or "Revised" License
41 stars 6 forks

Customization of the Model #8

Closed ningelschlingel closed 1 year ago

ningelschlingel commented 1 year ago

Hi

I am in desperate need of a fast hand-detection model. Although I wasn't able to reach the 24 FPS (I am stuck at around 10 with overclocking, 7 without, loading the model file in Python), this is still the fastest detection I have got to run on my Pi 4.
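(As an aside, a simple way to sanity-check FPS numbers like these is a small timing harness. The `run_inference` stub below is a placeholder assumption, not real detection code; swap in the actual interpreter call you are benchmarking.)

```python
import time

def run_inference(frame):
    # Placeholder standing in for a real detection call
    # (e.g. a TFLite interpreter invoke); simulates ~10 ms of work.
    time.sleep(0.01)

def measure_fps(frames=50):
    """Time a batch of inference calls and return frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        run_inference(None)
    elapsed = time.perf_counter() - start
    return frames / elapsed
```

With the 10 ms stub this reports roughly 100 FPS; with a real model it gives the raw inference rate, excluding camera capture and drawing.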

But it's for the "wrong" objects - I need only the detection of hands.

While comparing your model with a pretrained PyTorch implementation of SSD MobileNet V3, I noticed that even the official models are too slow. Even on a 2023 MacBook I reached only 5 FPS.

Could you point me in roughly the right direction for how to customize the model for my own use case?

I was thinking of using the EgoHands dataset.

If my request is considered inappropriate here, feel free to delete it without comment.

Best regards, Jan

Qengineering commented 1 year ago

Dear @ningelsohn ,

Please take a look at the repo https://github.com/Qengineering/Hand-Pose-ncnn-Raspberry-Pi-4. It consists of two parts. The first detects hands in a scene; the second recognizes the position of the fingers. If you use only the first part, you have fast hand detection. At the time of writing, the second part doesn't work well with the latest ncnn framework, but that doesn't affect you. Please note that using Python does indeed slow down your performance, as you have seen with TF-Lite.

In nanodet-hand.cpp, skip lines 337 - 383, which handle the detection of finger positions. In fact, all code related to nanopnt can be pruned.

ningelschlingel commented 1 year ago

Dear @Qengineering,

Thank you for the quick and helpful reply, I will look into it! Just out of curiosity: I thought that the detection of different everyday objects and the detection of hands are very similar tasks. Why do the projects differ so much? Would the model used in the current repo (TensorFlow_Lite_SSD_RPi_64-bits), when trained with appropriate data for hand detection, perform just as it does right now?

In the meantime, I have also found the underlying paper through you. If I understand correctly, the speed is mainly (or even exclusively) achieved by adjusting the width multiplier and the input resolution. Is that correct?

Sorry for the possibly clumsy question, I have little experience with such scientific papers.

Thanks again! This is all very helpful.

Qengineering commented 1 year ago

Dear @ningelsohn,

You're right. After transfer training with the appropriate data, you could recognize hands with this network. Keep in mind, the network used in this repo is a quantized 8-bit version specially tailored for TF-Lite.

Speed depends on the input resolution and the number of layers. In the case of MobileNet, the input size is fixed. You can't change it without switching to another network. Modern models like YoloV5 can handle dynamic sizes. The same model can work with inputs ranging from, let's say, 320x320 to 1280x1280.
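To illustrate the dynamic-size idea: YOLOv5-style models don't accept literally arbitrary sizes, but pad each dimension up to a multiple of the network stride (typically 32) before inference. A small sketch of that rounding, under that assumption:

```python
def fit_to_stride(width, height, stride=32):
    """Round each dimension up to the nearest multiple of the stride,
    the way YOLOv5-style letterboxing pads input images."""
    pad_w = (stride - width % stride) % stride
    pad_h = (stride - height % stride) % stride
    return width + pad_w, height + pad_h
```

So a 1280x720 frame becomes 1280x736, while 320x320 passes through unchanged; one set of weights serves every such size.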

ningelschlingel commented 1 year ago

@Qengineering thank you for your help!

ningelschlingel commented 1 year ago

Sorry to open this again, but this is really the best lead I have had so far, and other approaches just don't seem to work as well.

This repo contains the quantized model, but apparently not the code it was created with. Is that intentional?

Among other things, I have learned from your repos and papers how to optimize with quantization, for example. But putting everything together is not trivial.

I will close this in a few days, as you have already helped me more than I could have expected.

Qengineering commented 1 year ago

Dear Jan,

What exactly do you want to achieve? I understand that you want to recognize hands from an image. But in what context?

For example, if you want to follow a shoplifter's hands on a surveillance camera, or if you'd like to signal a handball by an FC Bayern München player, this will not work. At the very first stage, images are resized to the (small) format used by the deep learning model. By then, the hands are reduced to a few pixels. In this situation, you will have to use modern models like YoloV5 with a large input resolution. Needless to say, your speed completely evaporates as a result. 0.5 FPS will already be a lot on an RPi 4.
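The resizing argument is easy to put into numbers (the frame and hand sizes below are illustrative assumptions, not measurements):

```python
def hand_pixels_after_resize(frame_w, frame_h, hand_w, hand_h, model_size=320):
    """Estimate how large a hand appears once the full frame is
    squashed down to the model's square input resolution."""
    scale_x = model_size / frame_w
    scale_y = model_size / frame_h
    return round(hand_w * scale_x), round(hand_h * scale_y)
```

A 60x80 pixel hand in a 1920x1080 surveillance frame shrinks to roughly 10x24 pixels at a 320x320 input, which is far too little texture for a detector to work with.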

If you want to recognize the hands of a person waving directly in front of the camera, then you have at least a reasonable number of pixels per hand to recognize. If you can also live with 10-12 FPS, then a NanoDet or YoloV5-tiny model is the solution. I'd be happy to open a repo for you on GitHub with the code. But remember, the hands will have to be fairly prominent in the picture and will not always be recognized (think of a fist). Fingers are also not recognized with this model.

Training a network yourself is indeed not trivial. You will have to determine what is feasible. MobileNet with TensorFlow Lite is easy to train, but only runs fast if you are able to convert the float weights to int8 numbers. Often this is accompanied by a significant loss of accuracy. In addition, as already mentioned, the input size of the images is also not large (320x320).
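The float-to-int8 conversion mentioned here follows TF-Lite's affine quantization scheme, q = round(x / scale) + zero_point clamped to the int8 range; the rounding gap is exactly where the accuracy loss comes from. A minimal sketch:

```python
def quantize_int8(x, scale, zero_point):
    """Map a float to int8 using the affine (scale/zero-point) scheme."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize_int8(q, scale, zero_point):
    """Recover an approximate float; the difference from the original
    value is the quantization error."""
    return (q - zero_point) * scale
```

With scale = 0.05, for instance, 0.123 quantizes to 2 and dequantizes back to 0.10; values beyond the representable range simply saturate at 127 or -128.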

Another possibility is the YOLO route. However, you quickly get stuck here when it comes to transferring a PyTorch model to something better supported on an RPi, such as the C++ ncnn framework. It is doable, but then you have to count on speeds of 5-10 FPS.

In both cases you will have to provide the images yourself that can be used during training. A time-consuming and boring job.

ningelschlingel commented 1 year ago

Dear Qengineering,

In my use case, the hand detection will be used as a touchless navigator/selector and will be very prominent in the frame. I am currently already looking at Tiny YOLO. As only flat hands are to be recognized, there is little to no variation in the appearance of the object that should be recognized.

Thanks again for your detailed response!

I will take a deeper look into NanoDet and Yolov5-Tiny. If it is not too much trouble, I would also like to have a look at the code mentioned.

PS: Ultralytics (well known, I guess) has stated that quantization is not really a big factor regarding speed: https://github.com/ultralytics/yolov5/issues/211

But with their smallest model I get "only" 1.5 FPS.