ambianic / fall-detection

Python ML library for people fall detection
Apache License 2.0

Upgrade pose detection from PoseNet MobileNetV1 to MoveNet, PoseNet 2.0 ResNet50, or BlazePose #5

Open · ivelin opened 3 years ago

ivelin commented 3 years ago

UPDATE: June 11, 2021

Is your feature request related to a problem? Please describe.

Currently we use MobileNetV2 with a 300x300 input tensor (image) by default for object detection.

It struggles with poses of people lying down on the floor. We experimented with rotating images +/-90°, which improves overall detection rates, but it still misses poses of fallen people, even when the full body is clearly visible to the human eye.

Clearly the model has not been trained on fallen people poses.
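
For reference, a minimal sketch of the rotation workaround, assuming a hypothetical `detect_poses()` wrapper around the current PoseNet inference call (not the actual library API):

```python
# Sketch of the +/-90 degree rotation workaround.
# detect_poses() is a hypothetical stand-in for the current PoseNet
# inference call; assumed to return a pose confidence score.
from PIL import Image

def detect_with_rotations(image: Image.Image, detect_poses):
    """Run detection on the original frame and on +/-90 degree
    rotations; return the best-scoring (angle, score) pair."""
    best_angle, best_score = 0, 0.0
    for angle in (0, 90, -90):
        rotated = image.rotate(angle, expand=True)
        score = detect_poses(rotated)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score
```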

Describe the solution you'd like

Additional context

See TensorFlow 2 Detection Model Zoo https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

Notice the high performance and dual purpose (object + keypoints) for CenterNet Resnet50 V1 FPN Keypoints 512x512 and CenterNet Resnet50 V2 Keypoints 512x512.

More on CenterNet and its various applications for object detection, pose detection, and object motion tracking: https://github.com/xingyizhou/CenterNet
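
If we experiment with one of these, here is a rough sketch of pulling the CenterNet keypoints model from TensorFlow Hub. The hub handle and output tensor names are my assumptions from the zoo docs, so verify them before relying on this:

```python
# Sketch: trying a CenterNet keypoints model from the TF2 zoo via
# TensorFlow Hub. The handle below is an assumption; check tfhub.dev.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

detector = hub.load(
    "https://tfhub.dev/tensorflow/centernet/resnet50v1_fpn_512x512_kpts/1")

# Stand-in frame; a real frame would be a uint8 RGB image batch.
image = np.zeros((1, 512, 512, 3), dtype=np.uint8)
outputs = detector(tf.constant(image))

# Keypoints come back alongside the regular object detections.
keypoints = outputs["detection_keypoints"][0].numpy()        # [N, 17, 2]
kp_scores = outputs["detection_keypoint_scores"][0].numpy()  # [N, 17]
```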

ivelin commented 3 years ago

The various JS models are easy to see in the demo console log:

https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html

[Screenshot: PoseNet demo console log with model parameter details, 2021-01-08]

For example, a good ResNet model for single-pose estimation with balanced parameters for CPU (see screenshot above for param details): https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/group1-shard11of12.bin

ivelin commented 3 years ago

TF saved model checkpoints for PoseNet 2 are also listed here: https://github.com/tensorflow/tfjs-models/tree/master/posenet/src

ivelin commented 3 years ago

My testing shows that the ResNet50 model is noticeably more accurate than the MobileNet model, although it's about 40% slower (6fps vs 10fps). Surprisingly, the multi-person variant performs about as fast as the single-person one. Also, the single-person model can be confused if there are multiple people in the image.

With these findings, I think it's more important to upgrade to a ResNet model (e.g. input size 250, stride 32, quantized int or 2-byte float) and less important whether it's multi-pose or single-pose.
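
For anyone reproducing the fps numbers, this is the kind of rough harness I'm using to compare models. It's a sketch only; `detect_poses` is a hypothetical single-frame inference call for whichever model is loaded:

```python
# Rough timing harness for comparing pose model throughput.
import time

def measure_fps(detect_poses, frames, warmup=5):
    """Average frames-per-second of detect_poses over a list of frames."""
    for frame in frames[:warmup]:
        detect_poses(frame)  # let the runtime warm up before timing
    start = time.perf_counter()
    for frame in frames:
        detect_poses(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```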

ivelin commented 3 years ago

@bhavikapanara ^^^

ivelin commented 3 years ago

@bhavikapanara thoughts on this one?

I've done more testing between the current MobileNetV1 model and single-person ResNet50 with the parameters in the previous comment (250x250 input, stride 32, 2-byte float quantization).

I find the ResNet50 model to have slightly slower inference time but much better performance in several important areas.

I would like to know what your own experiments show.

If you are able to verify my findings on your data sets, I think upgrading to a resnet model should be the next priority on the roadmap for improving fall detection.

ivelin commented 3 years ago

PoseNet 2.0 ResNet 50 testing video

https://youtu.be/6Dz12WtpWuM

https://user-images.githubusercontent.com/2234901/107591384-269b1780-6bd0-11eb-9ffd-ba1c7dfc939c.mov

ivelin commented 3 years ago

BlazePose testing video

https://youtu.be/mpqsm1aXUVc

ivelin commented 3 years ago

BlazePose model card: https://drive.google.com/file/d/1zhYyUXhQrb_Gp0lKUFv1ADT3OCxGEQHS/view?usp=drivesdk

TFLite models for pose detection (phase 1) and keypoint estimation (phase 2):

https://google.github.io/mediapipe/solutions/models.html#pose
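
A rough sketch of chaining the two TFLite models with the stock interpreter. The file names and input shapes below are placeholders; check the model cards for the real values:

```python
# Sketch: running MediaPipe's two-phase pose pipeline with the plain
# TFLite interpreter. Model file names and input shapes are placeholders.
import numpy as np
import tensorflow as tf

def run_tflite(model_path, input_array):
    """Run a single-input TFLite model and return all its outputs."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], input_array)
    interpreter.invoke()
    return [interpreter.get_tensor(d["index"])
            for d in interpreter.get_output_details()]

# Phase 1: pose detection (person ROI + body vector).
rois = run_tflite("pose_detection.tflite",
                  np.zeros((1, 224, 224, 3), np.float32))
# Phase 2: keypoint estimation on the cropped ROI from phase 1.
keypoints = run_tflite("pose_landmark_full_body.tflite",
                       np.zeros((1, 256, 256, 3), np.float32))
```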

An interesting detail worth investigating deeper is the fact that BlazePose estimates the body vector as part of the first phase, Pose Detection, before it runs the second phase for keypoint estimation.

Since for fall detection we are mainly interested in the spinal vector, this could mean even faster inference.

See this text from the blog:

http://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html

“for the human pose tracking we explicitly predict two additional virtual keypoints that firmly describe the human body center, rotation and scale as a circle. Inspired by Leonardo’s Vitruvian man, we predict the midpoint of a person's hips, the radius of a circle circumscribing the whole person, and the incline angle of the line connecting the shoulder and hip midpoints. This results in consistent tracking even for very complicated cases, like specific yoga asanas. “
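
To make the spinal vector idea concrete, here is a sketch of deriving a spine angle from BlazePose keypoints via the MediaPipe Python API. The angle convention is illustrative, not part of MediaPipe:

```python
# Sketch: spinal vector angle from BlazePose keypoints via MediaPipe.
import math
import mediapipe as mp

mp_pose = mp.solutions.pose

def spinal_angle_degrees(landmarks):
    """Angle between the shoulder-midpoint -> hip-midpoint line and the
    vertical image axis: near 0 when upright, near 90 when lying flat."""
    ls = landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER]
    rs = landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER]
    lh = landmarks[mp_pose.PoseLandmark.LEFT_HIP]
    rh = landmarks[mp_pose.PoseLandmark.RIGHT_HIP]
    sx, sy = (ls.x + rs.x) / 2, (ls.y + rs.y) / 2
    hx, hy = (lh.x + rh.x) / 2, (lh.y + rh.y) / 2
    # atan2(horizontal offset, vertical offset); image y grows downward.
    return abs(math.degrees(math.atan2(hx - sx, hy - sy)))

def run_on_image(rgb_image):
    """rgb_image: RGB numpy array. Returns spine angle or None."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(rgb_image)
    if results.pose_landmarks:
        return spinal_angle_degrees(results.pose_landmarks.landmark)
    return None
```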

ivelin commented 3 years ago

More test data with the MobileNetV1 model showing situations where it is not able to detect a human pose on the ground, even though it's easy for the human eye to see it.

ivelin commented 3 years ago

@bhavikapanara Google AI MediaPipe just released a [3D update to BlazePose](https://google.github.io/mediapipe/solutions/pose) with a Z axis value for depth. This can be helpful for cases when a person falls along the Z axis and the change in the X,Y vector angle remains small, so the 2D view alone does not tell us the whole story.
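
Building on the 2D sketch above, a hedged example of folding the Z value into the same spine-angle computation (this reuses `mp_pose` and the landmark indexing from the earlier sketch; whether the z scale is consistent enough across frames for this needs verification):

```python
# Sketch: 3D spinal vector angle using BlazePose's x, y, z landmark
# values, so falls mostly along the Z axis still change the angle.
import math

def spinal_angle_3d_degrees(landmarks):
    """Angle between the 3D shoulder->hip midpoint vector and the
    vertical (y) axis. Assumes z is on a comparable scale to x and y."""
    ls = landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER]
    rs = landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER]
    lh = landmarks[mp_pose.PoseLandmark.LEFT_HIP]
    rh = landmarks[mp_pose.PoseLandmark.RIGHT_HIP]
    dx = (lh.x + rh.x) / 2 - (ls.x + rs.x) / 2
    dy = (lh.y + rh.y) / 2 - (ls.y + rs.y) / 2
    dz = (lh.z + rh.z) / 2 - (ls.z + rs.z) / 2
    horizontal = math.hypot(dx, dz)  # displacement off the vertical axis
    return abs(math.degrees(math.atan2(horizontal, dy)))
```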