ivelin opened this issue 3 years ago
The various JS models are easy to inspect in the demo's console log:
https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html
For example, a good ResNet model for single-pose with balanced parameters for CPU (see screenshot for param details): https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/group1-shard11of12.bin
TF saved model checkpoints for PoseNet 2 are also listed here: https://github.com/tensorflow/tfjs-models/tree/master/posenet/src
My testing shows that the ResNet50 model is noticeably more accurate than the MobileNet model, although it is about 40% slower (6fps vs. 10fps). Surprisingly, the multi-person variant performs about as fast as the single-person one. Also, single-person estimation can be confused when there are multiple people in the image.
With these findings, I think it's more important to upgrade to a ResNet model (e.g. input 250, stride 32, quantized to int or 2-byte float) and less important whether it's multi-pose or single-pose.
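For reference, here is a minimal sketch of how those parameters map onto the @tensorflow-models/posenet loading API; the parameter values are the ones from my tests above, and the video element is just a placeholder:

```ts
import * as posenet from '@tensorflow-models/posenet';

async function estimateWithResNet(video: HTMLVideoElement) {
  // PoseNet 2.0 with the ResNet50 backbone and the parameters discussed
  // above: 250x250 input, output stride 32, 2-byte float quantized weights.
  const net = await posenet.load({
    architecture: 'ResNet50',
    outputStride: 32,
    inputResolution: { width: 250, height: 250 },
    quantBytes: 2,
  });
  // Single-pose estimation; the multi-pose variant is
  // net.estimateMultiplePoses(video).
  const pose = await net.estimateSinglePose(video, { flipHorizontal: false });
  console.log(pose.score, pose.keypoints);
}
```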
@bhavikapanara ^^^
@bhavikapanara thoughts on this one?
I've done more testing comparing the current MobileNetV1 model and the single-person ResNet50 with the parameters from the previous comment (250x250 input, stride 32, 2-byte float quantization).
I find the ResNet50 model to have slightly slower inference time but noticeably better performance in several important areas.
I would like to know what your own experiments show.
If you are able to verify my findings on your data sets, I think upgrading to a ResNet model should be the next priority on the roadmap for improving fall detection.
PoseNet 2.0 ResNet 50 testing video
https://user-images.githubusercontent.com/2234901/107591384-269b1780-6bd0-11eb-9ffd-ba1c7dfc939c.mov
BlazePose testing video
BlazePose model card: https://drive.google.com/file/d/1zhYyUXhQrb_Gp0lKUFv1ADT3OCxGEQHS/view?usp=drivesdk
TFLite models for pose detection (phase 1) and keypoint estimation (phase 2):
https://google.github.io/mediapipe/solutions/models.html#pose
An interesting detail worth investigating more deeply is that BlazePose estimates the body vector as part of its first phase, pose detection, before it runs the second phase for keypoint estimation.
Since for fall detection we are mainly interested in the spinal vector, this could mean even faster inference; see the sketch after the quoted text below.
See this text from the blog:
http://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html
“for the human pose tracking we explicitly predict two additional virtual keypoints that firmly describe the human body center, rotation and scale as a circle. Inspired by Leonardo’s Vitruvian man, we predict the midpoint of a person's hips, the radius of a circle circumscribing the whole person, and the incline angle of the line connecting the shoulder and hip midpoints. This results in consistent tracking even for very complicated cases, like specific yoga asanas.”
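To make the spinal-vector idea concrete, here is a rough sketch of my own (not BlazePose code) that computes the incline angle of the line connecting the shoulder and hip midpoints from 2D keypoints, assuming the keypoint names used by the tfjs pose-detection models:

```ts
interface Keypoint {
  name?: string;
  x: number;
  y: number;
}

// Incline angle (in degrees from the vertical image axis) of the line
// connecting the shoulder midpoint to the hip midpoint -- the spinal vector.
function spinalInclineDeg(keypoints: Keypoint[]): number {
  const get = (name: string) => keypoints.find((k) => k.name === name)!;
  const mid = (a: Keypoint, b: Keypoint) => ({
    x: (a.x + b.x) / 2,
    y: (a.y + b.y) / 2,
  });
  const shoulders = mid(get('left_shoulder'), get('right_shoulder'));
  const hips = mid(get('left_hip'), get('right_hip'));
  const dx = shoulders.x - hips.x;
  const dy = shoulders.y - hips.y; // image y grows downward
  // ~0 when standing upright, ~90 when the torso is horizontal.
  return Math.abs((Math.atan2(dx, -dy) * 180) / Math.PI);
}
```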
More test data with the MobileNetV1 model that shows situations where it is not able to detect a human pose on the ground even though it's easy for a human eye to see it.
@bhavikapanara Google AI MediaPipe just released a [3D update to BlazePose](https://google.github.io/mediapipe/solutions/pose) ("One step closer to 3D pose detection") with a Z axis value for depth. This can be helpful for cases when a person falls along the Z axis and the X,Y vector angle change remains small, so the 2D view alone is not telling us the whole story.
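As a sketch of how the Z value could extend the 2D spinal-vector check, assuming poses come from the tfjs pose-detection BlazePose detector (set up as in the sketch in the updated description below), which exposes metric 3D keypoints as keypoints3D per its docs:

```ts
import * as poseDetection from '@tensorflow-models/pose-detection';

// How far the torso leans along the camera (Z) axis, using the metric 3D
// keypoints that the BlazePose detector returns as pose.keypoints3D.
function torsoZLean(pose: poseDetection.Pose): number | undefined {
  const kp3d = pose.keypoints3D;
  if (!kp3d) return undefined;
  const z = (name: string) => kp3d.find((k) => k.name === name)?.z ?? 0;
  const shoulderZ = (z('left_shoulder') + z('right_shoulder')) / 2;
  const hipZ = (z('left_hip') + z('right_hip')) / 2;
  // A large shoulder-vs-hip Z gap suggests the person is tilted toward or
  // away from the camera even when the 2D X,Y angle change looks small.
  return Math.abs(shoulderZ - hipZ);
}
```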
UPDATE: June 11, 2021
Is your feature request related to a problem? Please describe.
Currently we use MobileNetV2 with a 300x300 input tensor (image) by default for object detection.
It struggles with poses of people lying down on the floor. We experimented with rotating images +/-90°, which improves overall detection rates, but it still misses poses of fallen people even when the full body is clearly visible to a human eye.
Clearly the model has not been trained on poses of fallen people.
Describe the solution you'd like
Google AI introduced MoveNet on May 17, 2021: 30fps on a mobile phone, initially for TensorFlow.js, with a follow-up model release coming to TFLite.
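A minimal sketch of trying MoveNet through the new @tensorflow-models/pose-detection package; the model type constant is from its docs and worth verifying against the current API:

```ts
import * as poseDetection from '@tensorflow-models/pose-detection';

async function tryMoveNet(video: HTMLVideoElement) {
  // SinglePose Lightning is the fast variant aimed at real-time use.
  const detector = await poseDetection.createDetector(
    poseDetection.SupportedModels.MoveNet,
    { modelType: poseDetection.movenet.modelType.SINGLEPOSE_LIGHTNING }
  );
  const poses = await detector.estimatePoses(video);
  console.log(poses[0]?.keypoints);
}
```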
Google AI released PoseNet 2.0 with a ResNet50 base model in 2020, which runs at 5-6fps on a desktop CPU with noticeably better detection rates. Interactive web demo here. However, testing shows that even with these improvements it still misses some poses of people lying down (fallen poses) that are otherwise easy for a human eye to recognize. See the example video below for situations where ResNet misses poses.
Google AI MediaPipe released a new iteration of BlazePose, which detects 33 (vs. 15) keypoints at 25-55fps on a desktop CPU (5-10 times faster than PoseNet 2 ResNet50). Testing shows that BlazePose does a better job with horizontal people poses, although it still misses some lying positions. See the attached video for reference. BlazePose interactive web demo here. Pose detection TFLite model here.
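And a corresponding hedged sketch for BlazePose in the same pose-detection package; runtime and model type values are from its docs:

```ts
import * as poseDetection from '@tensorflow-models/pose-detection';

async function tryBlazePose(video: HTMLVideoElement) {
  const detector = await poseDetection.createDetector(
    poseDetection.SupportedModels.BlazePose,
    { runtime: 'tfjs', modelType: 'full' } // also 'lite' | 'heavy'
  );
  // Each detected pose carries 33 2D keypoints plus metric 3D keypoints3D.
  const poses = await detector.estimatePoses(video);
  console.log(poses[0]?.keypoints.length); // 33
}
```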
Additional context
See the TensorFlow 2 Detection Model Zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
Notice the high performance and dual purpose (object detection + keypoints) of CenterNet Resnet50 V1 FPN Keypoints 512x512 and CenterNet Resnet50 V2 Keypoints 512x512.
More on CenterNet and its various applications for object detection, pose detection, and object motion tracking: https://github.com/xingyizhou/CenterNet