Closed: bhavikapanara closed this pull request 3 years ago.
Merging #17 (b84a92d) into main (8a053b9) will increase coverage by 3.22%. The diff coverage is 92.57%.
@@ Coverage Diff @@
## main #17 +/- ##
==========================================
+ Coverage 89.06% 92.29% +3.22%
==========================================
Files 10 14 +4
Lines 869 1272 +403
==========================================
+ Hits 774 1174 +400
- Misses 95 98 +3
Impacted Files | Coverage Δ | |
---|---|---|
fall_prediction.py | 100.00% <ø> (ø) | |
src/pipeline/inference.py | 88.00% <ø> (-0.24%) | :arrow_down: |
tests/pipeline/test_fall_detect_posenet.py | 100.00% <ø> (ø) | |
src/pipeline/pose_engine.py | 65.65% <38.00%> (-23.41%) | :arrow_down: |
src/pipeline/pose_base.py | 91.83% <91.83%> (ø) | |
src/pipeline/movenet_model.py | 97.05% <97.05%> (ø) | |
src/pipeline/fall_detect.py | 98.62% <100.00%> (+10.09%) | :arrow_up: |
src/pipeline/posenet_model.py | 100.00% <100.00%> (ø) | |
tests/pipeline/test_fall_detect_movenet.py | 100.00% <100.00%> (ø) | |
... and 5 more | | |
@bhavikapanara this is an improvement. It shows MoveNet is a superior model in most cases, although in some cases it incorrectly assigns keypoints to non-human objects.
Several items that still need work:
I did not see in the test set the images from this user-reported issue. Please crop the photos and add them to the test set. That will help us understand whether movenet is more robust to non-human object distractions: https://github.com/ambianic/fall-detection/issues/4
Another set of tests we are missing is multiple people in the same image. Since MoveNet is designed for scenes with one person, it would be good to know whether it focuses on one person in a multi-person scene or whether it gets confused and scrambles keypoints across multiple people.
Also, please add keypoint labels and confidence scores for the drawn keypoints to the comparison matrix. They can be in JSON format next to or under each image. That would help us understand whether movenet is able to detect with higher confidence than posenet.
Thank you!
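A minimal sketch of the per-image JSON block requested above. The label list and the `keypoints_to_json` helper are illustrative, not the project's actual API:

```python
import json

# Standard COCO-style keypoint labels used by both PoseNet and MoveNet.
KEYPOINT_LABELS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def keypoints_to_json(scores):
    """Pair each keypoint label with its confidence score, for display
    next to or under an image in the comparison matrix."""
    return json.dumps(
        {label: round(float(s), 3) for label, s in zip(KEYPOINT_LABELS, scores)},
        indent=2,
    )

# Hypothetical scores for the first three keypoints of one detection.
print(keypoints_to_json([0.91, 0.87, 0.88]))
```

The same blob can be rendered under each image cell of the notebook matrix, so the two models' per-keypoint confidences line up visually.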
@ivelin I have updated the notebook: I added multiple-person images to the test samples and added a comparison of the two models' keypoint confidence scores.
Please review it and let me know if any changes are needed.
@bhavikapanara now the notebook shows the information we can use to compare the models objectively.
However, there seems to be a bug in the keypoint scores shown. There are several cases where the posenet image has no keypoints drawn, yet the scores are shown as all 50%:
posenet 0.500000 0.500000 0.500000 0.500000
Please double check the score formulas and rendering. Thank you.
@bhavikapanara please see comment about incorrect keypoint score values.
@ivelin I have fixed the bug. Can you please check it?
@bhavikapanara some additional evidence that the posenet keypoint score calculation is off, either in the notebook or the current production code. Note that score calculation should be done in one place (a single source of truth: the fall detection high-level API), and the notebook should simply iterate over the available pose detection implementations without reimplementing low-level tensor arithmetic.
See the screenshots that demonstrate inconsistencies between the originally reported posenet detections and their average confidence score, next to the notebook scores that appear to be inconsistent. Moreover, the notebook seems to draw keypoints and body lines at different coordinates compared to the originals.
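The single-source-of-truth point can be sketched as follows: the notebook loops over detector objects that share one interface, and all score arithmetic lives inside those classes. Class and method names here are illustrative stand-ins, not the repo's actual API:

```python
class PoseDetector:
    """Common interface; subclasses own all tensor and score math."""
    def detect(self, image):
        raise NotImplementedError

class FakePosenet(PoseDetector):
    def detect(self, image):
        # A real implementation would run inference here;
        # the score formula exists only inside the class.
        return {"model": "posenet", "avg_score": 0.62}

class FakeMovenet(PoseDetector):
    def detect(self, image):
        return {"model": "movenet", "avg_score": 0.81}

# Notebook-side loop: no score formulas re-implemented here.
results = [d.detect("sample.jpg") for d in (FakePosenet(), FakeMovenet())]
for r in results:
    print(f"{r['model']}: avg keypoint score {r['avg_score']:.2f}")
```

With this split, a score bug can only live in one place, and the notebook and production code can never disagree.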
@bhavikapanara this is a big improvement. A few inline comments plus:
- Let's add the inference time in the comparison matrix so we can see how the two models compare given different sample inputs.
- Please convert the sample iteration from the notebook matrix into a set of unit tests with matching keypoint scores for each AI model and test image. We will use this as a base line to protect us from future regressions in the code for each AI model.
Thank you!
@ivelin The first point is completed; you can now see inference time in the notebook. But I didn't follow the second point. Can you please explain exactly what I need to do?
@bhavikapanara a couple of comments on the time measurement implementation.
- time.time() is not recommended for performance measurement: its granularity is not guaranteed and can be as coarse as one second on some platforms, and it measures wall-clock time, so sleep and scheduling delays are included in the result. Use time.process_time() or timeit instead.
- Your results suggest that movenet is about 2x slower than posenet (with MobileNetV1). That seems inconsistent with the observations you shared earlier on our Slack channel. Can you comment?
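A minimal sketch of the suggested measurement, using `timeit` with `time.process_time` as the clock so that sleep time and coarse wall-clock granularity do not skew the numbers. `run_inference` is a stand-in for a real model forward pass:

```python
import time
import timeit

def run_inference():
    # Stand-in workload for a single model forward pass.
    sum(i * i for i in range(10_000))

# timeit with a CPU-time clock; repeat and take the best run
# to reduce noise from the OS scheduler.
timer = timeit.Timer(run_inference, timer=time.process_time)
best = min(timer.repeat(repeat=3, number=10)) / 10
print(f"best per-call CPU time: {best:.6f}s")
```

Taking the minimum over repeats is the conventional way to report timeit results, since the fastest run is the least disturbed by background load.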
Regarding the second request. It is just an ask to convert the notebook iteration over image samples into unit tests for each of the models. Maybe you can put the image names and expected inference scores in a csv or yaml file that we can expand in the future as we receive additional samples from users. Having this dataset test in our CI will protect the code from regressions.
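The regression-test idea could look like this: expected scores live in a small CSV that grows as new samples arrive, and a parametrized test compares each model's output against it. The file name, column names, and tolerance below are illustrative, not the repo's actual layout:

```python
import csv
import io

# In the repo this would be a checked-in file,
# e.g. tests/expected_scores.csv, expandable over time.
EXPECTED_CSV = """image,model,expected_score
fall_img_1.png,movenet,0.81
fall_img_1.png,posenet,0.62
"""

def load_expected(text):
    """Parse (image, model, expected_score) rows from the CSV text."""
    return [
        (row["image"], row["model"], float(row["expected_score"]))
        for row in csv.DictReader(io.StringIO(text))
    ]

# With pytest, this feeds a parametrized regression test, e.g.:
# @pytest.mark.parametrize("image,model,expected", load_expected(...))
# def test_keypoint_scores(image, model, expected):
#     assert run_model(model, image) == pytest.approx(expected, abs=0.05)
cases = load_expected(EXPECTED_CSV)
print(cases)
```

Running this dataset in CI turns the notebook's one-off comparison into a standing guard against score regressions in either model.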
Yes, the notebook experiments say movenet is about 2x slower than posenet (with MobileNetV1). I am also surprised by it. This one also presents MoveNet as ultra fast. I don't know what the issue is.
OK, good to know we are in sync and the numbers are accurate. The performance numbers in the virtual CI environment are a good relative comparison between models that we can keep track of. We will see how these numbers look on real edge devices.
Please proceed with addressing the other comments in this PR so we can merge.
I think all comments have been resolved. Please review it, @ivelin, and let me know if I missed anything.
:tada: This PR is included in version 1.0.0 :tada:
The release is available on GitHub releases.
Your semantic-release bot :package::rocket:
Here is the link to access the notebooks.