ivelin closed this issue 3 years ago
@bhavikapanara the test continues to fail, which blocks new PR merges.
def test_fall_detection_2_frame_back_case_2():
Hi @ivelin. Here we can decrease the confidence score threshold a bit to overcome this issue.
What else could help here? Any suggestions?
Yes, that usually solves the problem with the differences in TFLite results on different architectures.
@bhavikapanara do you still have difficulties solving this issue? Any questions I can help with?
Thanks, @ivelin. No difficulties...I have just opened a PR for this.
Nice. I see the PR checks are in-progress now.
@bhavikapanara as we're waiting on the CI run, I just realized that lowering the confidence threshold for all tests is not ideal, because the same issue may pop up in the future. A test could pass in your dev environment but fail on others.
Maybe it's better to select data samples for tests that score decisively above the threshold (maybe threshold+0.2) in order to be safe on multiple platforms.
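That selection rule could be sketched as a small helper (all names and scores below are hypothetical, and the 0.6 threshold is an assumed value, not necessarily the real FallDetector config):

```python
# Sketch only: pick test samples that clear the detection threshold by a
# safety margin, so small cross-platform score differences (e.g. x86 vs
# ARM TFLite) cannot flip a test outcome.
CONFIDENCE_THRESHOLD = 0.6  # assumed value
SAFETY_MARGIN = 0.2         # the "threshold + 0.2" idea from the discussion

def is_decisive_sample(score, threshold=CONFIDENCE_THRESHOLD, margin=SAFETY_MARGIN):
    """Return True if a sample's score clears the threshold with room to spare."""
    return score >= threshold + margin

# Candidate samples with their x86 reference scores (made-up data).
candidates = {"fall_img_1.png": 0.92, "fall_img_2.png": 0.63, "fall_img_3.png": 0.85}

# Keep only samples expected to stay above threshold even if ARM scores run lower.
decisive = [name for name, score in candidates.items() if is_decisive_sample(score)]
print(decisive)
```

In this toy data, the 0.63 sample would be rejected as too close to the threshold, while the other two are safe choices.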
@bhavikapanara Also, we should probably include the score in test assertions so it is easy to debug results in CI runs. Currently the failing test has an assertion on an aggregate result object, which does not make it easy to see which part of the detection produced the unexpected result. Maybe break it down into multiple assertions on specific scalar components that are easier to debug.
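For instance, the breakdown could look like this (the result shape below, a list of dicts with `label` and `confidence` keys, is a hypothetical illustration, not necessarily the real `inference_result` structure):

```python
# Sketch: replace a single opaque assertion on an aggregate result with
# per-field assertions, so a CI failure shows exactly which value was off.
CONFIDENCE_THRESHOLD = 0.6  # assumed value

def check_fall_result(result):
    # The opaque version, `assert result`, only reports "assert []" on failure.
    assert result, f"no detection produced, got: {result!r}"
    detection = result[0]
    # Scalar assertions with messages make the failing component obvious in CI logs.
    assert detection["label"] == "FALL", f"unexpected label: {detection['label']}"
    assert detection["confidence"] >= CONFIDENCE_THRESHOLD, (
        f"confidence {detection['confidence']:.2f} "
        f"below threshold {CONFIDENCE_THRESHOLD}"
    )

check_fall_result([{"label": "FALL", "confidence": 0.83}])
```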
Yes @ivelin, it makes sense.
@bhavikapanara the ARM CI run failed on the master branch even though it passed again on the new PR. Let's try to debug the issue a bit deeper this time and fix the root cause. Please add more detailed debug assertions so we can see which one fails in the CI run for ARM.
@ivelin will do this today
Cool. Let’s make it a priority. Really want to test the 2 frame optimization. Roughly 1 in 3 detections in my testing fails now because of mid-fall frames.
Yes...sure @ivelin
:tada: This issue has been resolved in version 1.13.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
@bhavikapanara the same test continues to fail even after today's PR fix was merged. You should probably leave in the debug log messages for some time until we observe stable behavior across multiple CI runs.
=================================== FAILURES ===================================
___________________ test_fall_detection_2_frame_back_case_2 ____________________

    def test_fall_detection_2_frame_back_case_2():
        """
        Expected to detect a fall using frame[t] and frame[t-2].
        frame[t-2] : A person is in standing position.
        frame[t-1] : A person is mid-way of fall.
        frame[t]   : A person is fall down.
        """
        config = _fall_detect_config()
        result = None

        def sample_callback(image=None, inference_result=None, **kwargs):
            nonlocal result
            result = inference_result

        fall_detector = FallDetector(**config)
        output = _OutPipeElement(sample_callback=sample_callback)
        fall_detector.connect_to_next_element(output)

        # A frame at t-2 timestamp when person is in standing position.
        img_1 = _get_image(file_name='fall_img_1.png')
        # A frame at t-1 timestamp when person is mid-way of fall.
        img_2 = _get_image(file_name='fall_img_2_2.png')
        # A frame at t timestamp when person falls down.
        img_3 = _get_image(file_name='fall_img_2.png')

        fall_detector.receive_next_sample(image=img_1)
        fall_detector.min_time_between_frames = 0.01
        time.sleep(fall_detector.min_time_between_frames)
        fall_detector.receive_next_sample(image=img_2)
        fall_detector.min_time_between_frames = 0.01
        time.sleep(fall_detector.min_time_between_frames)
        assert not result
        fall_detector.receive_next_sample(image=img_3)
>       assert result
E       assert []

tests/pipeline/ai/test_fall_detect.py:615: AssertionError
------------------------------ Captured log call -------------------------------
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5928.85 ms, 0.17 fps in pipeline unknown
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5838.53 ms, 0.17 fps in pipeline unknown
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5844.05 ms, 0.17 fps in pipeline unknown
@bhavikapanara please ignore. The latest fail was due to my mistake. I did not properly merge the PR branch with your latest fix from master. Closing this issue...again :)
Describe the bug It appears that the newly merged PR ambianic/ambianic-edge#282 has tests that fail intermittently in CI on ARM CPU.
To Reproduce https://github.com/ambianic/ambianic-edge/runs/1810906695?check_suite_focus=true#step:3:3921
Expected behavior Tests should pass reliably in CI.
Additional context In the past this has happened a few times usually because the ARM version of TFLite outputs lower confidence scores for inferences than the x86 version.