ivelin closed this issue 3 years ago
@bhavikapanara the test continues to fail, which blocks new PR merges.
def test_fall_detection_2_frame_back_case_2():
Hi @ivelin. Here we can decrease the confidence score threshold a bit to overcome this issue.
What else could help here? Any suggestions?
Yes, that usually solves the problem with the differences in TFLite results on different architectures.
@bhavikapanara do you still have difficulties solving this issue? Any questions I can help with?
Thanks, @ivelin. No difficulties...I have just opened a PR for this.
Nice. I see the PR checks are in-progress now.
@bhavikapanara as we're waiting on the CI run, I just realized that lowering the confidence threshold for all tests is not ideal, because the same issue may pop up in the future. A test could pass in your dev environment but fail on others.
Maybe it's better to select data samples for tests that score decisively above the threshold (maybe threshold+0.2) in order to be safe on multiple platforms.
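That selection rule could be sketched as a small helper (all names and scores below are hypothetical, and the 0.6 threshold is an assumed value, not necessarily the real FallDetector config):

```python
# Sketch only: pick test samples that clear the detection threshold by a
# safety margin, so small cross-platform score differences (e.g. x86 vs
# ARM TFLite) cannot flip a test outcome.
CONFIDENCE_THRESHOLD = 0.6  # assumed value
SAFETY_MARGIN = 0.2         # the "threshold + 0.2" idea from the discussion

def is_decisive_sample(score, threshold=CONFIDENCE_THRESHOLD, margin=SAFETY_MARGIN):
    """Return True if a sample's score clears the threshold with room to spare."""
    return score >= threshold + margin

# Candidate samples with their x86 reference scores (made-up data).
candidates = {"fall_img_1.png": 0.92, "fall_img_2.png": 0.63, "fall_img_3.png": 0.85}

# Keep only samples expected to stay above threshold even if ARM scores run lower.
decisive = [name for name, score in candidates.items() if is_decisive_sample(score)]
print(decisive)
```

In this toy data, the 0.63 sample would be rejected as too close to the threshold, while the other two are safe choices.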
@bhavikapanara Also, we should probably include the score in test assertions so it is easy to debug results in CI runs. Currently the failing test has an assertion on an aggregate result object, which does not make it easy to see which part of the detection produced the unexpected result. Maybe break it down into multiple assertions on specific scalar components that are easier to debug.
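For instance, the breakdown could look like this (the result shape below, a list of dicts with `label` and `confidence` keys, is a hypothetical illustration, not necessarily the real `inference_result` structure):

```python
# Sketch: replace a single opaque assertion on an aggregate result with
# per-field assertions, so a CI failure shows exactly which value was off.
CONFIDENCE_THRESHOLD = 0.6  # assumed value

def check_fall_result(result):
    # The opaque version, `assert result`, only reports "assert []" on failure.
    assert result, f"no detection produced, got: {result!r}"
    detection = result[0]
    # Scalar assertions with messages make the failing component obvious in CI logs.
    assert detection["label"] == "FALL", f"unexpected label: {detection['label']}"
    assert detection["confidence"] >= CONFIDENCE_THRESHOLD, (
        f"confidence {detection['confidence']:.2f} "
        f"below threshold {CONFIDENCE_THRESHOLD}"
    )

check_fall_result([{"label": "FALL", "confidence": 0.83}])
```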
Yes @ivelin, it makes sense.
@bhavikapanara the ARM CI run failed on the master branch even though it passed again on the new PR. Let's try to debug the issue a bit deeper this time and fix the root cause. Please add more detailed debug assertions so we can see which one fails in the CI run for ARM.
@ivelin will do this today
Cool. Let’s make it a priority. Really want to test the 2 frame optimization. Roughly 1 in 3 detections in my testing fails now because of mid-fall frames.
Yes...sure @ivelin
:tada: This issue has been resolved in version 1.13.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
@bhavikapanara the same test continues to fail even after today's PR fix was merged. You should probably leave in the debug log messages for some time until we observe stable behavior across multiple CI runs.
=================================== FAILURES ===================================
___________________ test_fall_detection_2_frame_back_case_2 ____________________

    def test_fall_detection_2_frame_back_case_2():
        """
        Expected to detect a fall using frame[t] and frame[t-2].
        frame[t-2] : A person is in standing position.
        frame[t-1] : A person is mid-way of fall.
        frame[t]   : A person is fall down.
        """
        config = _fall_detect_config()
        result = None

        def sample_callback(image=None, inference_result=None, **kwargs):
            nonlocal result
            result = inference_result

        fall_detector = FallDetector(**config)
        output = _OutPipeElement(sample_callback=sample_callback)
        fall_detector.connect_to_next_element(output)

        # A frame at t-2 timestamp when person is in standing position.
        img_1 = _get_image(file_name='fall_img_1.png')
        # A frame at t-1 timestamp when person is mid-way of fall.
        img_2 = _get_image(file_name='fall_img_2_2.png')
        # A frame at t timestamp when person falls down.
        img_3 = _get_image(file_name='fall_img_2.png')

        fall_detector.receive_next_sample(image=img_1)
        fall_detector.min_time_between_frames = 0.01
        time.sleep(fall_detector.min_time_between_frames)
        fall_detector.receive_next_sample(image=img_2)
        fall_detector.min_time_between_frames = 0.01
        time.sleep(fall_detector.min_time_between_frames)
        assert not result
        fall_detector.receive_next_sample(image=img_3)
>       assert result
E       assert []

tests/pipeline/ai/test_fall_detect.py:615: AssertionError
------------------------------ Captured log call -------------------------------
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5928.85 ms, 0.17 fps in pipeline unknown
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5838.53 ms, 0.17 fps in pipeline unknown
INFO ambianic.pipeline.ai.tf_detect:tf_detect.py:178 FallDetector inference time 5844.05 ms, 0.17 fps in pipeline unknown
@bhavikapanara please ignore. The latest fail was due to my mistake. I did not properly merge the PR branch with your latest fix from master. Closing this issue...again :)
Describe the bug It appears that the newly merged PR ambianic/ambianic-edge#282 has tests that fail intermittently in CI on ARM CPU.
To Reproduce https://github.com/ambianic/ambianic-edge/runs/1810906695?check_suite_focus=true#step:3:3921
Expected behavior Tests should pass reliably in CI.
Additional context In the past this has happened a few times usually because the ARM version of TFLite outputs lower confidence scores for inferences than the x86 version.