Nike-Inc / koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
https://engineering.nike.com/koheesio/
Apache License 2.0
602 stars 29 forks source link

[FIX] Accidental duplication of logs #105

Closed dannymeijer closed 1 week ago

dannymeijer commented 1 week ago

Description

Enhanced the Step class and its metaclass StepMetaClass to ensure that the execute method is wrapped only once, even when inherited multiple times. Added tests to verify that the log and wrapper are called only once in such scenarios. Also, included custom metaclass examples and their respective tests.

Related Issue

N/A

Motivation and Context

It was established that (under the right conditions) our code allowed the execute method to be wrapped multiple times, even though we had code in place to prevent that. This reflected outwardly by seeing log duplication. For example like this:

Wrong output:

    [1111] [2024-11-18 14:22:18,355] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:307} - Start running step
    [1111] [2024-11-18 14:22:18,355] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:308} - Step Input: name='MyCustomGreatGrandChildStep' description='MyCustomGreatGrandChildStep' foo='foo' bar='bar' qux='qux'
    [1111] [2024-11-18 14:22:18,355] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:307} - Start running step
    [1111] [2024-11-18 14:22:18,355] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:308} - Step Input: name='MyCustomGreatGrandChildStep' description='MyCustomGreatGrandChildStep' foo='foo' bar='bar' qux='qux'
    [1111] [2024-11-18 14:22:18,355] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:307} - Start running step
    [1111] [2024-11-18 14:22:18,355] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:308} - Step Input: name='MyCustomGreatGrandChildStep' description='MyCustomGreatGrandChildStep' foo='foo' bar='bar' qux='qux'
    [1111] [2024-11-18 14:22:18,356] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:329} - Step Output: name='MyCustomGreatGrandChildStep.Output' description='Output for MyCustomGreatGrandChildStep' baz='foobar'
    [1111] [2024-11-18 14:22:18,356] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:330} - Finished running step
    [1111] [2024-11-18 14:22:18,356] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:329} - Step Output: name='MyCustomGreatGrandChildStep.Output' description='Output for MyCustomGreatGrandChildStep' baz='foobar'
    [1111] [2024-11-18 14:22:18,356] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:330} - Finished running step
    [1111] [2024-11-18 14:22:18,356] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:329} - Step Output: name='MyCustomGreatGrandChildStep.Output' description='Output for MyCustomGreatGrandChildStep' baz='foobar'
    [1111] [2024-11-18 14:22:18,356] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:330} - Finished running step

Expected output:

    [1111] [2024-11-18 14:22:18,355] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:307} - Start running step
    [1111] [2024-11-18 14:22:18,355] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_start_message:308} - Step Input: name='MyCustomGreatGrandChildStep' description='MyCustomGreatGrandChildStep' foo='foo' bar='bar' qux='qux'
    [1111] [2024-11-18 14:22:18,356] [DEBUG] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:329} - Step Output: name='MyCustomGreatGrandChildStep.Output' description='Output for MyCustomGreatGrandChildStep' baz='foobar'
    [1111] [2024-11-18 14:22:18,356] [INFO] [koheesio.MyCustomGreatGrandChildStep] {__init__.py:_log_end_message:330} - Finished running step

How Has This Been Tested?

Tested by adding unit tests in tests/steps/test_steps.py to verify that the execute method is wrapped only once and that the log messages are called in the correct order. The tests also cover custom metaclass functionality and ensure that the output validation and logging work as expected.

Screenshots (if appropriate):

...

Types of changes

Checklist:

dannymeijer commented 1 week ago

Additional changes: