acryldata / datahub-actions

DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.
42 stars 47 forks source link

feat(logging): Improve general logging and add more logging in metadata change sync action #90

Closed TonyOuyangGit closed 1 year ago

TonyOuyangGit commented 1 year ago

This PR adds more logging to metadata_change_sync.py to have more useful information.

github-actions[bot] commented 1 year ago

Unit Test Results (build & test)

63 tests  ±0   63 :heavy_check_mark: ±0   3s :stopwatch: -1s   1 suites ±0     0 :zzz: ±0    1 files   ±0     0 :x: ±0 

Results for commit 731c06a3. ± Comparison against base commit 8861a928.

:recycle: This comment has been updated with latest results.

hsheth2 commented 1 year ago

@TonyOuyangGit the additional log lines are totally fine - those make sense.

I'm a bit worried about the changes in entrypoint.py though - I suspect they'll conflict with the stuff in https://github.com/datahub-project/datahub/blob/01ee351c4c925a2c784b72a3065b506b72e2b7ba/metadata-ingestion/src/datahub/utilities/logging_manager.py#L2 in a fairly unpredictable way.

While you can run datahub_actions as a standalone cli command, some folks use it as a subcommand of datahub actions (see https://github.com/datahub-project/datahub/blob/b3c790aab6c34c001efcca217a305b9478e5397f/metadata-ingestion/src/datahub/entrypoints.py#L173) - that's why they might conflict.

Also, the datahub --log-file option might be what you're looking for (logs debug level to a file, even if the console logs are at info level), and I'd prefer to tweak that one rather than making the changes to datahub actions.

I haven't heard the requirement of separating stdout and stderr before - would love to read more about how GKE does stuff there. In particular, I'm surprised they can't simply parse the log level that we already show in our logs.

hsheth2 commented 1 year ago

@TonyOuyangGit looks like mypy is failing, but should be good to merge once that's fixed