Why
Part of the broader "Data Observability" story, this task specifically covers the "application monitoring" capability.
Currently, if customers have scheduled notebooks/pipelines/jobs, they need to go to the "Monitor" menu in the Fabric UI to see the run details (start time, end time, status, etc.). They might also want to do some high-level, application-specific logging, such as the number of records processed, failed records, custom metrics, etc.
What
To facilitate that, we want to create a custom logging framework with the following features (a minimal sketch follows the list):
[Required] Use "OpenTelemetry".
[Required] Write data to predefined Lakehouse logging tables.
[Optional] Write logging information to a file in a folder under the Lakehouse "Files" section.
[Optional] Package it as a wheel, attach it to the Spark pools, and have notebooks import it.
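As a starting point, here is a minimal sketch of what the package core could look like: a custom OpenTelemetry SpanExporter that appends one row per finished processing step to a Lakehouse Delta table. The table name app_run_log, its column set, and the tracer name are illustrative assumptions, not a fixed design.

    # Sketch: an OpenTelemetry SpanExporter that appends finished spans to a
    # Lakehouse Delta table. Table name and schema below are assumptions.
    from datetime import datetime, timezone

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (
        SimpleSpanProcessor, SpanExporter, SpanExportResult,
    )
    from pyspark.sql import SparkSession

    class LakehouseSpanExporter(SpanExporter):
        """Appends one row per finished span to a Lakehouse logging table."""

        def __init__(self, table_name: str = "app_run_log"):  # assumed table name
            self.table_name = table_name

        def export(self, spans) -> SpanExportResult:
            if not spans:
                return SpanExportResult.SUCCESS
            spark = SparkSession.builder.getOrCreate()
            rows = [
                (
                    span.name,
                    datetime.fromtimestamp(span.start_time / 1e9, tz=timezone.utc),
                    datetime.fromtimestamp(span.end_time / 1e9, tz=timezone.utc),
                    span.status.status_code.name,  # OK / ERROR / UNSET
                    # Custom metrics arrive as span attributes; stringify values
                    # so they fit a simple map<string,string> column.
                    {k: str(v) for k, v in (span.attributes or {}).items()},
                )
                for span in spans
            ]
            df = spark.createDataFrame(
                rows,
                "step string, start_time timestamp, end_time timestamp, "
                "status string, metrics map<string,string>",
            )
            df.write.format("delta").mode("append").saveAsTable(self.table_name)
            return SpanExportResult.SUCCESS

        def shutdown(self) -> None:
            pass

    # Wire the exporter into a tracer the package exposes to notebooks.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(LakehouseSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("fabric.app.logging")  # assumed tracer name

Because SimpleSpanProcessor exports synchronously when each span ends, every logged step becomes a table row as soon as its with-block exits; a batching processor could be swapped in if per-step writes prove too chatty.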
How
Conceptualize and implement a simple, generic, Python-based logging package.
Identify a processing step, e.g., the process that loads RAW data into the SILVER layer.
Identify the information to log, e.g., success records, failure records, start time, end time, total time taken, status.
Upload the logging package to the Spark pool and use it in the notebook to log that information, as in the sketch below.
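Assuming the wheel is attached to the pool under the hypothetical name fabric_app_logging and re-exports the tracer configured above, notebook usage for a RAW-to-SILVER load could look like this. The Files/raw/orders path, the order_id cleansing rule, and the silver_orders table are placeholders; spark is the session Fabric provides in notebooks.

    # Sketch of notebook usage; module name, paths, and tables are placeholders.
    from opentelemetry.trace import Status, StatusCode

    from fabric_app_logging import tracer  # hypothetical wheel/module name

    with tracer.start_as_current_span("raw_to_silver") as span:
        raw_df = spark.read.format("delta").load("Files/raw/orders")  # placeholder path
        silver_df = raw_df.dropna(subset=["order_id"])  # example cleansing rule

        total, good = raw_df.count(), silver_df.count()
        span.set_attribute("records.read", total)
        span.set_attribute("records.success", good)
        span.set_attribute("records.failed", total - good)

        silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_orders")
        span.set_status(Status(StatusCode.OK))

    # On exit the span ends and the exporter appends a row with the step name,
    # start/end times, status, and the attributes above. If the block raises,
    # OpenTelemetry marks the span status as ERROR automatically, so failed
    # runs land in the logging table too.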