dataverbinders / statline-bq

Library to fetch CBS open datasets into parquet and optionally load into Google Cloud Storage and BigQuery
MIT License
0 stars 0 forks source link

Add standard logging #67

Closed galamit86 closed 3 years ago

galamit86 commented 3 years ago

Implement logging using the standard logging module:

galamit86 commented 3 years ago

@dkapitan Trying to implement a decorator logger, I am having trouble understanding what should happen in the decorator, and what in the function itself. More specifically, where do I instantiate a logger object?

Currently, with the basic implementation, I do this:

## statline_bq/utils.py
import logging

# provide logging configuration
LOGGING_CONF_FILE = Path(__file__).parent / "logging_conf.toml"
logging.config.dictConfig(toml.load(LOGGING_CONF_FILE))

# create logger object
logger = logging.getLogger(__name__)

# define some function, and use logger to create the Log Record
def some_func():
    # do some things
    do_something()
    do_something_else()
    # log some things
    logger.info("This is an INFO log message")

My first instinct, was to write this decorator:

## statline_bq/log.py
from pathlib import Path
import functools
import logging
import logging.config

import toml

LOGGING_CONF_FILE = Path(__file__).parent / "logging_conf.toml"
logging.config.dictConfig(toml.load(LOGGING_CONF_FILE))

def logdec(func):
    @functools.wraps(func)
    def log(*args, **kwargs):
        logger = logging.getLogger(func.__module__) #should provide the same as __name__ in the file itself
        result = func(*args, **kwargs)
        return result

    return log

Naively hoping that the by replacing the last line of some_func() from logger.info("MESSAGE") to logging.info("MESSAGE"), and decorating it, so it ends up looking like this:

## statline_bq/utils.py
import logging
from statline_bq.log import logdec

# No logger or configuration defined here now

@logdec
def some_func():
    # do some things
    do_something()
    do_something_else()
    # log some things
    logging.info("This is an INFO log message")

, it would use the logger defined in the decorator. But that is not true, as logging.info() simply uses the root logger, and the logger defined in the decorator is not used.

The examples I can find online are always about logging things from "outside" of the function - either because it finished concluded running - like the returned value or a raised exception, or value that are constant to the function - like its name or module. But it seems to me we are looking for something else: we want to log things that are happening inside the function, during normal operation.

I am still working on it, so my problem definition might be off still. But if you have some thoughts or reference to this, let me know.

galamit86 commented 3 years ago

@dkapitan I thought I saw you replied to this - was that somewhere else?

dkapitan commented 3 years ago

@galamit86 I have sent you an email with an eBook, that's probably what you saw.

galamit86 commented 3 years ago

@dkapitan See logging branch for latest implementation:

If you have any input, let me know!

dkapitan commented 3 years ago

@galamit86 looks really good, clean code and very generic way of implementation. I have learned about decorators, too 😄

One remark / suggestion: do we want to add a timestamp to the log message or do we assume that the logging environment (Prefect) has that? Proposal: add parameter with_timestamp which defaults to false. If true, prepend logstring with time.now().

galamit86 commented 3 years ago

@dkapitan Thanks! I learned a quite a lot about decorators and logging - though it feels like scratching the surface :)

prefect definitely formats its log messages with a timestamp, but so do we! See this row, which defines the format of a our log message (it's a pretty standard implementation - I saw it in most use cases I looked at).

A couple of example lines look like this:

2021-05-07 13:35:10,157 - statline_bq.utils - INFO: Starting function main with arguments: (id='83583NED', source='cbs', third_party=False, config=<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, gcp_env='dev', force=False)
2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Starting function check_gcp_env with arguments: ('dev')
2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Succesfully finished function check_gcp_env
2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Starting function check_v4 with arguments: (id='83583NED', third_party=False)
2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Succesfully finished function check_v4
2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Starting function cbsodata_to_gbq with arguments: (id='83583NED', odata_version='v3', third_party=False, source='cbs', config=<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, gcp_env='dev', force=False)
2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Starting function set_gcp with arguments: (<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, 'dev', 'cbs')
2021-05-07 13:35:10,252 - statline_bq.utils - INFO: Succesfully finished function set_gcp
2021-05-07 13:35:10,252 - statline_bq.utils - INFO: Starting function get_urls with arguments: (id='83583NED', odata_version='v3', third_party=False)
2021-05-07 13:35:10,326 - statline_bq.utils - INFO: Succesfully finished function get_urls
2021-05-07 13:35:10,326 - statline_bq.utils - INFO: Starting function get_metadata_cbs with arguments: (id='83583NED', third_party=False, odata_version='v3')
2021-05-07 13:35:10,398 - statline_bq.utils - INFO: Succesfully finished function get_metadata_cbs
2021-05-07 13:35:10,398 - statline_bq.utils - INFO: Starting function get_metadata_gcp with arguments: (id='83583NED', source='cbs', odata_version='v3', gcp=<Box: {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}>, credentials=None)
2021-05-07 13:35:12,364 - statline_bq.utils - INFO: Starting function get_latest_folder with arguments: ('cbs/v3/83583NED', <Box: {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}>)
2021-05-07 13:35:14,707 - statline_bq.utils - INFO: Succesfully finished function get_latest_folder
2021-05-07 13:35:14,835 - statline_bq.utils - INFO: Succesfully finished function get_metadata_gcp
dkapitan commented 3 years ago

sweet!

On Sat, May 8, 2021 at 12:06 PM Amit Gal @.***> wrote:

@dkapitan https://github.com/dkapitan Thanks! I learned a quite a lot about decorators and logging - though it feels like scratching the surface :)

prefect definitely formats its log messages with a timestamp, but so do we! See this row https://github.com/dataverbinders/statline-bq/blob/aeb69ddf51345f453bd23e2e8c1f7d57bc1cf8fc/statline_bq/logging_conf.toml#L6, which defines the format of a our log message (it's a pretty standard implementation - I saw it in most use cases I looked at).

A couple of example lines look like this:

2021-05-07 13:35:10,157 - statline_bq.utils - INFO: Starting function main with arguments: (id='83583NED', source='cbs', third_party=False, config=<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, gcp_env='dev', force=False) 2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Starting function check_gcp_env with arguments: ('dev') 2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Succesfully finished function check_gcp_env 2021-05-07 13:35:10,159 - statline_bq.utils - INFO: Starting function check_v4 with arguments: (id='83583NED', third_party=False) 2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Succesfully finished function check_v4 2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Starting function cbsodata_to_gbq with arguments: (id='83583NED', odata_version='v3', third_party=False, source='cbs', config=<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, gcp_env='dev', force=False) 2021-05-07 13:35:10,251 - statline_bq.utils - INFO: Starting function set_gcp with arguments: (<Box: {'gcp': {'prod': {'cbs_dl': {'project_id': 'dataverbinders-cbs-dl', 'bucket': 'dataverbinders-cbs-dl', 'location': 'EU'}, 'external_dl': {'project_id': 'dataverbinders-external-dl', 'bucket': 'dataverbinders-external-dl', 'location': 'EU'}, 'dwh': {'project_id': 'dataverbinders-open-dwh', 'bucket': 'dataverbinders-open-dwh', 'location': 'EU'}}, 'test': {'project_id': 'dataverbinders-test', 'bucket': 'dataverbinders-test', 'location': 'EU'}, 'dev': {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}}, 'paths': {'cbs': 'cbs', 'mlz': 'mlz', 'rivm': 'rivm', 'politie': 'politie', 'azw': 'azw', 'iv3': 'iv3', 'mkb': 'mkb', 'jm': 'jm', 'interreg': 'interreg', 'agb': 'agb', 'vektis_open_data': 'vektis_open_data', 'bag': 'bag'}}>, 'dev', 'cbs') 2021-05-07 13:35:10,252 - statline_bq.utils - INFO: Succesfully finished function set_gcp 2021-05-07 13:35:10,252 - statline_bq.utils - INFO: Starting function get_urls with arguments: (id='83583NED', odata_version='v3', third_party=False) 2021-05-07 13:35:10,326 - statline_bq.utils - INFO: Succesfully finished function get_urls 2021-05-07 13:35:10,326 - statline_bq.utils - INFO: Starting function get_metadata_cbs with arguments: (id='83583NED', third_party=False, odata_version='v3') 2021-05-07 13:35:10,398 - statline_bq.utils - INFO: Succesfully finished function get_metadata_cbs 2021-05-07 13:35:10,398 - statline_bq.utils - INFO: Starting function get_metadata_gcp with arguments: (id='83583NED', source='cbs', odata_version='v3', gcp=<Box: {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}>, credentials=None) 2021-05-07 13:35:12,364 - statline_bq.utils - INFO: Starting function get_latest_folder with arguments: ('cbs/v3/83583NED', <Box: {'project_id': 'dataverbinders-dev', 'bucket': 'dataverbinders-dev', 'location': 'EU'}>) 2021-05-07 13:35:14,707 - statline_bq.utils - INFO: Succesfully finished function get_latest_folder 2021-05-07 13:35:14,835 - statline_bq.utils - INFO: Succesfully finished function get_metadata_gcp

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dataverbinders/statline-bq/issues/67#issuecomment-835263230, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7GYDWDF2MOLFNJYFNPDYTTMUELPANCNFSM43335PIQ .