dataverbinders / statline-bq

Library to fetch CBS open datasets into parquet and optionally load into Google Cloud Storage and BigQuery
MIT License

Logging #71

Closed galamit86 closed 3 years ago

galamit86 commented 3 years ago

This PR adds standard logging to the library by implementing a general logging decorator and a module-specific logger for each module.

The implementation is done in 3 files:

Closes #67


@dkapitan Changes from our last discussion in the issue:

dkapitan commented 3 years ago

@galamit86 Code looks fine; however, implicitly returning None when in fact an exception occurred doesn't seem right.

Looking at your example (line 1399), two exceptions can occur (see docs https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client):

When you don't catch these and pass them on, you don't know what's happening.

I do appreciate that writing try ... except all the time means more code, but this looks like the following anti-pattern: https://realpython.com/the-most-diabolical-python-antipattern/

I don't know exactly what good practice is here. After reading up on item 65 in Slatkin's Effective Python, my thoughts are as follows:

See this example in RealPython.

galamit86 commented 3 years ago

I am on board with the general sentiment of being explicit.

Where I would like to deviate from it is when I'm deliberately (mis)using raised exceptions to check for something and returning None (or False) as an indication. That happens in 2 places: one is in check_bq_dataset (see my comment there), the other is in get_metadata_gcp (line 269).

In both cases, this isn't really about this new implementation of the log decorator - it's a separate issue, and the code was behaving in this way beforehand. If we are ok with this concept in general, then I will implement returning the error from the decorator after logging, instead of None, and alter these 2 functions accordingly (as I suggest in the specific comment above).
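As an illustration of the "exception as a check" pattern described above, here is a minimal sketch. It does not reproduce check_bq_dataset or get_metadata_gcp; `DatasetNotFound`, `get_dataset`, `dataset_exists` and the in-memory `_DATASETS` registry are all hypothetical stand-ins for a real client call. The point is only that the anticipated exception type is caught explicitly, so anything unexpected still propagates.

```python
from typing import Dict


class DatasetNotFound(Exception):
    """Hypothetical stand-in for a client library's 'not found' error."""


# Hypothetical in-memory registry standing in for a remote service.
_DATASETS: Dict[str, dict] = {"cbs_dev": {"location": "EU"}}


def get_dataset(dataset_id: str) -> dict:
    """Return the dataset, or raise DatasetNotFound (mimics a client call)."""
    try:
        return _DATASETS[dataset_id]
    except KeyError:
        raise DatasetNotFound(dataset_id) from None


def dataset_exists(dataset_id: str) -> bool:
    """Use the expected exception as the 'does not exist' signal."""
    try:
        get_dataset(dataset_id)
    except DatasetNotFound:
        # Only the anticipated error is translated into False;
        # any other exception propagates to the caller.
        return False
    return True
```

With this shape, a missing dataset yields False, while an unrelated failure (for example a TypeError from a malformed argument) is not silently swallowed.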

And finally, by changing the decorator's behaviour to end with:

    except Exception as e:
        # Log exception if occurs in function
        logger.exception(f"Exception in {func.__name__}: {str(sys.exc_info()[1])}")
        return e

I think we avoid the problems pointed out by the antipattern post you sent, right? Since we actually log the information and return the error, in fact raising it?

dkapitan commented 3 years ago

@galamit86 This works for me, thanks. And I agree that the issue with the two functions is something else and minor in any case since they just do some checking.

galamit86 commented 3 years ago

@dkapitan Great. A small correction on my part: return e does not re-raise the exception, raise e does. Pushed updated code.
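The difference between returning and re-raising can be seen in a small standalone sketch. The two function names are hypothetical and are not part of the library.

```python
def returns_error():
    try:
        int("not a number")
    except ValueError as e:
        # The caller receives the exception object as an ordinary
        # return value; nothing is raised.
        return e


def reraises_error():
    try:
        int("not a number")
    except ValueError:
        # A bare `raise` re-raises the active exception with its
        # original traceback.
        raise
```

Calling `returns_error()` completes normally and hands back a `ValueError` instance, so a caller that does not inspect the result never notices the failure. Calling `reraises_error()` propagates the `ValueError` to the caller.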

Also, logging sys.exc_info()[1] (taken from the Medium article) is the same as logging e, as far as I can see, so I'm using e.

To sum up, our general logging decorator should now behave like so:

This allows us, as far as I can see, to decorate any Python function without changing its expected behaviour. For example, we can continue to rely on raising exceptions ourselves if we want to, as we do in check_gcp_env:

from typing import List

def check_gcp_env(gcp_env: str, options: List[str] = ["dev", "test", "prod"]):
    # Raise explicitly on an unknown environment; otherwise report success
    if gcp_env not in options:
        raise ValueError(f"gcp_env must be one of {options}")
    else:
        return True
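A minimal sketch of a decorator with the behaviour agreed on in this thread (log the exception, then re-raise it), applied to check_gcp_env. This is not the code from the PR: the decorator name `log_calls` and the debug messages are assumptions made for illustration.

```python
import functools
import logging
from typing import Any, Callable, List

logger = logging.getLogger(__name__)  # module-specific logger


def log_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical logging decorator: logs calls, logs and re-raises errors."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        logger.debug("Entering %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            # logger.exception logs at ERROR level and appends the traceback
            logger.exception("Exception in %s: %s", func.__name__, e)
            raise  # re-raise so the caller still sees the original exception
        logger.debug("Exiting %s", func.__name__)
        return result

    return wrapper


@log_calls
def check_gcp_env(gcp_env: str, options: List[str] = ["dev", "test", "prod"]):
    if gcp_env not in options:
        raise ValueError(f"gcp_env must be one of {options}")
    else:
        return True
```

Under these assumptions, `check_gcp_env("dev")` still returns True and `check_gcp_env("staging")` still raises ValueError; the only added effect is the log record.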