There are multiple examples where we want to supply supplemental data to be joined in with data used for analysis:
mapping data to decode status codes received from services. for example: pardot visitor_activities has type and type_name that we're decoding and then mapping to event_action.
creating calendar tables. there isn't a standard way to do this in redshift and, after much investigation, the best process for doing this really is joining against a table with all relevant dates in it.
In order to deal with this, we need a procedural way to build and tear down datasets that is 100% integrated into the way we're deploying models. This should likely look like a python script that is integrated into runner.py, and runs multiple python files, each of them responsible for building or tearing down a particular table. Some tables will be best built via code and some will be best loaded from a CSV, so it should be flexible enough to handle multiple methods of data population.
There are multiple examples where we want to supply supplemental data to be joined in with data used for analysis:
In order to deal with this, we need a procedural way to build and tear down datasets that is 100% integrated into the way we're deploying models. This should likely look like a python script that is integrated into runner.py, and runs multiple python files, each of them responsible for building or tearing down a particular table. Some tables will be best built via code and some will be best loaded from a CSV, so it should be flexible enough to handle multiple methods of data population.