Structure new project with enhancements

Overall: we need to decide how to set proper data types in the Parquet files (they are currently inferred automatically).
- When converting from JSON to a pyarrow table, ParseOptions (its explicit_schema parameter) can be used to supply a specific schema. That is where we could provide it, and I assume it will be kept when the table is written to Parquet. Where the schema itself should come from still needs to be investigated; perhaps the CBS data provides it somewhere.
Overall: we need proper scaffolding with a configuration file (.toml) in which we set
- the temp directory on the host machine
- the GCP project parameters

Furthermore, since we are incorporating GCS, we can skip loading the data as-is into BigQuery and use external tables instead.
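
As a rough illustration of what an external table over Parquet files on GCS could look like, the sketch below only builds a BigQuery DDL statement as a string; it does not call any Google client library. The project, dataset, table and bucket names are placeholders, and `external_table_ddl` is a hypothetical helper, not existing nimbletl code.

```python
from typing import Optional, Sequence


def sql_string(value: str) -> str:
    """Quote a Python string as a single-quoted BigQuery string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
    )
    return f"'{escaped}'"


def external_table_ddl(
    table_id: str,
    uris: Sequence[str],
    description: Optional[str] = None,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for Parquet files on GCS."""
    options = [
        "format = 'PARQUET'",
        "uris = [" + ", ".join(sql_string(uri) for uri in uris) + "]",
    ]
    if description is not None:
        options.append(f"description = {sql_string(description)}")
    body = ",\n  ".join(options)
    return f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\nOPTIONS (\n  {body}\n)"


# Placeholder names, for illustration only.
ddl = external_table_ddl(
    "my-gcp-project.cbs_v3.83583NED",
    ["gs://my-cbs-bucket/v3/83583NED/20201105/*.parquet"],
    description="Example description",
)
print(ddl)
```

The resulting statement could then be run as an ordinary query; because no data is copied into BigQuery, there is no load job to wait for.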

All in all, given these major upgrades and changes, I think it may be better to define a new, separate project, cbs-bq, with the following outline:

Prerequisites: a GCP account with at least one GCS bucket and one project with BigQuery enabled. These settings go in a config file.
Input: a list of v3 and/or v4 datasets, which can also go in a .toml configuration file.
Options:
- keep older versions of a dataset on GCS when downloading a new one (default=True)
Output after running the command-line app:
- GCS is filled with Parquet files, organized by API-version/dataset-id/date
- All parquet files are queryable as external tables in BigQuery, with separate BigQuery datasets for v3 and v4 (for clarity)
- Table descriptions are added (a feature Eddy has already written; he struggled with it because, with the data being loaded asynchronously, it was hard to know when the descriptions could be added. We don't have that problem now)
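
The naming scheme in the output list above could be sketched as follows. The date format, the "cbs" prefix for the BigQuery dataset names, and both function names are assumptions; only the API-version/dataset-id/date ordering and the split into a v3 and a v4 BigQuery dataset come from the outline.

```python
from datetime import date

API_VERSIONS = ("v3", "v4")


def parquet_blob_name(api_version: str, dataset_id: str, run_date: date, filename: str) -> str:
    """Object name inside the bucket: API-version/dataset-id/date/filename."""
    if api_version not in API_VERSIONS:
        raise ValueError(f"unknown API version: {api_version!r}")
    return f"{api_version}/{dataset_id}/{run_date:%Y%m%d}/{filename}"


def external_table_id(project: str, api_version: str, dataset_id: str, prefix: str = "cbs") -> str:
    """Fully qualified table id, with one BigQuery dataset per API version."""
    if api_version not in API_VERSIONS:
        raise ValueError(f"unknown API version: {api_version!r}")
    return f"{project}.{prefix}_{api_version}.{dataset_id}"


# Placeholder values, for illustration only.
blob = parquet_blob_name("v3", "83583NED", date(2020, 11, 5), "TypedDataSet.parquet")
table_id = external_table_id("my-gcp-project", "v3", "83583NED")
print(blob)
print(table_id)
```

Keeping the date in the object name is what makes the keep-older-versions option cheap: a new download lands under a new prefix, and older prefixes are either left in place or deleted.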

dataverbinders / nimbletl

Structure new project with enhancements #11