Closed galamit86 closed 3 years ago
Looks good to me!
This PR:
- Splits `utils` into 4 submodules:
  - `gcpl` contains the functions that interact with GCP (named `gcpl` because `gcp` is already used in the code to mean "google cloud project").
  - `statline` contains the functions that interact with Statline.
  - `utils` contains generic functions (like `create_dir`, etc.).
  - `main` contains the main logic for processing datasets.
- Organizes the functions as follows:
  - `gcpl`: `upload_to_gcs`, `gcs_to_gbq`
  - `statline`: `get_metadata_cbs`, `dataset_to_parquet` (changed here from `tables_to_parquet`)
  - `utils`: `dict_to_json_file`, `convert_ndjsons_to_parquet`, `clean_python_name`
  - `main`: `main`
- Adds an `endpoint` parameter, allowing 3 options for where a run of `main.main()` ends:
  - `local`: locally stored parquet files
  - `gcs`: parquet files stored on GCS
  - `bq`: parquet files stored on GCS and a BQ dataset with tables linked to the GCS files
- Refactors the `main` logic to be more DRY, considering the addition of `endpoint`.

closes #73
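
For readers skimming the description, the three `endpoint` options can be read as cumulative stages. The following is only a rough sketch of that idea, not the code in this PR: the `Endpoint` enum and the `planned_steps` helper are hypothetical names, and the assumption that each option simply adds one more step (using the function names listed above) is mine.

```python
from enum import Enum
from typing import List


class Endpoint(str, Enum):
    """Hypothetical enum for the three `endpoint` values named in the PR."""
    LOCAL = "local"
    GCS = "gcs"
    BQ = "bq"


def planned_steps(endpoint: str) -> List[str]:
    """Return the step names a run would perform for the given endpoint.

    Assumption: the options are cumulative, so "gcs" does everything
    "local" does plus an upload, and "bq" adds the BigQuery step on top.
    """
    target = Endpoint(endpoint)  # raises ValueError for an unknown value
    steps = ["dataset_to_parquet"]  # every run produces parquet files locally
    if target in (Endpoint.GCS, Endpoint.BQ):
        steps.append("upload_to_gcs")  # parquet files end up on GCS
    if target is Endpoint.BQ:
        steps.append("gcs_to_gbq")  # BQ tables linked to the GCS files
    return steps
```

Validating the value once up front and then branching on it is one way to keep the three code paths from duplicating each other, which seems to be the spirit of the DRY refactor mentioned above.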