This data repository should be deprecated in favor of using db-data-library. I think one of the reasons behind creating the data repo originally was that data library wasn't fully functional (an assumption) and that the data was meant to be kept private, since it contains sensitive information not available to the public.
Generally, all the data repo does is process csv, xlsx, or geospatial files into SQL files. Data Library has the functionality to do this and also applies the standards we use when we archive input/source data.
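For illustration, the csv-to-SQL step the data repo performs can be sketched roughly like this; the function name, table name, and text-only column typing are assumptions, not the repo's actual implementation:

```python
import csv
import io

def csv_to_sql(csv_text: str, table: str) -> str:
    """Render CSV rows as a CREATE TABLE plus INSERT statements.

    Hypothetical sketch: every column is typed as text, and values are
    escaped by doubling single quotes, which is enough for a demo but
    not a substitute for Data Library's archiving standards.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{c}" text' for c in header)
    statements = [f"CREATE TABLE {table} ({cols});"]
    for row in reader:
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        statements.append(f"INSERT INTO {table} VALUES ({vals});")
    return "\n".join(statements)
```

Data Library wraps this same kind of conversion but also records metadata and versions the output, which is the main argument for consolidating there.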
Create data templates for the source data we receive from SL in DCP Housing. This will allow us to archive source data so that we no longer have to keep it in the raw folder, where we constantly change dates, keep multiple files with similar data, etc.
You can set the access level directly in a data library yaml template (e.g. ACL - Private). These should all be set to private.
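As a rough sketch of what that setting looks like in a template, the field names below are illustrative only and may not match the actual data library schema:

```yaml
# Hypothetical data library template fragment; field names are
# assumptions, not the confirmed schema.
dataset:
  name: example_source_dataset
  acl: private   # keep all SL source data non-public
```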
I think there is room to work with SL and data providers to standardize the inputs we receive from them. For one, we can send them a "standard" data template which includes all the applicable data we need for the updated KPDB build pipeline. For example:
- dcp_n_study
- uidhpd_rfp