Need to Create a Data Dictionary

RaedMan commented 7 years ago

Following up on the meeting with Eric and @geneorama: this is a reminder to publish this when completed.

geneorama commented 7 years ago

I want to clarify what was being asked here.

Typically we use the term "data dictionary" to refer to a tabular summary of the fields available in a table, or group of tables.

What I needed, and what we developed, is a tabular summary of the data sets and files that are currently in use and currently available on the analytics server. The problem was that we had source data stored in unusual places under obscure names, and it was impossible to determine a file's origin (source and date) from the file name alone.

Personally, I'd prefer to move to a verbose but consistent directory structure, so that we can preserve original source files exactly as we received them, including their original file names, and simultaneously be able to identify the files (by the directory structure). So I would like something along these lines:

data/raw/<source>/<date>/<original file name>
data/processed/<source>/<friendly file name>

However it's not really my place to make modifications like that since we're working downstream of the DSSG project.
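
To make the proposed layout concrete, here is a minimal sketch of how such paths could be built and read back. It is only an illustration, not something that exists in the repo: the function names, the ISO `YYYY-MM-DD` date format, and the example source and file names are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath


def raw_path(source: str, received: date, original_name: str) -> PurePosixPath:
    """Build data/raw/<source>/<date>/<original file name>.

    The original file name is kept untouched; the ISO date format
    is an assumption, the proposal does not specify one.
    """
    return PurePosixPath("data", "raw", source, received.isoformat(), original_name)


def processed_path(source: str, friendly_name: str) -> PurePosixPath:
    """Build data/processed/<source>/<friendly file name>."""
    return PurePosixPath("data", "processed", source, friendly_name)


def describe_raw(path: PurePosixPath) -> dict:
    """Recover source, date, and original name from a raw path alone."""
    # Expected parts: ("data", "raw", <source>, <date>, <original file name>)
    _, _, source, received, original_name = path.parts
    return {"source": source, "date": received, "file": original_name}


if __name__ == "__main__":
    # Hypothetical source and file names, for illustration only.
    p = raw_path("example_source", date(2016, 1, 15), "Export 1.csv")
    print(p)
    print(describe_raw(p))
    print(processed_path("example_source", "example_records.csv"))
```

The point of the sketch is that the source and date can be read off the path itself, so the original file never needs to be renamed.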

geneorama commented 7 years ago

I pushed up a Data Inventory file.

Chicago / lead-public

Need to Create a Data Dictionary #9