repo restructure - Githubissues

kbmorales commented 4 years ago

Proposing a system like:

bin/ for in-production executable files (like the script that reads in the PPP data)
docs/ for references, data dictionaries, manuals, etc.
data/ for raw data files that scripts rely on or that others would find useful. I think tidied data files should be uploaded to Google Drive to make them easier for others to use. Please cite sources in the README!
code/ folder with individual project subfolders for work on the CARES Act data. Enhancements people make can go here. Contributions should be documented in the README. For example:
- code/NAICS/ is where scripts go for joining NAICS and PPP data
- code/project_name/ for the next project, etc.
tests/ for each project's tests

but open to any organizational structure or documentation scheme!
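
For anyone who wants to try the layout locally, here is a rough sketch of how the folders proposed above could be scaffolded. It is not part of the repo: `create_skeleton` and the `LAYOUT` list are hypothetical names, and the `.gitkeep` placeholder file is only a common convention (Git does not track empty directories), not something the proposal asks for.

```python
from pathlib import Path

# Folders from the proposal above. The two code/ subfolders are the
# examples named in the proposal; "project_name" is a placeholder.
LAYOUT = [
    "bin",
    "docs",
    "data",
    "code/NAICS",
    "code/project_name",
    "tests",
]


def create_skeleton(root):
    """Create the proposed folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        folder = root / rel
        # exist_ok=True makes the function safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        # Assumption: an empty .gitkeep file so Git will track the folder.
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created
```

Calling `create_skeleton(".")` from the repo root would create any folders that are missing and leave existing ones alone.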

All finalized code should be able to run on the output of the ppp data script.R file @JohnMcCambridge contributed, or on a data frame created by reading in a CSV file of that data.
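
One way to read that requirement, as a minimal sketch only: each enhancement takes already-loaded rows as its input, so it does not care whether they came from the read-in script or from a CSV file. Everything here is an assumption for illustration: `read_rows` and `count_by` are hypothetical helpers, a list of dicts stands in for a data frame to keep the example dependency-free, and the `State` and `LoanAmount` column names are made up rather than taken from the PPP data.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Read a CSV text stream into a list of dict rows (data frame stand-in)."""
    return list(csv.DictReader(handle))


def count_by(rows, column):
    """Example enhancement: count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


# Hypothetical sample data; the real PPP columns may differ.
sample_csv = io.StringIO("State,LoanAmount\nMD,1000\nDC,2500\nMD,400\n")
rows_from_csv = read_rows(sample_csv)

# The same function also works on rows that were built in memory,
# for example the output of a separate read-in script.
rows_in_memory = [
    {"State": "MD", "LoanAmount": "1000"},
    {"State": "DC", "LoanAmount": "2500"},
    {"State": "MD", "LoanAmount": "400"},
]

counts_from_csv = count_by(rows_from_csv, "State")
counts_in_memory = count_by(rows_in_memory, "State")
```

Keeping the file reading separate from the enhancement is the design choice that matters here: the enhancement only ever sees rows, so either input path gives the same result.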

kbmorales commented 4 years ago

Added in the beginnings of this structure to the repo and documented.

emigre459 commented 4 years ago

@kbmorales I've been tinkering around the edges getting some python code up and running for this too. Based on current repo structure, where would you like all of that to end up? I'm currently putting it in a root directory called python/ but can change as needed.

kbmorales commented 4 years ago

hey @emigre459 that's great to get some python code in the mix.

does your code read in the raw data, or is it some enhancement to it? if it reads in the raw data and therefore is a core function, bin/ is the place for it. if it helps develop the PPP data somehow, by cleaning it or adding on new variables, place it in code/ in a folder with a simple descriptive name for what it does.

feel free to hmu on slack as well

DataKind-DC / CARES

repo restructure #16