NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Update tokens and file structure - WIP #213

Open AmandaDoyle opened 2 years ago

AmandaDoyle commented 2 years ago

Requested OSE changes:

Level of effort

Change values in the following files:

Reorganize files to consistently communicate years

Rules:

Option 1: Consistently communicate year in field name

Changes:

Option 2: Consistently communicate year in file name

Changes:

- QOL and Opportunity: Split out health outcome indicators into separate files for each of the following time periods:

mbh329 commented 2 years ago

I think the path of least resistance is probably just adding the year to the field name but would that be a cumbersome lift on the data dictionary side?

AmandaDoyle commented 2 years ago

> I think the path of least resistance is probably just adding the year to the field name but would that be a cumbersome lift on the data dictionary side?

Noted. I'm not worried about the data dictionary. I am concerned about shoving 3000+ fields into a single file though.

td928 commented 2 years ago

> > I think the path of least resistance is probably just adding the year to the field name but would that be a cumbersome lift on the data dictionary side?
>
> Noted. I'm not worried about the data dictionary. I am concerned about shoving 3000+ fields into a single file though.

For option 1, I agree with Amanda here. We were very hopeful about being able to automate a lot of testing, and I think this would certainly add some challenges on that front, but probably also an opportunity, since manually checking 3000 columns just seems unrealistic now.
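
As a rough illustration of what automating that check could look like, here is a minimal sketch. It assumes a hypothetical naming convention in which every year-specific field ends in `_YYYY` or `_YYYY_YYYY`; the suffix pattern, the function names, and the sample field names are all made up for illustration and are not taken from the repo.

```python
import re

# Hypothetical convention (an assumption, not the repo's actual rule):
# a field name ends in "_YYYY" or in a year range "_YYYY_YYYY".
YEAR_SUFFIX = re.compile(r"_(\d{4})(?:_(\d{4}))?$")


def fields_missing_year(field_names):
    """Return the field names that do not end in a year or year range."""
    return [name for name in field_names if YEAR_SUFFIX.search(name) is None]


def fields_by_period(field_names):
    """Group field names by their year suffix, skipping names without one."""
    groups = {}
    for name in field_names:
        match = YEAR_SUFFIX.search(name)
        if match is None:
            continue
        start, end = match.group(1), match.group(2)
        period = start if end is None else f"{start}_{end}"
        groups.setdefault(period, []).append(name)
    return groups


if __name__ == "__main__":
    # Made-up field names, only to show the behavior.
    sample = ["pop_2019", "rent_2015_2019", "geoid"]
    print(fields_missing_year(sample))
    print(fields_by_period(sample))
```

A check like this would only confirm that a year is present in each name. It would not confirm that the year is the right one for the underlying data.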

Option 2, on the other hand, would require new work on exporting files by year, where the year or year range needs to be passed as an additional parameter. Organizationally this seems messy, but it might be easier work overall and make the results easier to review manually. My question is whether OSE would accept this format, because it seems quite different from the output of option 1.
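
To make the "additional parameter" idea concrete, here is a minimal sketch of an export step that takes the year or year range and puts it in the file name. The function names, the `category_year.csv` naming pattern, and the sample row are hypothetical and only illustrate the shape of the change; the repo's actual export code may look quite different.

```python
import csv
from pathlib import Path


def export_path(out_dir, category, year):
    """Build a file path whose name carries the year or year range.

    The "<category>_<year>.csv" pattern is an assumption for illustration.
    """
    return Path(out_dir) / f"{category}_{year}.csv"


def export_indicators(rows, out_dir, category, year):
    """Write a list of dict rows to a year-specific CSV and return its path."""
    path = export_path(out_dir, category, year)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Column order follows the first row; an empty input yields an empty header.
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Made-up row, only to show the call shape.
    rows = [{"geoid": "0001", "indicator": 1.5}]
    print(export_indicators(rows, "output", "health_outcomes", "2015_2019"))
```

With this shape, the field names inside each file can stay year-free, and a reviewer can open one period at a time.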

SashaWeinstein commented 2 years ago

I agree with what's been voiced: avoiding 3k-column tables is best if possible.