NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Number of digits shown after decimal is inconsistent #287

Closed TylerMatteo closed 2 years ago

TylerMatteo commented 2 years ago

This issue documents a handful of apparent inconsistencies where the output data don't have the correct number of digits after the decimal place. I'm going to identify columns and file names as these issues generally don't correlate to any particular geography or geoid.

Background

This investigation came from OSE researching how to address bug 7946, which states that numbers for "median age" are showing 0 digits after the decimal when they should show 1. Currently, OSE's code doesn't attempt to round numbers to a particular decimal place, but it does render any data point whose "measure" token is not "pct" as an integer, which is why the median age numbers are getting cut off despite having the correct number of digits after the decimal in the output CSVs.
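To make the failure mode concrete, here is a minimal sketch of the display rule described above. The function name and the non-"pct" measure token are hypothetical, not OSE's actual code; only the "pct" token and the integer-casting behavior come from the description.

```python
# Hypothetical sketch of OSE's display rule: values whose "measure" token
# is not "pct" are rendered as integers, which silently drops the decimal
# digit that median-age values carry in the output CSVs.
def format_value(value: float, measure: str) -> str:
    if measure == "pct":
        return f"{value:.1f}"  # percentages keep one decimal place
    return str(int(value))     # everything else is truncated to an integer

# A median age of 36.4 loses its decimal digit under this rule:
print(format_value(36.4, "median"))  # "median" is an assumed token; prints "36"
```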

OSE could change our code to show data exactly as they appear in the CSVs, but in looking into that I noticed we can't always make that assumption. Here are some cases where the numbers in the CSVs don't have the number of digits after the decimal that I would expect. I know many of these may be columns that are essentially just "passed through" from data processed by Population, but I wanted to capture a list here so that we can start on a solution:

Let me know if you need any more examples; hopefully this explanation makes things a little clearer. As far as we know, there are no hard-and-fast rules for saying definitively how many digits after the decimal a given column should have, but it seems like:

TylerMatteo commented 2 years ago

Hey all, wanted to check if you have a sense of how much work this issue would be to fix on the data side. I've given it some more thought, and I think there might be a solution we can implement on the OSE side that wouldn't be too much work, by taking advantage of some assumptions we can make based on measures and variances. Happy to discuss whenever.

SashaWeinstein commented 2 years ago

Hey Tyler, we just talked about this as a group. We haven't checked which data points have the wrong number of decimal places or why. We don't know whether the issue is in the source data or comes from our code casting the data incorrectly. My schedule is open between now and 5 today, and before 2 tomorrow, to chat.

TylerMatteo commented 2 years ago

You can hold off on doing any more investigation into this for now. I'm going to work with Sam and Erica to check some of the assumptions I'm working off of and then we can go from there.

SashaWeinstein commented 2 years ago

Wonderful, let me know if there are next steps for DE

TylerMatteo commented 2 years ago

Closing this. I ended up just adding a bit of logic to our pipeline to enforce consistency based on indicator, measure, and variance.
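The fix described above can be sketched roughly as a precision lookup keyed on the tokens the thread mentions. The table contents, token names, and default here are assumptions for illustration; the actual EDDT rules aren't spelled out in the thread.

```python
# Hypothetical sketch of the pipeline fix: a lookup keyed on
# (measure, variance) decides how many decimal places each value keeps,
# so formatting no longer depends on how the source CSV happened to
# round. Tokens and precisions below are assumed, not EDDT's real rules.
DECIMALS = {
    ("pct", "value"): 1,     # percentages shown with one decimal place
    ("median", "value"): 1,  # e.g. median age keeps its tenths digit
    ("count", "value"): 0,   # counts rendered as whole numbers
}

def enforce_precision(value: float, measure: str, variance: str) -> str:
    # Fall back to integer formatting for any unlisted combination.
    digits = DECIMALS.get((measure, variance), 0)
    return f"{value:.{digits}f}"

print(enforce_precision(36.44, "median", "value"))  # prints "36.4"
```

Keying the rule on tokens rather than on individual columns matches the closing comment's point that the inconsistencies don't correlate with any particular geography or geoid.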