Closed TylerMatteo closed 2 years ago
Hey all, wanted to check if you have a sense of how much work this issue would be to fix on the data side. I've given it some more thought and I think there might be a solution we can implement on the OSE side that wouldn't be too much work by taking advantage of some assumptions we can make based on measures and variances. Happy to discuss whenever.
Hey Tyler, we just talked about this as a group. We haven't checked which data points have the wrong number of decimal places or why. We don't know if the issue is with the source data or comes from our code casting the data incorrectly. My schedule is open between now and 5, and before 2 tomorrow, if you want to chat.
You can hold off on doing any more investigation into this for now. I'm going to work with Sam and Erica to check some of the assumptions I'm working off of and then we can go from there.
Wonderful, let me know if there are next steps for DE.
Closing this. I ended up just adding a bit of logic to our pipeline to enforce consistency based on indicators, measure, and variance.
This issue documents a handful of apparent inconsistencies where the output data don't have the correct number of digits after the decimal place. I'm going to identify columns and file names as these issues generally don't correlate to any particular geography or geoid.
Background
This investigation came from OSE researching how to address bug 7946, which states that numbers for "median age" are showing with 0 digits after the decimal when they should show with 1. Currently, OSE's code doesn't attempt to round any numbers to a decimal place, but it does show data points with a "measure" token other than "pct" as integers, which is why the median age numbers are getting cut off despite having the correct number of digits after the decimal in the output CSVs.
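To make the current display behavior concrete, here is a rough Python sketch of it. This is an illustration only, not OSE's actual front end code: the function name `display_value` and the "median" measure token are assumptions, and whether the real code rounds or truncates non-percent values isn't stated here (the sketch rounds).

```python
def display_value(raw: str, measure: str) -> str:
    """Format one CSV cell roughly the way the current front end is described.

    `raw` is the cell text from the output CSV; `measure` is the column's
    measure token. Both names are hypothetical, chosen for this sketch.
    """
    value = float(raw)
    if measure == "pct":
        # Percents are "manually" given exactly one digit after the decimal.
        return f"{value:.1f}"
    # Any other measure is shown as an integer, so decimals in the CSV are lost.
    # Assumption: rounding; the real code may truncate instead.
    return f"{value:.0f}"


# A median age stored as 36.4 in the CSV loses its decimal on display.
print(display_value("36.4", "median"))
# A percent stored as 100 still displays with one digit after the decimal.
print(display_value("100", "pct"))
```

The second call shows why the front end currently meets the percent requirement even when the CSV itself holds a bare integer.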
OSE can change our code to just show data exactly as they appear in the CSVs, but while looking into that I noticed we can't always assume the CSVs have the right number of digits. Here are some cases where the numbers in the CSVs don't have the number of digits after the decimal that I would expect. I know many of these may be columns that are essentially just "passed through" from data processed by Population, but I wanted to capture a list here so that we can start on a solution:
Some percents appear as 100 instead of 100.0. The requirements OSE received say that all percents (and percent MOEs) should have a digit after the decimal, even if it's just .0. Our current front end code actually meets that requirement because we "manually" make sure percents have 1 digit, but if we're going to switch to showing numbers exactly as they appear in the source data, we can no longer take that approach.
industry_mnfct_wages_wnh_median has 1 digit after the decimal but industry_mnfct_wages_hsp_median has 0. I believe these should all have 0. The pattern here seems to be that columns with any empty cells have 1 digit after the decimal, presumably because they're getting coerced into floats at some point?
Let me know if you need any more examples; hopefully this explanation makes things a little clearer. As far as we know, there are no hard and fast rules for saying definitively how many digits after the decimal a given column should have, but it seems like: