datadesk / census-data-aggregator

Combine U.S. census data responsibly
MIT License

Correct handling of jam values in median approximation #17

Open sastoudt opened 5 years ago

sastoudt commented 5 years ago

Thanks to some clarification from our Census friends:

The jam value represents a result from a median calculation when the median can't actually be calculated because it lies in the lowest or highest bin. The jam value is not used in the median calculation itself as a lower or upper bound for the end bins.

This doesn't change the results of the examples we have now (where we've treated the jam value as a bound), but we need to update the median function to handle the case where the lowest and highest bins don't have concrete bounds, and add examples of that case.

We may want to add an optional jam_value input to use when the median falls in the lowest or highest bin.
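A rough sketch of what that could look like, not the library's actual code. The function name approximate_median_sketch, the lower_jam / upper_jam parameters, the use of None for a missing bound, and the sample numbers are all assumptions made up for illustration. The idea is to interpolate linearly when the median bin has both bounds and to hand back the jam value when it doesn't.

```python
def approximate_median_sketch(bins, lower_jam=None, upper_jam=None):
    """Approximate a median from binned counts (hypothetical helper).

    bins: list of dicts with "min", "max" and "n", ordered low to high.
          "min" is None for an open-ended lowest bin,
          "max" is None for an open-ended highest bin.
    lower_jam / upper_jam: values to return when the median falls in an
          open-ended bin and so cannot be interpolated.
    """
    total = sum(b["n"] for b in bins)
    if total <= 0:
        raise ValueError("bins must contain at least one observation")

    midpoint = total / 2.0
    cumulative = 0
    for b in bins:
        if cumulative + b["n"] >= midpoint:
            # The median lies in this bin.
            if b["min"] is None:
                return lower_jam  # jam value is the result, not a bound
            if b["max"] is None:
                return upper_jam
            # Both bounds are known: linear interpolation within the bin.
            fraction = (midpoint - cumulative) / b["n"]
            return b["min"] + fraction * (b["max"] - b["min"])
        cumulative += b["n"]


# Illustrative bins and jam value only, not real census figures.
top_heavy = [
    {"min": None, "max": 10000, "n": 10},
    {"min": 10000, "max": 20000, "n": 20},
    {"min": 20000, "max": None, "n": 70},
]
print(approximate_median_sketch(top_heavy, lower_jam=2499, upper_jam=250001))
```

Here the jam value is only ever returned as the answer. It never enters the interpolation as a bound, which matches the clarification quoted above.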

sastoudt commented 5 years ago

proof of concept here

palewire commented 5 years ago

I think that goes right here

palewire commented 5 years ago

Might be good to throw a warning when this happens too.
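Something along these lines could work for the warning, using the standard library's warnings module. The JamValueWarning class and the jam_with_warning helper are hypothetical names for illustration, and None standing in for a missing bound is an assumption.

```python
import warnings


class JamValueWarning(UserWarning):
    """Hypothetical warning category for medians that hit a jam value."""


def jam_with_warning(median_bin, jam_value):
    """Return jam_value and warn if median_bin is open-ended, else None.

    median_bin: dict with "min" and "max"; None marks a missing bound.
    """
    if median_bin["min"] is None or median_bin["max"] is None:
        warnings.warn(
            "The median falls in an open-ended bin, so the jam value is "
            "returned instead of an interpolated estimate.",
            JamValueWarning,
            stacklevel=2,
        )
        return jam_value
    return None
```

A dedicated warning category lets callers silence or escalate it with warnings.simplefilter without touching other warnings.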

sastoudt commented 5 years ago

dealt with in https://github.com/datadesk/census-data-aggregator/pull/20