Metastring / HealthHeatMap


Storing Metadata on Indicators #24

Open asdofindia opened 4 years ago

asdofindia commented 4 years ago

Imagine an indicator like "Maternal Mortality Rate"

There are several attributes of this particular indicator that are useful. Some examples:

Now, there are certain other attributes that arise when an indicator is taken in relation with a source of data. These include:

There are certain other attributes which could be considered like

Examples of indicators

Previous work

Related reading:

rsprabha commented 4 years ago

Thanks for this discussion document. WHO seems to have the idea of Core Health Indicators in the document that you sent. It may be worthwhile to look at this as well and see how many are captured in our system. It may also be worthwhile to have other metadata on the indicators, like the range of values for the indicator and the characteristics of its distribution.

asdofindia commented 4 years ago

Indicators (and other dimensions) in real life exist independent of data.

For example, there could be the indicator "Happiness quotient". This might not be available in any of the datasets that we have.

Capturing such indicators requires us to think about whether we need a different index just to store all the possible and potential values of all the dimensions, regardless of whether there actually is data corresponding to them.

Another example is a composite indicator that is calculated on the fly. For example, imagine we create an indicator called "maternal health index" which is a sum of three other indicators that are present in the dataset. Now, either we can precompute this value for every possible combination of dimensions and store it or we can compute it on demand. If we are doing the latter, where would the maternal health index be stored?

asdofindia commented 4 years ago

Some preliminary thoughts on tackling some of the fields (numerator-denominator, etc)

Continuing from the comment on aggregation methods.

Essentially, things like "numerator", "denominator" and "multiplying factor" are all part of how the value of an indicator is calculated. For rates, there will always be a numerator and a denominator. The multiplying factor comes in when we want to express something as a "percentage" or "per thousand", where we multiply the rate by 100 or 1000 respectively.
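To make the three fields concrete, here is a minimal sketch of how they could combine into a value. The function name `indicator_value` and the sample numbers are made up for illustration and are not part of the project.

```python
def indicator_value(numerator: float, denominator: float, multiplying_factor: float = 1) -> float:
    """Compute a rate and scale it by the multiplying factor (100 for %, 1000 for per thousand)."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator * multiplying_factor


# Arbitrary example: 25 events in a population of 200.
as_percentage = indicator_value(25, 200, 100)
per_thousand = indicator_value(25, 200, 1000)
print(as_percentage, per_thousand)
```

The same numerator and denominator give different displayed values depending only on the multiplying factor, which is why it seems worth storing as its own field.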

Now, we could imagine an indicator which is not strictly a rate. For example, imagine the maternal health index in the comment above. The maternal health index score could be defined as the result of the calculation 1 / ("maternal mortality rate" + "perinatal mortality rate" + "proportion of women receiving Iron Folic Acid"). Now, this can be expressed as a numerator and denominator too, but the denominator here is not simple: it is the sum of three things.

One approach to consider, therefore, is to capture a "formula" for calculating the value of an indicator.

For example, we could capture the formula of maternal mortality rate as

{
  "multiply": [
    { "divide": ["number of maternal deaths", "number of women in reproductive age group"] },
    100
  ]
}

We could similarly capture the formula for maternal health index as

{
  "divide": [
    1,
    { "sum": ["maternal mortality rate", "perinatal mortality rate", "proportion of women receiving Iron Folic Acid"] }
  ]
}

Similarly, I think, we can capture any metadata that may be required for the calculation of indicators.

Now, of course, the entities in these formulae have to be unique references to indicators (and not plain strings as shown above).
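As a rough sketch of how such nested formulae could be evaluated, the following walks the structure recursively. Everything here is hypothetical: the `evaluate` function, the `OPERATORS` table and the sample values are assumptions for illustration, and plain strings stand in for the unique indicator references mentioned above.

```python
from math import prod


def divide(operands):
    """Divide the first operand by the second; exactly two operands are expected."""
    if len(operands) != 2:
        raise ValueError("divide takes exactly two operands")
    return operands[0] / operands[1]


# Only the three operators that appear in the formulae in this thread.
OPERATORS = {"sum": sum, "multiply": prod, "divide": divide}


def evaluate(formula, values):
    """Recursively evaluate a nested formula against {indicator name: value}."""
    if isinstance(formula, (int, float)):
        return formula  # literal constant such as 1 or 100
    if isinstance(formula, str):
        return values[formula]  # reference to an indicator (a plain string here)
    if isinstance(formula, dict) and len(formula) == 1:
        (operator, operands), = formula.items()
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator}")
        return OPERATORS[operator]([evaluate(o, values) for o in operands])
    raise ValueError(f"malformed formula: {formula!r}")


mmr_formula = {
    "multiply": [
        {"divide": ["number of maternal deaths", "number of women in reproductive age group"]},
        100,
    ]
}
index_formula = {
    "divide": [
        1,
        {"sum": ["maternal mortality rate", "perinatal mortality rate",
                 "proportion of women receiving Iron Folic Acid"]},
    ]
}

# Arbitrary numbers, chosen only to keep the arithmetic easy to follow.
values = {
    "number of maternal deaths": 2,
    "number of women in reproductive age group": 800,
    "maternal mortality rate": 0.5,
    "perinatal mortality rate": 0.25,
    "proportion of women receiving Iron Folic Acid": 0.25,
}

mmr = evaluate(mmr_formula, values)
index = evaluate(index_formula, values)
print(mmr, index)
```

Because operands can themselves be formulae, a composite indicator could also refer to another computed indicator by embedding that indicator's formula in place of its name.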


How would we then calculate values for indicators that are not present in any of our sources, but whose formula components do have values in our dataset?

We can use a scripted metric aggregation inside a bucket aggregation to calculate these values inside each bucket.
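The snippet below is not an Elasticsearch query. It is a plain Python stand-in that only illustrates the idea of computing a formula inside each bucket. The record layout, the `state` dimension and the numbers are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical flat records: one row per (state, district, indicator) with a raw count.
records = [
    {"state": "A", "district": "A1", "indicator": "number of maternal deaths", "value": 1},
    {"state": "A", "district": "A2", "indicator": "number of maternal deaths", "value": 1},
    {"state": "A", "district": "A1", "indicator": "number of women in reproductive age group", "value": 300},
    {"state": "A", "district": "A2", "indicator": "number of women in reproductive age group", "value": 500},
    {"state": "B", "district": "B1", "indicator": "number of maternal deaths", "value": 3},
    {"state": "B", "district": "B1", "indicator": "number of women in reproductive age group", "value": 600},
]


def bucket_totals(rows, bucket_key):
    """Group rows by bucket_key and sum each component indicator within the bucket."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row[bucket_key]][row["indicator"]] += row["value"]
    return totals


def maternal_mortality_rate(components):
    """Apply the formula from this thread to the summed components of one bucket."""
    deaths = components["number of maternal deaths"]
    women = components["number of women in reproductive age group"]
    return deaths / women * 100


rates = {
    bucket: maternal_mortality_rate(components)
    for bucket, components in bucket_totals(records, "state").items()
}
print(rates)
```

Note that the components are summed first and the formula is applied once per bucket, rather than averaging per-district rates. Which of the two is correct depends on the aggregation method chosen for the indicator.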

asdofindia commented 4 years ago

The other complicated thing to capture on an indicator is how it should get aggregated.

There are only so many ways values can get aggregated.