ONEcampaign / topic_health_financing

A repository for the Health Financing topic page
MIT License

Discussion: How should we present the data #8

Closed jm-rivera closed 1 year ago

jm-rivera commented 1 year ago

There are a variety of ways to group/organise and present the data.

Should our visualisations be:

What would be most helpful?

If grouped, are we happy to use group median values?

If grouped, we would need to 'stabilise' groups somehow (in case the time series have gaps). This could be done by:

nupur-parikh commented 1 year ago

Is it possible to do a combo/all three of the following groups:

  1. Africa (total)
  2. country level for all African countries
  3. Income group

I think some version of all three would be most helpful, but I'm not sure what the data allows for. Are there enough countries with available data for this?

nupur-parikh commented 1 year ago

As for the grouping, can you explain to me again what the drawbacks of using median values are? And what is best practice for an analysis like this, if there is one? I'm not sure I fully understand the pros and cons of doing something like this.

jm-rivera commented 1 year ago

Is it possible to do a combo/all three of the following groups:

  1. Africa (total)
  2. country level for all African countries
  3. Income group

I think some version of all three would be most helpful, but I'm not sure what the data allows for. Are there enough countries with available data for this?

Yes, I think that should be possible, though see my next comment about some of the groups.

jm-rivera commented 1 year ago

As for the grouping, can you explain to me again what the drawbacks of using median values are? And what is best practice for an analysis like this, if there is one? I'm not sure I fully understand the pros and cons of doing something like this.

The main issue with groups is one of missing data.

When looking at time series data, you want to make sure that the changes you observe over time are indeed true in the underlying data, and not created by things like missing data.

When looking at an individual country, it's easier to deal with things like that: either a data point is missing or it isn't. So you either show it or you don't (or you impute it somehow).

However, it is trickier when looking at groups. For example, your group may have 20 countries in it, and data for all 20 may be available in several years. But in other years, data may only be available for (say) 15 countries, and the 5 missing countries aren't the same every year. If we were to just add up the 'available data' for the group and present it as the group total for each year, the danger is that the amounts could be higher or lower simply because we sometimes have data for more or fewer countries.
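
To make the problem concrete, here is a minimal sketch with made-up numbers (the countries `A` to `D` and their values are hypothetical, not taken from our data). Every country spends a constant 10 units each year, yet the naive group total appears to fall because fewer countries report.

```python
# Hypothetical data: each country spends a constant 10 units every year.
# None marks a year in which a country did not report.
spending = {
    2019: {"A": 10, "B": 10, "C": 10, "D": 10},
    2020: {"A": 10, "B": 10, "C": None, "D": 10},    # C missing
    2021: {"A": 10, "B": None, "C": 10, "D": None},  # B and D missing
}


def naive_total(year_values):
    """Sum whatever is reported, silently ignoring the gaps."""
    return sum(v for v in year_values.values() if v is not None)


totals = {year: naive_total(values) for year, values in spending.items()}
print(totals)  # {2019: 40, 2020: 30, 2021: 20}
```

The apparent halving between 2019 and 2021 comes entirely from data availability, not from any change in spending.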

There are a few strategies to get around that, but all of them have tradeoffs. The two main ones to consider:

Ultimately, part of the decision comes down to how much data is missing. If the time series for each country is mostly complete, and there is data for almost all members of each group, then producing totals is definitely viable (and 'total' share of GDP or 'total' per capita figures are also possible), even if we have to do a little bit of interpolation. But if there are a lot of gaps in the time series for each country, or if we only have data for some countries in a group, then it isn't methodologically sound to frame such numbers as actual totals, since so much of the figure would be either imputed or simply missing.
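
One way to inform that decision is to measure coverage before choosing a method. The sketch below is only an illustration: the function name `coverage_by_year`, the sample values, and the `MIN_COVERAGE` cut-off of 90% are all assumptions, not something we have agreed on.

```python
def coverage_by_year(data, members):
    """Share of group members with a reported value, for each year."""
    return {
        year: sum(values.get(c) is not None for c in members) / len(members)
        for year, values in data.items()
    }


# Hypothetical group of four countries with some gaps.
members = ["A", "B", "C", "D"]
data = {
    2019: {"A": 1.2, "B": 0.8, "C": 2.0, "D": 1.5},
    2020: {"A": 1.3, "B": 0.9, "C": None, "D": 1.6},
    2021: {"A": 1.4, "C": 2.1},  # B and D absent altogether
}

MIN_COVERAGE = 0.9  # hypothetical cut-off, to be agreed

coverage = coverage_by_year(data, members)
totals_viable = min(coverage.values()) >= MIN_COVERAGE

print(coverage)       # {2019: 1.0, 2020: 0.75, 2021: 0.5}
print(totals_viable)  # False
```

If the worst year falls below the cut-off, as it does here, presenting group totals would be hard to defend and a measure of central tendency is the safer choice.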

nupur-parikh commented 1 year ago

Okay I think I understand. I'm familiar with interpolating/imputing missing data and your explanation makes sense with how we would use both interpolation and central tendencies.

A few follow-up questions so I make sure I fully understand:

  1. If we want to look at how much countries spend in total, we would use the interpolating method for anything that is missing?
  2. If we want to look at how much countries spend as % of GDP or per capita, we would use the median as the measure of central tendency?
  3. If we want to look at (for example) Country X's spending on different diseases or services over time, as a % of health expenditure, we would use the median? And if we wanted to look at this in total USD, we would have to interpolate any missing data instead?

I think this page would mostly rely on health spending as a % of GDP or per capita, and maybe one or two data points in the key numbers section on total spending on health at a global level, if that's possible. In this case, I think going with the median values would make the most sense, as we would be focusing mostly on % of GDP or per capita spending.

jm-rivera commented 1 year ago

For your questions:

  1. If looking at individual countries, I suggest no interpolation: show only the data that is there. If looking at groups of countries, interpolate missing data (within reasonable limits) to avoid introducing noise/fluctuations based on data availability.
  2. If looking at individual countries, use the actual value. If looking at groups of countries, use the median (unless we discover there is really good data coverage for all the groups we want to show, which I very much doubt).
  3. If an individual country, the actual data. If a group, then the median for that group, interpolating as needed/possible.
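
The group rule above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the implementation used in the repository: the function names, the `max_gap` limit of two years, and the sample values are hypothetical, and the series are assumed to be annual. Only interior gaps are filled by linear interpolation; leading and trailing gaps are left missing.

```python
from statistics import median


def interpolate(series, max_gap=2):
    """Linearly fill interior gaps of at most max_gap consecutive years."""
    filled = dict(series)
    known = sorted(y for y, v in series.items() if v is not None)
    for left, right in zip(known, known[1:]):
        gap = right - left - 1
        if 0 < gap <= max_gap:
            step = (series[right] - series[left]) / (right - left)
            for year in range(left + 1, right):
                filled[year] = series[left] + step * (year - left)
    return filled


def group_median(data, max_gap=2):
    """Median across countries per year, after bounded interpolation."""
    filled = {c: interpolate(s, max_gap) for c, s in data.items()}
    years = sorted({y for s in filled.values() for y in s})
    result = {}
    for year in years:
        values = [s[year] for s in filled.values() if s.get(year) is not None]
        result[year] = median(values) if values else None
    return result


# Hypothetical health spending as % of GDP for a three-country group.
group = {
    "A": {2019: 4.0, 2020: None, 2021: 6.0},  # interior gap, filled as 5.0
    "B": {2019: 2.0, 2020: 3.0, 2021: 4.0},
    "C": {2019: 8.0, 2020: 9.0, 2021: None},  # trailing gap, left missing
}

medians = group_median(group)
print(medians)  # {2019: 4.0, 2020: 5.0, 2021: 5.0}
```

Capping the gap length keeps the imputation "within reasonable limits": a long run of missing years stays missing instead of being drawn as a straight line.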

nupur-parikh commented 1 year ago

Got it, thanks Jorge, this was extremely helpful for me to understand the different options. Based on what you've explained, I'm fine to use the median, unless by some surprise we discover there is really good data coverage!