Closed jm-rivera closed 1 year ago
Is it possible to do a combo/all three of the following groups:
- Africa (total)
- country level for all African countries
- Income group
I think some version of all three would be most helpful, but I'm not sure what the data allows for. Are there enough countries with available data for this?
Yes, I think that should be possible, though see my next comment about some of the groups.
As for the grouping, can you explain to me again what the drawbacks of using median values are? And is there a best practice for an analysis like this? I'm not sure I fully understand the pros and cons of doing something like this.
The main issue with groups is one of missing data.
When looking at time series data, you want to make sure that the changes you observe over time are indeed true in the underlying data, and not created by things like missing data.
When looking at an individual country, it's easier to deal with things like that: either a data point is missing or it isn't. So you either show it or you don't (or you impute it somehow).
However, it is trickier when looking at groups. For example, your group may have 20 countries in it, and data for all 20 countries may be available in several years. But in other years, data may only be available for (say) 15 countries, and the 5 missing countries aren't the same every year. If we were to just add up the 'available data' for the group and present it as the group total for each year, the danger would be that the amounts could be higher or lower simply because we sometimes have data for more or fewer countries.
There are a few strategies to get around that, but all of them have tradeoffs. The two main ones to consider:
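To make the problem above concrete, here is a minimal sketch in plain Python. The four countries and all of the figures are made up for illustration, and `naive_totals` is a hypothetical helper, not something from the actual analysis. Every country's own value is flat over time, yet the naive group total moves because coverage changes.

```python
# Hypothetical per-country values for a group of four countries.
# None marks a year with no data; all figures are made up for illustration.
data = {
    "A": {2019: 10.0, 2020: 10.0, 2021: 10.0},
    "B": {2019: 20.0, 2020: None, 2021: 20.0},
    "C": {2019: 30.0, 2020: 30.0, 2021: None},
    "D": {2019: 40.0, 2020: 40.0, 2021: 40.0},
}


def naive_totals(data):
    """Sum whatever is available each year and count the countries reporting."""
    years = sorted({year for series in data.values() for year in series})
    result = {}
    for year in years:
        values = [s[year] for s in data.values() if s.get(year) is not None]
        result[year] = (sum(values), len(values))
    return result


totals = naive_totals(data)
for year, (total, n) in totals.items():
    # No country changed its value, but the total drops when a country is missing.
    print(year, total, f"({n} of {len(data)} countries)")
```

Reporting the number of countries behind each total, as the loop does, is one simple way to make this kind of coverage-driven movement visible.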
Ultimately, part of the decision comes down to how much data is missing. If the time series data for each country is mostly complete, and there is data for almost all members of each group, then producing totals is definitely viable (and 'total' share of GDP or 'total' per capita figures are also possible), even if we have to do a little bit of interpolation. But if there are a lot of gaps in the time series data for each country, or if we only have data for some countries in a group, then it isn't the most methodologically sound thing to frame such numbers as actual totals (since so much of it would be either subject to imputation or simply missing).
Okay, I think I understand. I'm familiar with interpolating/imputing missing data, and your explanation makes sense in terms of how we would use both interpolation and central tendencies.
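As a rough sketch of how that check could look, the snippet below fills interior gaps by linear interpolation and then measures what share of a group has a value in a given year. The function names and the sample figures are assumptions made for illustration, and linear interpolation is only one possible imputation choice. Gaps at the start or end of a series are deliberately left empty rather than extrapolated.

```python
def interpolate_gaps(series):
    """Linearly fill interior gaps (None) between known years; leave edge gaps as None."""
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    filled = dict(series)
    # Walk over each pair of neighbouring known years and fill the years between them.
    for left, right in zip(known, known[1:]):
        for y in years:
            if left < y < right:
                weight = (y - left) / (right - left)
                filled[y] = series[left] + weight * (series[right] - series[left])
    return filled


def coverage(data, year):
    """Share of countries in the group that have a value for the given year."""
    values = [series.get(year) for series in data.values()]
    return sum(v is not None for v in values) / len(values)


# Hypothetical two-country group; all figures are made up.
group = {
    "A": {2018: 10.0, 2019: None, 2020: 14.0, 2021: None},
    "B": {2018: 5.0, 2019: 6.0, 2020: 7.0, 2021: 8.0},
}
filled = {country: interpolate_gaps(series) for country, series in group.items()}

print(coverage(group, 2019), coverage(filled, 2019))  # before vs after interpolation
print(coverage(filled, 2021))  # the trailing gap for "A" is still missing
```

If coverage after this kind of light interpolation stays close to 1.0 in every year, totals look defensible; what counts as "close enough" is a judgment call that the data would have to inform.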
A few follow-up questions so I make sure I fully understand:
I think this page would mostly rely on health spending as a % of GDP or per capita, and maybe one or two data points in the key numbers section on total spending on health at a global level, if that's possible. In this case, I think going with median values would make the most sense, as we would be focusing mostly on % of GDP or per capita spending.
For your questions:
Got it, thanks Jorge, this was extremely helpful for me to understand the different options. Based on what you've explained, I'm fine to use the median, unless by some surprise we discover there is really good data coverage!
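For reference, a minimal sketch of what a group median could look like, assuming hypothetical country values for health spending as a % of GDP. The `min_countries` cut-off is an illustrative assumption, not an agreed rule: when fewer countries report in a year, the sketch returns no median for that year rather than a figure based on very thin data.

```python
from statistics import median

# Hypothetical health spending as a % of GDP; None marks missing data.
shares = {
    "A": {2019: 4.0, 2020: 5.0},
    "B": {2019: 6.0, 2020: None},
    "C": {2019: 5.0, 2020: 7.0},
    "D": {2019: 20.0, 2020: None},
}


def group_median(data, min_countries=3):
    """Median of available values per year, with the number of countries reporting.

    The median is None when fewer than min_countries report (assumed threshold).
    """
    years = sorted({year for series in data.values() for year in series})
    result = {}
    for year in years:
        values = [s[year] for s in data.values() if s.get(year) is not None]
        value = median(values) if len(values) >= min_countries else None
        result[year] = (value, len(values))
    return result


medians = group_median(shares)
for year, (value, n) in medians.items():
    print(year, value, f"({n} of {len(shares)} countries)")
```

In the made-up 2019 data the median is 5.5, while the mean would be 8.75 because of the single high value for "D", which illustrates why the median is less sensitive to outliers. It can still shift when the set of reporting countries changes, so showing the country count alongside it remains useful.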
There are a variety of ways to group/organise and present the data.
Should our visualisations be:
What would be most helpful?
If grouped, are we happy to use group median values?
If grouped, we would need to 'stabilise' groups somehow (in case the time series have gaps). This could be done by: