Disease section - Githubissues

lpicci96 commented 1 year ago

Adds module for disease specific financing

jm-rivera commented 1 year ago

This issue relates to #42, @nupur-parikh FYI

jm-rivera commented 1 year ago

Thanks @lpicci96, where can I see the final charts?

lpicci96 commented 1 year ago

> Thanks @lpicci96, where can I see the final charts?

https://app.flourish.studio/visualisation/14182221/edit
https://app.flourish.studio/visualisation/13994099/edit

jm-rivera commented 1 year ago

@lpicci96 I'm concerned about the missing data: our grouping strategy may not deal with this case in the best way.

The total_usd_spending function in common (L467) checks for data completeness assuming that there are gaps but that overall there's a good amount of data available. It doesn't do a cartesian product of the index + additional grouper... so if too much data is missing, it will sort of assume that there's never data for certain entities (countries/income groups/continents), so when assessing "completeness" it doesn't see it as missing since it doesn't expect it to be there in the first place.
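To illustrate the concern, here is a minimal sketch using toy data. It is not the actual `total_usd_spending` code; the column names, the `all_countries` list and the way completeness is computed are assumptions made for the example. It shows how a completeness check that only looks at the rows present can report 100% when an entity never appears at all.

```python
import pandas as pd

# Hypothetical toy data: the group has three countries, but country C never reports,
# so it has no rows at all (not even rows with NaN).
observed = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2019, 2020, 2019, 2020],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
all_countries = ["A", "B", "C"]  # assumption: the full membership of the group

# Completeness measured only against the rows that exist: C is invisible,
# so every year looks fully complete.
naive = observed.groupby("year")["value"].apply(lambda s: s.notna().mean())

# Completeness measured against the number of countries we expect to see.
expected = observed.groupby("year")["value"].count() / len(all_countries)

print(naive)
print(expected)
```

In this toy case the first measure is 1.0 for both years, while the second is 2/3, which is the gap described above.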

To get around this we could:

a) improve the functions that go into _callable_by such that they create independent cartesian products of the index/grouper that take into account all the values that are actually possible. <-- this seems hard but it would probably be the most abstract/flexible solution long term
b) add a step before piping the data to total_usd_spending where we create a complete index of the countries, diseases and sources that are theoretically possible (for all countries, diseases, sources). That would create a much more sparse dataframe, which will mean the subsequent functions assess the level of missing data in a much better way. <-- I prefer this as it is a much easier implementation
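Option (b) could be sketched roughly as follows. This is only an illustration under stated assumptions: `complete_index` is a hypothetical helper, not something in the repository, and the column names and the "External" source label are made up for the example.

```python
import pandas as pd


def complete_index(df: pd.DataFrame, dims: dict) -> pd.DataFrame:
    """Reindex `df` on the cartesian product of all theoretically possible values.

    `dims` maps each index column to the full list of values it can take.
    Combinations missing from `df` are added with NaN in the remaining columns.
    """
    columns = list(dims.keys())
    full = pd.MultiIndex.from_product(list(dims.values()), names=columns)
    return df.set_index(columns).reindex(full).reset_index()


# Hypothetical toy data with only two observed combinations.
data = pd.DataFrame(
    {
        "country": ["A", "B"],
        "year": [2020, 2020],
        "disease": ["HIV/AIDS", "Tuberculosis"],
        "source": ["OOP", "Domestic government"],
        "value": [10.0, 5.0],
    }
)

# Assumption: these are the values that are theoretically possible.
dims = {
    "country": ["A", "B", "C"],
    "year": [2020],
    "disease": ["HIV/AIDS", "Tuberculosis"],
    "source": ["OOP", "Domestic government", "External"],
}

complete = complete_index(data, dims)
print(len(complete), int(complete["value"].notna().sum()))
```

The result has one row per possible combination (3 x 1 x 2 x 3 = 18 here), with only the two observed values filled, so any later completeness check sees the gaps explicitly.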

The result of either approach will be that the output will have even more missing data (for country/income groupings), since fewer years/indicators will pass the threshold. That will make the results more consistent/coherent with our approach elsewhere... but may make some of the visualisations less appropriate/clear.

Thoughts? I realise the above may not be totally clear so happy to have a go at implementing or huddle to discuss further.

jm-rivera commented 1 year ago

I've also committed a couple of tweaks to section5.py to make the documentation a bit clearer

jm-rivera commented 1 year ago

I'm glad this worked (in terms of colouring only one side). However, I think it's the sources that should be colour differentiated (since there are only 3) and the diseases shouldn't. That would make it much easier to spot where the funding is going, and on the other side, from which source the money is coming.

lpicci96 commented 1 year ago

I checked the dataframe before aggregating for groups and it is already a sparse dataframe with all possible combinations of country-year-disease-source, with missing values filled as NaN, so the issue must be somewhere else

jm-rivera commented 1 year ago

It turns out it is happening here: https://github.com/ONEcampaign/topic_health_financing/blob/4141ab10e2dbacdd74092d2fbd2072c5b74d37d1/scripts/tools.py#L332-L342

The df is sparse until the forward fill and then it isn't. I meant that by design because it's not a real issue elsewhere, but it turns out it is here because of the amount of missing data even after the fill.
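As an illustration of the effect only: the sketch below assumes the linked lines do something like a per-country forward fill followed by dropping the rows that are still empty. That is a guess for the sake of the example, not a copy of `tools.py`, and the column names are hypothetical.

```python
import pandas as pd

nan = float("nan")

# Hypothetical toy data: a complete country x year frame with explicit NaNs.
df = pd.DataFrame(
    {
        "country": ["A"] * 3 + ["B"] * 3,
        "year": [2018, 2019, 2020] * 2,
        "value": [1.0, nan, nan, nan, nan, nan],
    }
)

# Forward fill within each country: A gets its 2018 value carried forward,
# B has nothing to carry forward and stays entirely NaN.
filled = df.assign(value=df.groupby("country")["value"].ffill())

# Assumed step: dropping the rows that are still missing removes B completely,
# so the frame no longer contains every possible combination.
dropped = filled.dropna(subset=["value"])

print(len(filled), len(dropped), sorted(dropped["country"].unique()))
```

After the drop, country B has no rows at all, so a later completeness check has no way of knowing it was ever expected.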

Let me propose a tweak

jm-rivera commented 1 year ago

@lpicci96 @nupur-parikh... the problem is that with a 95% threshold we get no data for Africa or any of the income groupings.

jm-rivera commented 1 year ago

I've just run the rest of the analysis and it doesn't make a difference to the chart we have online. But it does mean that for this section there are no country groupings. We could massively lower the threshold (elsewhere we seem to be using >50%).

With that, we get some data for section 5 chart 1:

Africa for 2015, 2016, 2017
LIC for 2016, 2017, 2018
LMIC for 2018

And for section5 chart 2:

Africa 2020: HIV/AIDS and Tuberculosis (both OOP), family planning (domestic gov)
LIC 2020: Maternal conditions (OOP) and a couple of others
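For reference, the effect of the threshold can be sketched like this. It is a simplified stand-in for the real logic: `aggregate_if_complete` is a hypothetical helper, the data is invented, and using `>=` for the comparison is an assumption.

```python
import pandas as pd

nan = float("nan")


def aggregate_if_complete(df: pd.DataFrame, group_col: str, threshold: float) -> pd.Series:
    """Sum `value` per group and year, keeping only group-years whose share of
    non-missing values is at least `threshold`."""
    grouped = df.groupby([group_col, "year"])["value"]
    share = grouped.apply(lambda s: s.notna().mean())
    total = grouped.sum()  # NaN values are skipped in the sum
    return total[share >= threshold]


# Hypothetical toy data: four countries in one group.
# 2019 has 1 of 4 values (25%), 2020 has 3 of 4 values (75%).
df = pd.DataFrame(
    {
        "continent": ["Africa"] * 8,
        "country": ["A", "B", "C", "D"] * 2,
        "year": [2019] * 4 + [2020] * 4,
        "value": [1.0, nan, nan, nan, 1.0, 2.0, 3.0, nan],
    }
)

strict = aggregate_if_complete(df, "continent", threshold=0.95)
loose = aggregate_if_complete(df, "continent", threshold=0.5)

print(strict)
print(loose)
```

With the strict threshold nothing survives for the group; with the lower one only the better-covered year does, and its total still leaves out one country.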

lpicci96 commented 1 year ago

I would be in favor of not having aggregates but @nupur-parikh I defer to you

nupur-parikh commented 1 year ago

Thanks both, this is extremely helpful, and thank you for all your work on this! Based on what I'm understanding from the above, even if we lower the threshold for the groupings we will still have incomplete data, right? If so, I would also prefer that we do not do aggregates for this section and instead just show individual countries, given the limitations listed above and the already limited data we have for disease specific financing. I think this data at an individual country level would prove extremely useful to the sector as it's difficult to find/understand anywhere else.

ONEcampaign / topic_health_financing

Disease section #44