axisbits / covid-api

API based on publicly available data by Johns Hopkins CSSE

Produce calculated "death_rate" #4

Closed impredicative closed 4 years ago

impredicative commented 4 years ago

Thank you again for the work so far. I'm using and sharing the data. I request the production and inclusion of a mortality ratio.

Algorithm

if confirmed > 0:
    mortality = deaths / confirmed
else:
    mortality = NaN

If NaN cannot be specified for some reason, then a fallback value is 0. A null value should be avoided as this will harm downstream formatting.

Examples

confirmed = 12345, deaths = 123: mortality = 123 / 12345 = 0.009963547995139732

confirmed = 12, deaths = 0: mortality = 0 / 12 = 0

confirmed = 0, deaths = 0: mortality = NaN
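The algorithm and the three examples above can be sketched in Python as follows. This is only an illustration of the request, not the API's actual implementation; the function name `mortality` is an assumption chosen to match the requested field.

```python
import math


def mortality(confirmed: int, deaths: int) -> float:
    """Return deaths / confirmed, or NaN when there are no confirmed cases."""
    if confirmed > 0:
        return deaths / confirmed
    # No confirmed cases: the ratio is undefined, so return NaN rather than None.
    return math.nan


# The three examples from this request.
print(mortality(12345, 123))  # roughly 0.00996
print(mortality(12, 0))       # 0.0
print(mortality(0, 0))        # nan
```

Note that NaN never compares equal to anything, including itself, so downstream code should check for it with `math.isnan` rather than `==`.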


Thank you.

elitemaks commented 4 years ago

@impredicative done.

ezanardi commented 4 years ago

That ratio (deaths/confirmed) is the Case Fatality Rate (CFR) https://en.wikipedia.org/wiki/Case_fatality_rate . The mortality rate is (roughly) deaths / population . https://en.wikipedia.org/wiki/Mortality_rate
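To make the distinction concrete, here is a rough sketch of both ratios side by side. The function names and the population figure are hypothetical and chosen only for illustration; the case and death counts reuse the example from the original request.

```python
def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Deaths as a share of diagnosed (confirmed) cases."""
    return deaths / confirmed


def mortality_rate(deaths: int, population: int) -> float:
    """Deaths as a share of the whole population (rough definition)."""
    return deaths / population


deaths = 123
confirmed = 12345
population = 1_000_000  # hypothetical population, for illustration only

cfr = case_fatality_rate(deaths, confirmed)
mr = mortality_rate(deaths, population)
print(f"CFR: {cfr:.4%}, mortality rate: {mr:.4%}")
```

With these inputs the two figures differ by about two orders of magnitude, which is why comparing one disease's CFR with another's mortality rate is misleading.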

impredicative commented 4 years ago

@ezanardi Are you okay with having mortality renamed to fatality_rate?

elitemaks commented 4 years ago

This data covers only COVID statistics, and COVID deaths in particular, so I think it's clear to everyone that mortality here relates to COVID cases only. It's just a widely used word. But if that's not OK, we'll change it, no problem. Let's discuss this.

impredicative commented 4 years ago

@elitemaks As an aside, I see that mortality is currently a string. It should definitely be a number, not a string. A string will cause problems when formatting the value as a percentage.
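As a small illustration of the formatting problem in Python, assuming a consumer that applies a percentage format spec to the field: the helper `as_percent` is hypothetical and the value is taken from the example in the original request.

```python
def as_percent(value) -> str:
    """Format a ratio as a percentage with two decimal places."""
    return format(value, ".2%")


# A numeric value formats cleanly.
print(as_percent(0.009963547995139732))  # 1.00%

# The same value delivered as a string is rejected by the '%' format code.
try:
    as_percent("0.009963547995139732")
except ValueError as exc:
    print(f"cannot format string: {exc}")
```

A consumer can work around this by calling `float()` on the field first, but a numeric value avoids pushing that step onto every client.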

elitemaks commented 4 years ago

@impredicative right, thank you. Fixed.

ezanardi commented 4 years ago

IMHO, fatality_rate is a better name than mortality. These days I've seen seasonal flu mortality rate data compared to COVID-19 CFR data in the media, so I guess the two are easy to confuse. Even Wikipedia warns against confusing them. OTOH, I don't want to ask you to spend a lot of time fixing this minor vocabulary issue.

impredicative commented 4 years ago

@elitemaks I think we are good to go for the rename. Thank you kindly.

impredicative commented 4 years ago

This issue is no longer critically important for me because I've updated my processing code to be able to compute various statistics including this one. This issue may still help other users, however.

elitemaks commented 4 years ago

Done.

impredicative commented 4 years ago

The change log in the readme can perhaps be updated. Thank you.

elitemaks commented 4 years ago

@impredicative thank you for pointing that out! Done. If you have any other requests or suggestions, please let us know.

impredicative commented 4 years ago

> That ratio (deaths/confirmed) is the Case Fatality Rate (CFR) https://en.wikipedia.org/wiki/Case_fatality_rate .

Upon closer examination, this formula for the CFR is highly debatable. Someone else pointed this out to me today and it caught me by surprise. Specifically, it is debatable whether active cases should be counted in the denominator, considering that those cases have not concluded yet and some of those patients will die.

Someone can definitely reasonably ask for the formula deaths / (deaths + recovered) to be used for the CFR instead, as only these are the concluded cases.
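A quick sketch of how far the two formulas can diverge while many cases are still active. The counts below are made up purely for illustration, and the function names are hypothetical.

```python
import math


def rate_over_confirmed(deaths: int, confirmed: int) -> float:
    """deaths / confirmed: active cases are counted in the denominator."""
    return deaths / confirmed if confirmed > 0 else math.nan


def rate_over_concluded(deaths: int, recovered: int) -> float:
    """deaths / (deaths + recovered): only concluded cases are counted."""
    concluded = deaths + recovered
    return deaths / concluded if concluded > 0 else math.nan


# Hypothetical snapshot: 1000 confirmed, 20 deaths, 180 recovered, 800 still active.
confirmed, deaths, recovered = 1000, 20, 180

print(rate_over_confirmed(deaths, confirmed))   # 0.02
print(rate_over_concluded(deaths, recovered))   # 0.1
```

The two values agree only once no active cases remain, that is, when confirmed equals deaths plus recovered.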

To avoid ambiguity, I suggest the deaths / confirmed parameter be renamed to death_rate.

elitemaks commented 4 years ago

@impredicative I'm not sure we need to rename it again. According to the Wikipedia article mentioned above, the current formula is correct: the CFR "is the proportion of deaths from a certain disease compared to the total number of people diagnosed with the disease for a certain period of time". Here, that period runs from the start of the outbreak until the report date. Of course, we will only know the final, true CFR once the number of active cases reaches 0, but until then we have what we have.