djay / covidthailand

Thailand Covid testing and case data gathered and combined from various sources for others to download or view

Get better data on death ages. #34

Closed djay closed 3 years ago

djay commented 3 years ago

I'm now collecting cumulative CFR for 3 age ranges (since 1st April) from situation reports.

[Screenshot: Screen Shot 2021-07-05 at 1 51 13 pm]

However, these cumulative CFR values can sometimes decrease (see "W3 CFR 15-39" on the 29th). CFR can go down if cases increase, but the cumulative cases I am using don't seem to rise at the right times to match.

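So turning this into daily deaths by age is turning out to be a bit tricky, as so far I end up with negative deaths on certain days (https://github.com/djay/covidthailand/blob/1f4f4fb98efce2b8e1948b8383b6b23d2aeb9cfb/covid_data.py#L488). Either

  • the dates of cases used in their CFR calculation is not the same as found in covid-19
  • the lack of precision of the CFR number throws things out
  • the cases with missing ages in covid-19 dataset throws things out (maybe gets too large on certain days?)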

Need to work out how to correct for these to get reasonable numbers that align with min, max and median of reported deaths.
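A minimal sketch of the conversion described above, to show how a falling cumulative CFR can produce a negative daily figure. The numbers, the function name `daily_deaths_from_cum_cfr` and the assumption that CFR is reported in percent are all illustrative; this is not the code in covid_data.py.

```python
# Hypothetical numbers for one age group; not taken from the situation reports.
cum_cfr_pct = [0.10, 0.11, 0.09, 0.10]     # reported cumulative CFR, assumed to be in percent
cum_cases = [10000, 11000, 12000, 14000]   # cumulative confirmed cases used as the denominator


def daily_deaths_from_cum_cfr(cum_cfr_pct, cum_cases):
    """Estimate cumulative and daily deaths from cumulative CFR (%) and cumulative cases."""
    cum_deaths = [round(cfr / 100 * cases) for cfr, cases in zip(cum_cfr_pct, cum_cases)]
    # Day-over-day difference; the first day has no previous value to compare with.
    daily = [later - earlier for earlier, later in zip(cum_deaths, cum_deaths[1:])]
    return cum_deaths, daily


cum_deaths, daily = daily_deaths_from_cum_cfr(cum_cfr_pct, cum_cases)
print(cum_deaths)  # [10, 12, 11, 14]
print(daily)       # [2, -1, 3]
```

On the third day the CFR falls faster than the cases rise, so the implied cumulative deaths go down and the daily difference is negative.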

djay commented 3 years ago

any ideas on this @pmdscully ?

pmdscully commented 3 years ago

@djay What's the duration of the CFR?

pmdscully commented 3 years ago

https://github.com/djay/covidthailand/blob/1f4f4fb98efce2b8e1948b8383b6b23d2aeb9cfb/covid_data.py#L488

I can't see how that can be negative...

Ah, so the duration of the CFR is not going to lead to a negative result. But it's still worth verifying that the output deaths are as expected.

I'd probably run a few isolated tests on dates with known CFR, cases and deaths to verify nothing funny is happening.

(btw - all the above should apply to separated age ranges and/or the full age range of deaths).
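One way such isolated tests could look, as a rough sketch. The reference rows, the tolerance and the helper names are hypothetical; they only illustrate checking that CFR times cases reproduces a known death count on a given date.

```python
def implied_deaths(cfr_pct, cases):
    """Deaths implied by a CFR given in percent and a case count."""
    return cfr_pct / 100 * cases


def check_known_day(cfr_pct, cases, known_deaths, tolerance=1.0):
    """Return True when implied deaths are within `tolerance` of the known figure."""
    return abs(implied_deaths(cfr_pct, cases) - known_deaths) <= tolerance


# Hypothetical reference rows: (date, CFR %, cumulative cases, known cumulative deaths).
known_days = [
    ("2021-06-01", 0.50, 2000, 10),
    ("2021-06-02", 0.52, 2500, 13),
    ("2021-06-03", 0.40, 2600, 13),  # deliberately inconsistent row
]

failures = [date for date, cfr, cases, deaths in known_days
            if not check_known_day(cfr, cases, deaths)]
print(failures)  # ['2021-06-03']
```

Any date that lands in `failures` is a candidate for a mismatch between the case counts being used and the ones behind the reported CFR.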

pmdscully commented 3 years ago

  • the dates of cases used in their CFR calculation is not the same as found in covid-19

I think you mean deaths by heart disease and covid (for example), then this would increase deaths, and would not cause negative numbers. In this case, CFR would be more likely to increase above 1.0.

  • the lack of precision of the CFR number throws things out

This would only cause a small difference error (i.e. off-by-1/off-by-x).

  • the cases with missing ages in covid-19 dataset throws things out (maybe gets too large on certain days?)

This would cause the CFR to converge towards 0.0, not go negative.
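To put a number on the precision point, here is a small sketch of how wide the off-by-x error could be. It assumes the CFR is reported in percent and rounded to two decimal places, which is an assumption rather than something stated in the thread; `deaths_bounds` is a hypothetical helper.

```python
def deaths_bounds(cfr_pct, cases, decimals=2):
    """Range of deaths consistent with a CFR (%) rounded to `decimals` places."""
    half_step = 0.5 * 10 ** -decimals          # largest rounding error in the reported CFR
    low = max(cfr_pct - half_step, 0.0) / 100 * cases
    high = (cfr_pct + half_step) / 100 * cases
    return low, high


# Hypothetical example: CFR of 0.10% reported against 50,000 cumulative cases.
low, high = deaths_bounds(0.10, 50000)
print(low, high)
```

With these inputs the bounds come out at about 47.5 and 52.5, so the width of the interval is proportional to cumulative cases. For a small age group that width may be comparable to one day's deaths, which would be worth checking against the real figures.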

djay commented 3 years ago

I can't see how that can be negative...

The figures are cumulative. In the conversion from cumulative to daily, if cumulative isn't increasing you get negatives. The cumulative CFR reported sometimes decreases for a given age group, which is ok as long as the cases for that age also increase.
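The condition described here can be written out explicitly. The symbols are introduced only for this note: $C_t$ is cumulative cases on day $t$, $\mathrm{CFR}_t$ the cumulative CFR as a fraction, $D_t$ cumulative deaths and $d_t$ daily deaths.

```latex
\[
D_t = \mathrm{CFR}_t \, C_t, \qquad d_t = D_t - D_{t-1}
\]
\[
d_t \ge 0 \iff \mathrm{CFR}_t \, C_t \ge \mathrm{CFR}_{t-1} \, C_{t-1}
\iff \frac{\mathrm{CFR}_t}{\mathrm{CFR}_{t-1}} \ge \frac{C_{t-1}}{C_t}
\qquad (\mathrm{CFR}_{t-1} > 0,\; C_t > 0)
\]
```

For example, if cumulative cases grow by 10% in a day, then $C_{t-1}/C_t \approx 0.909$, so the CFR can fall by up to about 9.1% in relative terms before the implied daily deaths turn negative.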

I think you mean deaths by heart disease and covid (for example), then this would increase deaths, and would not cause negative numbers. In this case, CFR would be more likely to increase above 1.0.

No. I mean that to turn cum CFR into cum deaths I have to get a figure for cum cases. Perhaps the cum cases they are using have a different cut-off date than the ones I'm using, or are using date of infections instead of date of confirmation.

djay commented 3 years ago

@pmdscully look at "W3 CFR 15-39" on 29th. https://raw.githubusercontent.com/wiki/djay/covidthailand/situation_reports.csv

djay commented 3 years ago

I'd probably run a few isolated tests on dates with known CFR, cases and deaths to verify nothing funny is happening.

I can't think of anywhere that had the cumulative deaths since 1st April for the same age groups. Did you see this somewhere? Or even cumulative cases per age group?

pmdscully commented 3 years ago

No. I mean that to turn cum CFR into cum deaths I have to get a figure for cum cases. Perhaps the cum cases they are using have a different cut-off date than the ones I'm using, or are using date of infections instead of date of confirmation.

There doesn't appear to be an alternative workaround to knowing:

I can't think of anywhere that had the cumulative deaths since 1st April for the same age groups. Did you see this somewhere? Or even cumulative cases per age group?

Any available data on deaths with ages?... Only from the briefings...

djay commented 3 years ago

@pmdscully the only data on deaths with specific ages is from before April. Otherwise it's just median, min and max.

pmdscully commented 3 years ago

@djay

@pmdscully the only data on deaths with specific ages is from before April. Otherwise it's just median, min and max.

It's possible to remodel the statistical distribution from the median, min and max; however, we have to make some pretty big assumptions, i.e. as follows:

Then we have a modelled distribution of deaths for each day.
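A rough sketch of what such a remodelling could look like, using only the standard library. The assumptions here are mine, not the ones from the thread: a Normal distribution whose mean equals the reported median, a standard deviation of one sixth of the reported range, truncation to the reported min and max, and hypothetical age bands. Whether a Normal shape is appropriate for ages at death is itself uncertain.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def age_group_shares(min_age, median_age, max_age, edges):
    """Share of deaths per age band under a normal truncated to [min_age, max_age].

    Assumptions (illustrative only): mean equals the reported median,
    and the reported range spans about six standard deviations.
    """
    sd = (max_age - min_age) / 6
    total = normal_cdf(max_age, median_age, sd) - normal_cdf(min_age, median_age, sd)
    shares = []
    for lo, hi in zip(edges, edges[1:]):
        lo, hi = max(lo, min_age), min(hi, max_age)   # clip each band to the observed range
        mass = normal_cdf(hi, median_age, sd) - normal_cdf(lo, median_age, sd) if hi > lo else 0.0
        shares.append(mass / total)
    return shares


# Hypothetical daily briefing: youngest 36, median 66, oldest 96.
# Hypothetical bands: 0-14, 15-39, 40-59, 60+.
shares = age_group_shares(36, 66, 96, edges=[0, 15, 40, 60, 120])
deaths_today = 50
print([round(s * deaths_today, 1) for s in shares])
```

Multiplying the shares by a day's total deaths gives modelled deaths per band. The reported median is the only data point available to check the fit against.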


If the alternatives are:

Where both have missing information, I'd probably go with remodelling, because at least we can verify using a single data point (median), instead of zero data points.

pmdscully commented 3 years ago

@djay
As we don't have the complete death age data, or know which distribution type would best match Thailand's age distribution of COVID-19 deaths, it's still an open question which statistical distribution to use to generate (estimate) the data. In this case I mentioned Normal (Gaussian), but as we've seen from that COVID-19 IFR modelling (i.e. for the UK), it's clearly not Normal.

So, we still cannot avoid this issue.

djay commented 3 years ago

Getting closer with rolling averages on both sets of numbers. Currently there is an issue in mid-June where 15-39 cases drop by half for 1 day. Either
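A minimal sketch of the rolling-average idea, under stated assumptions. The window length, the trailing window, the clipping at zero and the sample numbers are all hypothetical choices for illustration and may differ from what the repository does.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; early days use however many values are available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def smoothed_daily_deaths(cum_cfr_pct, cum_cases, window=7):
    """Smooth both inputs, multiply into cumulative deaths, then difference."""
    cfr = rolling_mean(cum_cfr_pct, window)
    cases = rolling_mean(cum_cases, window)
    cum_deaths = [c / 100 * n for c, n in zip(cfr, cases)]
    # Clip at zero so a remaining dip does not show up as negative deaths.
    return [max(later - earlier, 0.0) for earlier, later in zip(cum_deaths, cum_deaths[1:])]


# Hypothetical series with a one-day glitch where cumulative cases drop by about half.
cum_cfr_pct = [0.10, 0.10, 0.11, 0.11, 0.11, 0.12]
cum_cases = [10000, 11000, 12000, 6500, 14000, 15000]
result = smoothed_daily_deaths(cum_cfr_pct, cum_cases, window=3)
print(result)
```

Smoothing spreads a one-day glitch over the window rather than removing it, so a large drop like the one described can still distort the days around it.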

djay commented 3 years ago

now published