Closed djay closed 3 years ago
any ideas on this @pmdscully ?
@djay What's the duration of the CFR?
I can't see how that can be negative...
0.0-1.0
>1.0
>=0
Ah, so the duration of CFR is not going to lead to a negative output result. But it's still worth verifying to confirm the output result deaths are as expected.
I'd probably run a few isolated tests on dates with known CFR, cases and deaths to verify nothing funny is happening.
(btw - all the above should apply to separated age ranges and/or the full age range of deaths).
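A minimal sketch of such an isolated check, assuming the reported CFR is a percentage rounded to two decimal places and that cumulative deaths are recovered as CFR / 100 × cumulative cases. The function names and the figures are hypothetical, not taken from covid_data.py or the situation reports.

```python
def deaths_from_cfr(cfr_percent, cum_cases):
    """Estimate cumulative deaths from a cumulative CFR (percent) and cumulative cases."""
    return cfr_percent / 100.0 * cum_cases


def rounding_tolerance(cum_cases, decimals=2):
    """Largest error in deaths caused by rounding the CFR to `decimals` places."""
    half_step = 0.5 * 10 ** -decimals      # e.g. 0.005 percentage points
    return half_step / 100.0 * cum_cases


def matches_known_deaths(cfr_percent, cum_cases, known_deaths, decimals=2):
    """True if the estimate agrees with known deaths within the rounding error."""
    estimate = deaths_from_cfr(cfr_percent, cum_cases)
    return abs(estimate - known_deaths) <= rounding_tolerance(cum_cases, decimals)


# Hypothetical day: 43 deaths out of 21,000 cases is a CFR of 0.2048%, reported as 0.20
print(deaths_from_cfr(0.20, 21000))            # about 42, one fewer than the true 43
print(matches_known_deaths(0.20, 21000, 43))   # True: within rounding error
print(matches_known_deaths(0.20, 21000, 50))   # False: something else is wrong
```

The tolerance also puts a number on the off-by-x effect of CFR precision: it grows with cumulative cases, so a two-decimal CFR gets less informative as the case count rises.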
- the dates of cases used in their CFR calculation is not the same as found in covid-19
I think you mean deaths by heart disease and covid (for example), then this would increase deaths, and would not cause negative numbers. In this case, CFR would be more likely to increase above 1.0.
- the lack of precision of the CFR number throws things out
This would only cause a difference error (i.e. off-by-1/off-by-x).
- the cases with missing ages in covid-19 dataset throws things out (maybe gets too large on certain days?)
This would cause the CFR rate to converge towards 0.0, not negative.
I can't see how that can be negative...
The figures are cumulative. In the conversion from cumulative to daily, if cumulative isn't increasing you get negatives. The cumulative CFR reported sometimes decreases for a given age group, which is ok as long as the cases for that age also increase.
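To make that concrete, here is a small sketch with made-up numbers, assuming the CFR is a percentage and cumulative deaths are backed out as CFR / 100 × cumulative cases. None of these values come from the situation reports.

```python
# Hypothetical five consecutive days for one age group (not real report data)
cum_cfr = [0.10, 0.11, 0.09, 0.10, 0.095]        # cumulative CFR in percent (assumed)
cum_cases = [10000, 10500, 11000, 12000, 13000]  # cumulative confirmed cases

# Cumulative deaths implied by each day's CFR and case count
cum_deaths = [cfr / 100 * cases for cfr, cases in zip(cum_cfr, cum_cases)]

# Daily deaths are the day-on-day differences of the cumulative series
daily_deaths = [today - prev for prev, today in zip(cum_deaths, cum_deaths[1:])]

# Positions (in the original series) where the implied daily deaths are negative
negative_days = [i + 1 for i, d in enumerate(daily_deaths) if d < 0]

print([round(d, 2) for d in cum_deaths])
print([round(d, 2) for d in daily_deaths])
print(negative_days)  # [2]
```

On day 2 the CFR falls from 0.11 to 0.09 while cases grow by only about 5%, so implied cumulative deaths drop and the daily figure goes negative. On the last day the CFR also falls (0.10 to 0.095), but cases grow enough that the daily figure stays positive.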
I think you mean deaths by heart disease and covid (for example), then this would increase deaths, and would not cause negative numbers. In this case, CFR would be more likely to increase above 1.0.
No. I mean that to turn cum CFR into cum deaths I have to get a figure for cum cases. Perhaps the cum cases they are using have a different cut-off date than the ones I'm using, or are using date of infections instead of date of confirmation.
@pmdscully look at "W3 CFR 15-39" on 29th. https://raw.githubusercontent.com/wiki/djay/covidthailand/situation_reports.csv
I'd probably run a few isolated tests on dates with known CFR, cases and deaths to verify nothing funny is happening.
I can't think of anywhere that had the cumulative deaths since 1st april for the same age groups. did you see this somewhere? or even cumulative cases per age group.
No. I mean that to turn cum CFR into cum deaths I have to get a figure for cum cases. Perhaps the cum cases they are using have a different cut-off date than the ones I'm using, or are using date of infections instead of date of confirmation.
There doesn't appear to be an alternative workaround to knowing: cum cases date range? cum deaths date range?
I can't think of anywhere that had the cumulative deaths since 1st april for the same age groups. did you see this somewhere? or even cumulative cases per age group.
Any available data on deaths with ages?... Only from the briefings...
@pmdscully the only data on deaths with specific ages is before april. or it's just median, min and max.
@djay
@pmdscully the only data on deaths with specific ages is before april. or it's just median, min and max.
It's possible to remodel the statistical distribution with median, min and max; however, we have to make some pretty big assumptions, i.e. as follows:
- normal distribution, randomly generate n points between min and max, to estimate the death ages of the day, i.e. [scipy.stats.norm] or [a function in numpy].
- mean == median, which is rarely accurate.
- generated distribution has a median approximately equal to the known median.
Then we have a modelled distribution of deaths for each day.
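A rough sketch of that remodelling, under the assumptions listed above. It swaps in the standard library's random.gauss with rejection of out-of-range draws rather than scipy.stats.norm, and the choice of sigma (about a third of the wider side of the range), the function name and the example ages are all hypothetical.

```python
import random
import statistics


def sample_death_ages(n, age_min, age_median, age_max, seed=0):
    """Draw n ages from a normal centred on the median, rejecting draws outside [min, max]."""
    rng = random.Random(seed)
    # Assumption: the wider side of the range spans roughly 3 standard deviations
    sigma = max(age_median - age_min, age_max - age_median) / 3.0
    ages = []
    while len(ages) < n:
        age = rng.gauss(age_median, sigma)
        if age_min <= age <= age_max:   # keep only draws inside the reported range
            ages.append(age)
    return ages


# Hypothetical day: youngest death 30, median 65, oldest 95
ages = sample_death_ages(n=1000, age_min=30, age_median=65, age_max=95)

# The one available check: the generated median should sit close to the known one
print(round(statistics.median(ages), 1))
print(min(ages) >= 30 and max(ages) <= 95)  # True
```

When min and max are far from symmetric around the median, cutting off one tail pulls the generated median away from the reported one, so the median check is what shows whether the normal assumption is holding up for a given day.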
If the alternatives are: CFR back-tracking without known date ranges, vs remodel the known daily deaths distribution from median, min, max. Where both have missing information, then I'd probably go with remodelling, because at least we can verify using a single data point (median), instead of zero data points.
@djay
As we don't have the complete death age data, or know which numerical distribution type would best match Thailand's age of COVID-19 death distribution, it's still a moot point to decide which statistical distribution to use to generate (estimate) the data. In this case I mentioned Normal (Gaussian), but as we've seen from that COVID-19 IFR modelling (i.e. for UK), it's clearly not Normal.
So, we still cannot avoid this issue.
Getting closer with rolling averages on both sets of numbers. Currently there is an issue in mid June where 15-39 cases drop by half for 1 day.
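For reference, a trailing rolling average can be sketched as below. This assumes a simple mean over the last few days of a daily series. The window length, the function name and the figures are hypothetical, and it is not the code used in covid_data.py.

```python
def rolling_mean(values, window=7):
    """Trailing mean over up to `window` values ending at each position."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical daily cases where one day drops by half
daily_cases = [1000, 1000, 500, 1000, 1000]
print([round(v, 1) for v in rolling_mean(daily_cases, window=3)])
# [1000.0, 1000.0, 833.3, 833.3, 833.3]
```

The one-day dip is spread across the window instead of disappearing, so a drop this large still shows up in the smoothed series for as many days as the window is long.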
now published
I'm now collecting cumulative CFR for 3 age ranges (since 1st april) from situation reports.
However, these cumulative CFR values can sometimes decrease (see "W3 CFR 15-39" on 29th). CFR can go down if cases increase, but the cumulative cases I am using don't seem to rise at the right times.
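So turning this into daily deaths by age is turning out to be a bit tricky, as so far I end up with negative deaths on certain days (https://github.com/djay/covidthailand/blob/1f4f4fb98efce2b8e1948b8383b6b23d2aeb9cfb/covid_data.py#L488). Either
- the dates of cases used in their CFR calculation is not the same as found in covid-19
- the lack of precision of the CFR number throws things out
- the cases with missing ages in covid-19 dataset throws things out (maybe gets too large on certain days?)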
Need to work out how to correct for these to get reasonable numbers that align with min, max and median of reported deaths.
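One possible correction, sketched here as an assumption rather than what covid_data.py does: force the implied cumulative deaths to be non-decreasing with a running maximum before differencing. The function name and the input figures are hypothetical.

```python
from itertools import accumulate


def daily_from_cumulative(cum_values):
    """Force a cumulative series to be non-decreasing, then difference it."""
    monotone = list(accumulate(cum_values, max))   # running maximum
    return [today - prev for prev, today in zip(monotone, monotone[1:])]


# Hypothetical implied cumulative deaths with one dip on the third day
cum_deaths = [10.0, 11.55, 9.9, 12.0, 12.35]
daily = daily_from_cumulative(cum_deaths)

print([round(d, 2) for d in daily])
print(all(d >= 0 for d in daily))  # True
```

This keeps the overall total (last value minus first) and removes the negatives, but it pushes deaths onto later days and says nothing about whether the dip came from the CFR or from the case counts, so the result would still need checking against the reported min, max and median.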