adamaltmejd / covid

MIT License
88 stars, 14 forks

Is similar data available for other countries (number of deaths by death date)? #29

Open PierreMesure opened 3 years ago

PierreMesure commented 3 years ago

Hej Adam,

Thanks for your awesome work. It's still one of the few graphs I browse regularly to keep myself updated on the COVID situation in Sweden.

I find it really shameful that the reporting delay is so bad and that it has gotten worse during this second wave. But I actually have no certainty that it's better in other countries. I've been looking for datasets that would let me determine whether that's the case, without much success so far. Have you found any?

adamaltmejd commented 3 years ago

Long time since I looked, but back then I couldn't find it anywhere. I most definitely share your concern though. If you find any please do post it here.

PierreMesure commented 3 years ago

I asked among Nordic data journalists and was told that Finland started publishing deaths by death date last fall. I haven't been able to find the raw data yet, but here is an example: https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case?column=measure-492118&row=dateweek20200101-508804L

I guess by getting the raw data for each publishing day, we could recreate the same data you use from FHM.
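A minimal sketch of that idea in Python (not the repository's R code): assume each day's release can be reduced to a mapping from death date to the total deaths recorded for that date. Comparing consecutive releases then tells you how many deaths were added for each death date, and how many days after that date they showed up. The function and type names here are hypothetical.

```python
from datetime import date

# Hypothetical structure: death date -> total deaths recorded for that date,
# as published in one release.
Snapshot = dict[date, int]


def reporting_delays(snapshots: dict[date, Snapshot]) -> dict[tuple[date, int], int]:
    """Return {(death_date, lag_in_days): deaths newly reported with that lag}.

    Assumes every release lists all death dates seen so far. Deaths already
    present in the earliest release are attributed to that release, so the
    first snapshot overstates lags. Negative values indicate revisions.
    """
    result: dict[tuple[date, int], int] = {}
    previous: Snapshot = {}
    for published in sorted(snapshots):
        current = snapshots[published]
        for death_date, total in current.items():
            new = total - previous.get(death_date, 0)
            if new != 0:
                key = (death_date, (published - death_date).days)
                result[key] = result.get(key, 0) + new
        previous = current
    return result
```

For example, if the release of 2 December lists 2 deaths on 1 December and the release of 4 December lists 5, the 3 extra deaths are counted with a lag of 3 days.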

adamaltmejd commented 3 years ago

Thanks! Yes indeed! It seems like their API creates CSV links like this: https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/fact_epirapo_covid19case.csv?row=dateweek20200101-508804L&column=measure-492118&

Should be able to set up a downloader and host the data on this site. Very nice. Think I'll have some time to implement it tonight.
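A daily downloader could look roughly like the Python sketch below, assuming the CSV link above stays stable and the script runs once a day (e.g. from cron). The file naming scheme `thl_<date>.csv` and the function names are hypothetical; the point is to keep every release as its own dated file and never overwrite an earlier one.

```python
import urllib.request
from datetime import date
from pathlib import Path
from typing import Callable

# CSV export link quoted in this thread; THL may change or retire it.
THL_CSV_URL = (
    "https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/"
    "fact_epirapo_covid19case.csv?row=dateweek20200101-508804L&column=measure-492118&"
)


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def download_snapshot(
    out_dir: Path,
    today: date,
    url: str = THL_CSV_URL,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Save today's release as <out_dir>/thl_<YYYY-MM-DD>.csv.

    If the file already exists it is left untouched, so re-running the
    script on the same day never overwrites an earlier snapshot.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"thl_{today.isoformat()}.csv"
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Usage: `download_snapshot(Path("data/finland"), date.today())`. The `fetch` parameter is there so the network call can be swapped out when testing.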

PierreMesure commented 3 years ago

Great, I can try and write them an FOIA request to see if they have some older files. EDIT: sent.

adamaltmejd commented 3 years ago

That would be awesome. Starting to download now would mean one could only see the extent of the reporting delay from today onwards.

PierreMesure commented 3 years ago

Hi @juhanisaa, I see that you have collected a lot of Finnish COVID data and are explaining it here.

I am tagging you here because we are currently looking for older versions of the list of deaths by death date published through THL's API (this call). Is this something that you might have saved on your servers? I had a look but couldn't find it.

Thanks in advance.

juhanisaa commented 3 years ago

I believe we haven't stored that data, but I'll have to confirm.

PierreMesure commented 3 years ago

Thanks @juhanisaa! Don't hesitate to tell us if you get your hands on it!

@adamaltmejd, I found another country really interesting to compare with! The UK has death data by death date!

adamaltmejd commented 3 years ago

Very nice! Thanks. I'll try to find some time to put together crawlers. Sorry for not doing it yet.

PierreMesure commented 3 years ago

No worries, I'll start downloading them manually this week so we don't lose more data. I'm also sending an FOIA request to the British authorities to try to get the old files. From what I could see, some months are available on GitHub.

adamaltmejd commented 3 years ago

Very interesting! Keep me posted :)

adamaltmejd commented 3 years ago

OK, I've just set up the code to download data from the Finnish and UK repos. Let's see if it works :)

PierreMesure commented 3 years ago

In this repo, there are larger CSV files containing, among other things, deaths by date of death, dating back to the 13th of October. Thanks @theosanderson!

Here is a repo covering the period 23/08 -> 30/11. Thanks @nathanrawle!

In this repo, there is a file named death_data.csv, updated every day since the 26th of October. Thanks @rvaughan!

Finally, in this repo, the same data is present since the 8th of December. Thanks @msleigh!

Note that the first one breaks the data down by county and the other two by nation. Maybe we want to download this file too in the future? In any case, it seems to be fetched automatically by the first repo mentioned above.

theosanderson commented 3 years ago

For the UK you can download data going back further with Archive in https://coronavirus.data.gov.uk/details/download (didn't exist when I made my repo)

nathanrawle commented 3 years ago

No problem. I obtained the data from the API @theosanderson mentioned, which lets you access any metric as it was published on a given past date, going back to 23 August. Releases from the end of November up to yesterday can now be scraped the same way.
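For reference, a Python sketch of building one archive URL per release date. The endpoint `https://api.coronavirus.data.gov.uk/v2/data`, the `release` parameter and the metric name `newDeaths28DaysByDeathDate` are assumptions based on how the dashboard's download page built its links at the time; the service has since been replaced, so check them before relying on this.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the API docs.
UK_API = "https://api.coronavirus.data.gov.uk/v2/data"


def release_url(release: date, metric: str = "newDeaths28DaysByDeathDate") -> str:
    """URL for `metric` by nation, as it was published on `release`."""
    params = {
        "areaType": "nation",
        "metric": metric,
        "format": "csv",
        "release": release.isoformat(),
    }
    return f"{UK_API}?{urlencode(params)}"


def release_dates(start: date, end: date) -> list[date]:
    """Every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

Usage: `[release_url(d) for d in release_dates(date(2020, 8, 23), date.today())]` gives one URL per release, which can then be fetched and saved as dated files.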

adamaltmejd commented 3 years ago

Fantastic, thanks everyone! I'll put together a dataset with daily releases to measure reporting delay and to evaluate our model on. Exciting :)
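Once the daily releases are turned into counts per (death date, lag), one simple summary for comparing countries is the share of deaths reported within 7 days, within 8 to 14 days, and after more than 14 days. A Python sketch, assuming that input shape; the band boundaries follow the 7- and 14-day thresholds mentioned in this thread, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def lag_bucket(lag_days: int) -> str:
    """Group a reporting lag in days into three bands."""
    if lag_days <= 7:
        return "0-7 days"
    if lag_days <= 14:
        return "8-14 days"
    return "over 14 days"


def bucket_shares(delays: dict[tuple[date, int], int]) -> dict[str, float]:
    """Share of all reported deaths falling in each lag band.

    `delays` maps (death_date, lag_in_days) to the number of deaths that
    were reported with that lag. Returns {} when there are no deaths.
    """
    counts: Counter[str] = Counter()
    for (_death_date, lag), n in delays.items():
        counts[lag_bucket(lag)] += n
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()} if total else {}
```

Computing these shares per week of death date, per country, would give a compact version of the comparison the graphs show.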

nathanrawle commented 3 years ago

Be aware that the release for 7/10/2020 is missing from https://coronavirus.data.gov.uk/details/download and returns HTTP 200 with no content.
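Because the status code is 200, a downloader that only checks status codes would silently store an empty file for that day. A Python sketch of treating empty bodies as missing releases instead, assuming a `fetch` function that returns the response body as bytes (for example one built on `urllib.request.urlopen`); the names are hypothetical.

```python
from typing import Callable


def download_releases(
    urls: dict[str, str], fetch: Callable[[str], bytes]
) -> tuple[dict[str, bytes], list[str]]:
    """Fetch each release; return (bodies by release date, missing releases).

    A response with an empty or whitespace-only body is recorded as a
    missing release rather than stored as data, even if the HTTP status
    was 200.
    """
    bodies: dict[str, bytes] = {}
    missing: list[str] = []
    for release, url in sorted(urls.items()):
        body = fetch(url)
        if body.strip():
            bodies[release] = body
        else:
            missing.append(release)
    return bodies, missing
```

Keeping an explicit list of missing releases also makes it clear which days have to be interpolated or left out when computing delays.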

PierreMesure commented 3 years ago

Hej @adamaltmejd, would you like some help to convert the new data to the same format you feed the current graphs? I don't have any experience with R but it shouldn't be too hard to build on your code. I just need to get a dev env running.

After getting that working and maybe a graph comparing Sweden's delay with the others, I thought it could be interesting to write a blog post about the findings.

adamaltmejd commented 3 years ago

Feel free to explore it if you want! I won't have time to do anything for a week or two.

adamaltmejd commented 3 years ago

Made a version of my graph for the UK. Can be seen here: https://adamaltmejd.se/covid/deaths_lag_uk.png

PierreMesure commented 3 years ago

Awesome! I actually played with it myself, but I ran into so many small bugs in R trying to recompile the delta-t for the data since last summer that I gave up at some point.

PierreMesure commented 3 years ago

What's your early analysis? It seems like the British data has some interesting regularities (no same-day data, and no data on Sundays or public holidays) that are similar to the Swedish data.

But besides that, there is just so much less blue on the UK's graph. They seem to report deaths many times faster, and the ones more than 14 days late are anecdotal.

PierreMesure commented 3 years ago

It's impossible to know the causes of such a difference at this point. A different method for confirming deaths? Different reporting delays? Priority given to accuracy over speed?

But it would be interesting to discuss it with journalists and see if they can investigate and maybe question FHM about it.

adamaltmejd commented 3 years ago

Agreed it is super interesting. Agree with your observations too. My bet on the main reason for the big delays in Sweden is that we have a system in place already for death reporting at the national level - and that system has been used also for Covid. The problem is that it wasn't designed to be fast. The doctor who signs the death certificate has something like two weeks to send it in. So what has always worked well now has a speed problem that is not easy to fix.

PierreMesure commented 3 years ago

That makes sense, although here, as with much of the Swedish government's handling of COVID, it's hard to justify when other countries were able to do better.

How close would you say you are to generating the other graphs (reporting delay) and some with the Finnish data? Should we wait before sending this to journalists?

I think it would be great to send this to the data team at DN, which has a graph similar to yours with deaths by death date. They would be able to double-check the data and code. Emanuel Karlsten would also probably be interested. Do you have time to do it? I could write a draft if you want.

adamaltmejd commented 3 years ago

Adding Finland is easy now, but the issue is that we do not have data going back in time and I haven't been collecting for long. Or did you manage to get archived data?

adamaltmejd commented 3 years ago

It also seems there is a bug with the Finnish data: for some reason it stopped collecting deaths and only collected cases for the last five days. https://github.com/adamaltmejd/covid/commit/c26de6e07f7937844b349749bd8282ec7b80023d

Trying to fix now...

PierreMesure commented 3 years ago

Last time I checked, I couldn't find any older data. And the agency didn't save it either đŸ€ŠđŸ»â€â™‚ïž.

Maybe we should focus on the UK for now.

adamaltmejd commented 3 years ago

No idea why but seems we lost 6 days of downloads... Really unfortunate.
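A check that lists days with no saved snapshot could catch this kind of loss early. A Python sketch, assuming snapshots are identified by their download date; `skip_weekdays` is optional and could exclude Sundays for a source that never publishes on Sundays, as observed for the UK data. The names are hypothetical and this is not the repository's actual check.

```python
from datetime import date, timedelta


def missing_days(
    downloaded: set[date],
    start: date,
    end: date,
    skip_weekdays: frozenset[int] = frozenset(),
) -> list[date]:
    """Calendar days in [start, end] with no saved snapshot.

    skip_weekdays uses date.weekday() numbering (Monday=0 ... Sunday=6),
    e.g. frozenset({6}) to ignore Sundays.
    """
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in days if d not in downloaded and d.weekday() not in skip_weekdays]
```

Running this after each download and alerting when the list is non-empty would flag a gap after the first missed day rather than after six.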

PierreMesure commented 3 years ago

Here is a proposal for an e-mail to journalists:

Hi,

I am contacting you because I believe we have discovered something that may interest you regarding COVID-19 and the government's handling of the pandemic.

Over the past year, Adam Altmejd, a researcher at the Stockholm School of Economics, has compiled Sweden's deaths by date of death and published visualizations showing how long it takes for deaths to be reported. They are available at adamaltmejd.se/covid, and the source code that generates and updates the visualizations is at github.com/adamaltmejd/covid. Everything is based on open data from FolkhÀlsomyndigheten (the Public Health Agency of Sweden).

Deaths are often reported a few days late, and during the second wave the delays grew considerably: a majority were reported more than 7 or 14 days late.
We found this strange, so we looked for data from other countries to compare with. Unfortunately, very few countries publish deaths by date of death, but we found two: the United Kingdom and Finland.

Here are the visualizations for Sweden and the United Kingdom side by side. As you can see, the difference is very large. In the United Kingdom it barely takes a few days, and almost never more than 7 days.

We cannot know why the difference is so large, which is why we are contacting you as professional journalists. If you find this relevant, we hope you can look into it more deeply and perhaps put questions about it to the relevant decision-makers.

All our work on this is on GitHub.

adamaltmejd commented 3 years ago

Here is Finland: https://adamaltmejd.se/covid/deaths_lag_finland.png. Not much there because of the low total numbers, but they are very slow!

Feel free to send an email like that, it sounds great! I don't have time to manage it so would prefer not to sign it, but would of course be excellent if more journalists noticed these differences.

PierreMesure commented 3 years ago

That's really interesting to see!

OK! Well, I won't remove your name since you're behind all this but I can send the e-mail and have you in cc. I don't have time to push it further either and we don't have any more knowledge anyway.

adamaltmejd commented 3 years ago

That's what I meant! I'm really happy and appreciate you doing this, I just don't want to be a signatory of the email because I'm trying to put less, not more, time into Covid stuff :).

PierreMesure commented 3 years ago

I understand! I sent the e-mails to DN and Emanuel Karlsten, you're in cc. Let's see what they answer 🙂.

adamaltmejd commented 3 years ago

Very cool. Thanks for engaging with this!!