Open PierreMesure opened 3 years ago
Long time since I looked, but back then I couldn't find it anywhere. I most definitely share your concern though. If you find any please do post it here.
I asked among Nordic data journalists and was told that Finland started publishing deaths by death date last fall. I haven't been able to find the raw data yet, but here is an example: https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case?column=measure-492118&row=dateweek20200101-508804L
I guess by getting the raw data for each publishing day, we could recreate the same data you use from FHM.
Thanks! Yes indeed! It seems like their API creates csv links like this: https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/fact_epirapo_covid19case.csv?row=dateweek20200101-508804L&column=measure-492118&
Should be able to set up a downloader and host the data on this site. Very nice. Think I'll have some time to implement it tonight.
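As a rough sketch of how a downloader could build those CSV links: the base path and the `row`/`column` values below are copied from the links in this thread, while the function name `thl_csv_url` and the assumption that the language segment (`en`/`sv`) can be swapped freely are illustrative only. Python is used here purely for illustration; the project's own code is in R.

```python
from urllib.parse import urlencode

# Base path taken from the links posted in this thread.
# Assumption: the "{lang}" segment accepts "sv" or "en", as seen in the two links.
BASE = "https://sampo.thl.fi/pivot/prod/{lang}/epirapo/covid19case/fact_epirapo_covid19case.csv"


def thl_csv_url(row: str, column: str, lang: str = "sv") -> str:
    """Build a CSV export URL for the THL pivot table (hypothetical helper)."""
    query = urlencode({"row": row, "column": column})
    return BASE.format(lang=lang) + "?" + query


# The row/column identifiers are the ones from the example link above.
url = thl_csv_url("dateweek20200101-508804L", "measure-492118")
print(url)
```

Saving the response under a file name that includes the download date would give one snapshot per publishing day, which is what the delay comparison needs.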
Great, I can try and write them an FOIA request to see if they have some older files. EDIT: sent.
That would be awesome. Starting to download now would mean one could only see the extent of the reporting delay from today onwards.
Hi @juhanisaa, I see that you have collected a lot of Finnish COVID data and are explaining it here.
I am tagging you here because we are currently looking for older versions of the list of deaths by death date published through THL's API (this call). Is this something that you might have saved on your servers? I had a look but couldn't find it.
Thanks in advance.
I believe we haven't stored that data, but I'll have to confirm.
Thanks @juhanisaa! Don't hesitate to tell us if you get your hands on it!
@adamaltmejd, I found another country really interesting to compare with! The UK has death data by death date!
Very nice! Thanks. I'll try to find some time to put together crawlers. Sorry for not doing it yet.
No worries, I'll start downloading them manually this week so we don't lose more data. I'm also sending an FOIA request to the British authorities to try to get the old files. From what I could see, some months are available on Github.
Very interesting! Keep me posted :)
OK, I've just set up the code to download data from the Finnish and UK repos. Let's see if it works :)
In this repo, we have bigger CSV files containing deaths by date of death among other things. Dating back to the 13th of October. Thanks @theosanderson!
Here is a repo having the period 23/08 -> 30/11. Thanks @nathanrawle!
In this repo, there is a file named death_data.csv that has been updated every day since the 26th of October. Thanks @rvaughan!
Finally, in this repo, the same data is present since the 8th of December. Thanks @msleigh!
Note that the first one distinguishes by county and the other two by nation. Maybe we want to download this file in the future? In any case, it seems to be fetched automatically on the first aforementioned repo.
For the UK you can download data going back further with the "Archive" option at https://coronavirus.data.gov.uk/details/download (it didn't exist when I made my repo).
No problem. I obtained the data from the API @theosanderson mentioned, from which you can access whichever metrics you want as they were published on a given past date, going back to 23 August. Releases from 30 Nov up to yesterday can be scraped in the same way now.
Fantastic, thanks everyone! I'll put together a dataset with daily releases to measure reporting delay and to evaluate our model on. Exciting :)
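A minimal sketch of how daily releases could be turned into reporting delays, assuming each release is reduced to a mapping from death date to the cumulative number of deaths reported for that date. The function name `new_reports` and all numbers are made up for illustration; this is not the repository's actual R code.

```python
from datetime import date
from typing import Dict, List, Tuple

Snapshot = Dict[date, int]  # death date -> cumulative deaths reported for that date


def new_reports(snapshots: Dict[date, Snapshot]) -> List[Tuple[date, int, int]]:
    """Return (death_date, delay_days, newly_reported) rows.

    The earliest snapshot is only a baseline: the delay of deaths already
    present in it cannot be known, so no rows are emitted for it.
    """
    rows = []
    published_dates = sorted(snapshots)
    previous = snapshots[published_dates[0]]
    for published in published_dates[1:]:
        current = snapshots[published]
        for died in sorted(current):
            added = current[died] - previous.get(died, 0)
            if added != 0:  # negative values would be downward revisions
                rows.append((died, (published - died).days, added))
        previous = current
    return rows


# Illustrative, invented numbers: three consecutive daily releases.
snapshots = {
    date(2021, 1, 10): {date(2021, 1, 8): 5, date(2021, 1, 9): 2},
    date(2021, 1, 11): {date(2021, 1, 8): 6, date(2021, 1, 9): 4, date(2021, 1, 10): 1},
    date(2021, 1, 12): {date(2021, 1, 8): 6, date(2021, 1, 9): 5, date(2021, 1, 10): 3},
}
for row in new_reports(snapshots):
    print(row)
```

Treating the first snapshot as a baseline is why missing older releases matter: without them, delays can only be measured from the first saved release onwards.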
Be aware that the release for 7/10/2020 is missing from https://coronavirus.data.gov.uk/details/download; the request returns HTTP 200 with no content.
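Since a missing release answers with a success status but an empty body, a scraper probably should not rely on the status code alone. A small hedged sketch of such a guard follows; `is_usable_release` is a hypothetical helper, not part of any library or of the repo.

```python
def is_usable_release(status: int, body: bytes) -> bool:
    """Accept a response only if it succeeded AND actually carries data."""
    return status == 200 and len(body.strip()) > 0


# A normal release, an empty "200 OK" release, and a failed request.
print(is_usable_release(200, b"date,newDeaths28DaysByDeathDate\n"))  # header line is invented
print(is_usable_release(200, b""))
print(is_usable_release(404, b"not found"))
```

Skipping such releases rather than saving an empty file avoids a snapshot that looks as if every previously reported death had been withdrawn.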
Hej @adamaltmejd, would you like some help to convert the new data to the same format you feed the current graphs? I don't have any experience with R but it shouldn't be too hard to build on your code. I just need to get a dev env running.
After getting that working and maybe a graph comparing Sweden's delay with the others, I thought it could be interesting to write a blog post about the findings.
Feel free to explore it if you want! I won't have time to do anything for a week or two.
Made a version of my graph for the UK. Can be seen here: https://adamaltmejd.se/covid/deaths_lag_uk.png
Awesome! I actually played with it myself, but I ran into so many small bugs in R while trying to recompile the delta-t for the data since last summer that I gave up at some point.
What's your early analysis? It seems like the British data has some interesting regularities (no same-day data, no data on Sundays or public holidays) that are similar to the Swedish data.
But besides that, there is just so much less blue on the UK graph. They seem to be reporting deaths many times faster, and deaths reported more than 14 days late are anecdotal.
It's impossible to know the causes for such a difference at that point. Difference in death confirmation method? Different delays in reporting? Priority given to accuracy versus speed?
But it would be interesting to discuss it with journalists and see if they can investigate and maybe question FHM about it.
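To put a number on "so much less blue", one option is the share of deaths reported later than some threshold. The sketch below assumes the data has already been reduced to (delay in days, number of deaths) pairs; the function name and the figures are invented for illustration and do not come from the Swedish or UK data.

```python
from typing import Iterable, Tuple


def share_reported_after(rows: Iterable[Tuple[int, int]], threshold_days: int) -> float:
    """Fraction of deaths whose reporting delay exceeds threshold_days."""
    rows = list(rows)
    total = sum(count for _, count in rows)
    if total == 0:
        return 0.0
    late = sum(count for delay, count in rows if delay > threshold_days)
    return late / total


# Invented example: (delay in days, deaths reported with that delay).
example = [(1, 50), (3, 30), (8, 15), (20, 5)]
print(share_reported_after(example, 7))
print(share_reported_after(example, 14))
```

Computing the same share per country and per month would make the comparison less dependent on how the colours in the graphs are read.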
Agreed it is super interesting. Agree with your observations too. My bet on the main reason for the big delays in Sweden is that we have a system in place already for death reporting at the national level - and that system has been used also for Covid. The problem is that it wasn't designed to be fast. The doctor who signs the death certificate has something like two weeks to send it in. So what has always worked well now has a speed problem that is not easy to fix.
That makes sense, although, as with much of the Swedish government's handling of COVID, it's hard to justify when other countries were able to do better.
How close would you say you are to generating the other graphs (reporting delay) and some with the Finnish data? Should we wait before sending this to journalists?
I think it would be great to send that to the data team at DN which has a graph similar to yours with deaths by death date. They would be able to double-check the data and code. Emanuel Karlsten would also probably be interested. Do you have time to do it? I could write a draft if you want.
Adding Finland is easy now, but the issue is that we do not have data going back in time and I haven't been collecting for long. Or did you manage to get archived data?
There also seems to be a bug with the Finnish data: for some reason it has stopped collecting deaths and only collects cases for the last five days. https://github.com/adamaltmejd/covid/commit/c26de6e07f7937844b349749bd8282ec7b80023d
Trying to fix now...
Last time I checked, I couldn't find any older data. And the agency didn't save it either 🤦🏻‍♂️.
Maybe we should focus on the UK for now.
No idea why but seems we lost 6 days of downloads... Really unfortunate.
Here is a proposal for an e-mail to journalists:
Hi,
I am contacting you because I believe we have discovered something that may be worth your attention regarding COVID-19 and how the pandemic is being handled by the government.
Over the past year, Adam Altmejd, a researcher at the Stockholm School of Economics, has compiled Sweden's deaths by date of death and published visualisations showing how long it takes for deaths to be reported. They are available at adamaltmejd.se/covid, and the source code that generates and updates the visualisations is at github.com/adamaltmejd/covid. Everything is based on open data from Folkhälsomyndigheten (the Public Health Agency of Sweden).
Deaths are often reported a few days late, and during the second wave the delay has increased considerably: a majority were reported more than 7 or 14 days late.
We found this strange, so we looked for data from other countries to compare with. Unfortunately, very few countries publish deaths by date of death, but we found two: the United Kingdom and Finland.
Here are the visualisations for Sweden and the United Kingdom side by side. As you can see, the difference is very large. In the United Kingdom it takes barely a few days, and almost never more than 7 days.
We cannot know why the difference is so large, which is why we are contacting you as professional journalists. If you find it relevant, we hope you can look into this more deeply and perhaps put questions about it to the relevant decision-makers.
All our work on this is available on GitHub.
Here is finland: https://adamaltmejd.se/covid/deaths_lag_finland.png, not much there because of the low total numbers but they are very slow!
Feel free to send an email like that, it sounds great! I don't have time to manage it so would prefer not to sign it, but would of course be excellent if more journalists noticed these differences.
That's really interesting to see!
OK! Well, I won't remove your name since you're behind all this but I can send the e-mail and have you in cc. I don't have time to push it further either and we don't have any more knowledge anyway.
That's what I meant! I'm really happy and appreciate you doing this, I just don't want to be a signatory of the email because I'm trying to put less, not more, time into Covid stuff :).
I understand! I sent the e-mails to DN and Emanuel Karlsten, you're in cc. Let's see what they answer.
Very cool. Thanks for engaging with this!!
Hej Adam,
Thanks for your awesome work, it's still one of the few graphs I browse regularly to keep me updated on the COVID situation in Sweden.
I find it really shameful that the reporting delay is so bad and that it has gotten worse during this second wave. But I actually have no certainty that it's better in other countries. I've been looking for datasets that would enable me to determine whether that's the case, without much success so far. Have you found any?