UKHSA-Internal / coronavirus-dashboard-api-python-sdk

Coronavirus (COVID-19) in the UK - API Service SDK for Python
https://coronavirus.data.gov.uk/
MIT License

Archive data through the API #3

Open radka-j opened 4 years ago

radka-j commented 4 years ago

Hello 👋

The API documentation says that data previously published on the dashboard (currently in the archive: https://coronavirus.data.gov.uk/archive ) is also available for download through the API. But I haven't been able to figure out how to structure the query to get this. Is there an example of what such a query looks like?

For example, I want to know the number of cases in Leicester for the date 2020-06-01 as reported on 2020-06-02, 2020-06-03, .....

Any help with this would be much appreciated, thank you!

xenatisch commented 4 years ago

Hi @radka-j

Thanks for getting in touch.

So what you are looking for is previous states of the data (previous reports). We do have the data in the database and are planning to provide the means to retrieve them via the API in the future. However, there is very little demand for it, and we have some high-priority work in our development pipeline.

We will get it done as soon as possible. Keep an eye out for the feature in the API docs and our SDKs.

lewissmit commented 4 years ago

Hi @xenatisch,

Just hoping to clarify - could you confirm that the API does not currently support provision of data for previous dates - so as an example the number of positive tests in a geography is only available as at the most recent data release?

If this is the case, it causes us some difficulties: the daily positive tests data is frequently amended retrospectively, so a day-to-day record of positive cases will not provide the full picture. Specifically, users will not be able to track the final tally of positive tests over the time series. To be clear: I'd like to have access to a day-to-day total of positive tests at all geographies, which when combined would be equal to the cumulative total currently available on the front end of your site.

If this is the case, please could this be escalated in terms of priority? If I'm misinterpreting somehow, apologies in advance.

geeogi commented 4 years ago

Hi @xenatisch,

We are also very eager to regain access to the time series data for UTLA positive tests. As it stands, the newCasesByPublishDate field from this API returns the latest numbers only, which is difficult to interpret for the reasons described by @lewissmit.

We used to have access to this data via this link: https://coronavirus.data.gov.uk/downloads/data/data_latest.json but it doesn't seem to have been updated since the 3rd.

bhavesh0009 commented 4 years ago

I was able to find a workaround for this issue.

    from uk_covid19 import Cov19API

    # Restrict the query to lower-tier local authorities (LTLA).
    ltla_filter = ["areaType=ltla"]

    # Map the output field names (keys) to the API metric names (values).
    cases_and_deaths = {
        "areaType": "areaType",
        "areaName": "areaName",
        "areaCode": "areaCode",
        "specimenDate": "date",
        "dailyLabConfirmedCases": "newCasesBySpecimenDate",
        "totalLabConfirmedCases": "cumCasesBySpecimenDate",
    }

    api = Cov19API(filters=ltla_filter, structure=cases_and_deaths)
    data = api.get_json()  # Returns a dictionary
    lastUpdate = data["lastUpdate"]

The above code gives historical information for LTLAs. I don't know how, but it is probably due to some changes in the structure.
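As a rough sketch of what can be done with the result: assuming `data["data"]` is a list of records keyed by the names in the structure above, the snippet below groups the records by area, sorts each group by specimen date, and checks that the running sum of daily cases matches the cumulative column. The sample records and their figures are invented so that the snippet runs without a network call, and `series_by_area` and `running_total_matches` are hypothetical helpers, not part of the SDK.

```python
from collections import defaultdict

# Invented sample shaped like data["data"]; the figures are NOT real.
sample_records = [
    {"areaName": "Leicester", "specimenDate": "2020-06-02",
     "dailyLabConfirmedCases": 5, "totalLabConfirmedCases": 15},
    {"areaName": "Leicester", "specimenDate": "2020-06-01",
     "dailyLabConfirmedCases": 10, "totalLabConfirmedCases": 10},
    {"areaName": "Derby", "specimenDate": "2020-06-01",
     "dailyLabConfirmedCases": 3, "totalLabConfirmedCases": 3},
    {"areaName": "Leicester", "specimenDate": "2020-06-03",
     "dailyLabConfirmedCases": 7, "totalLabConfirmedCases": 22},
]


def series_by_area(records):
    """Group records by area name and sort each group by specimen date."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row["areaName"]].append(row)
    for rows in grouped.values():
        # ISO-formatted dates sort chronologically as plain strings.
        rows.sort(key=lambda r: r["specimenDate"])
    return dict(grouped)


def running_total_matches(rows):
    """Check that summing the daily cases reproduces the cumulative column.

    Only meaningful when rows start at the first date of the series.
    """
    running = 0
    for row in rows:
        running += row["dailyLabConfirmedCases"]
        if running != row["totalLabConfirmedCases"]:
            return False
    return True


series = series_by_area(sample_records)
leicester = series["Leicester"]
for row in leicester:
    print(row["specimenDate"], row["dailyLabConfirmedCases"],
          row["totalLabConfirmedCases"])
print("Cumulative consistent:", running_total_matches(leicester))
```

With real data, `sample_records` would be replaced by `data["data"]` from the call above.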

geeogi commented 4 years ago

Thanks! This works for me using UTLAs too, e.g. link. The Python SDK retrieves all the pages, which is handy.
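For anyone querying without the SDK, the paging has to be handled by hand. Below is a minimal sketch of the general idea, not the SDK's actual implementation. `fetch_page` is a hypothetical callable that returns the list of records for a given page number and an empty list once the pages run out; the fake pages stand in for real requests.

```python
def collect_all_pages(fetch_page, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... and merge the results.

    Stops at the first empty page, or after max_pages as a safety limit.
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        records.extend(batch)
    return records


# Stand-in for real HTTP requests; the contents are invented.
fake_pages = {
    1: [{"areaName": "Leicester"}, {"areaName": "Derby"}],
    2: [{"areaName": "Rutland"}],
}


def fake_fetch(page):
    """Return the records for a page, or an empty list past the last page."""
    return fake_pages.get(page, [])


all_records = collect_all_pages(fake_fetch)
print(len(all_records), "records collected")
```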

xenatisch commented 4 years ago

Hi @lewissmit and @geeogi ... sorry for my late response. It's been a long day.

There is a difference between archive data and historical data. We release historical data every day, but they may contain revised figures. This is because we receive new data every day and also deduplicate the data every day (sometimes from months back). So "archive data" provides the data as released on day X, whereas historical data provides the data for every day since the beginning, whenever that may be for a specific area. Hope that makes sense? It's explained better in the About the Data page on the website.
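To make the distinction concrete, here is a small sketch. It assumes a local collection of releases keyed by release date, for instance one built by saving each day's response. The release dates, specimen dates and figures are all invented, and `historical` and `archive` are hypothetical helper names, not API or SDK functions.

```python
# Invented snapshots: release date -> {specimen date -> cases}.
snapshots = {
    "2020-06-02": {"2020-05-31": 10, "2020-06-01": 4},
    "2020-06-03": {"2020-05-31": 10, "2020-06-01": 9, "2020-06-02": 5},
    "2020-06-04": {"2020-05-31": 9, "2020-06-01": 11, "2020-06-02": 8,
                   "2020-06-03": 3},
}


def historical(snapshots):
    """Historical data: the full series as of the latest release.

    Earlier dates may carry revised figures.
    """
    latest_release = max(snapshots)  # ISO dates sort chronologically
    return snapshots[latest_release]


def archive(snapshots, specimen_date):
    """Archive view: the figure for one date as reported in each release."""
    return {
        release: series[specimen_date]
        for release, series in sorted(snapshots.items())
        if specimen_date in series
    }


print(historical(snapshots))
print(archive(snapshots, "2020-06-01"))
```

The second call corresponds to the original question: the figure for 2020-06-01 as reported on each later release date.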

The issue you have raised was addressed in #1 (which I have now pinned to the issues page because it seems to be very popular). I see that @bhavesh0009 has kindly shared his solution with you too.

We have two types of data for cases / deaths because some DAs release the data only by Reporting Date, so for consistency on the website, we also produce England data by Reporting Date, but only for the latest day. We are working with the other DAs to get the data by specimen date, at which point we will have consistent data for everything.

Let me know if you need more info, or are still experiencing any difficulties.

radka-j commented 4 years ago

@xenatisch I hope you are well! 🙂 Is there any update on this (e.g., when we might expect the archive to be available through the API)? Or do you know if there is another way to access this data?

theosanderson commented 4 years ago

For anyone stumbling upon this -- this data for cases is (unofficially) available for recent days here