UKHSA-Internal / coronavirus-dashboard-api-python-sdk

Coronavirus (COVID-19) in the UK - API Service SDK for Python
https://coronavirus.data.gov.uk/
MIT License

Only single record returned for Scottish areas #11

Open infinnovation-dev opened 4 years ago

infinnovation-dev commented 4 years ago

This is probably an issue with the backend rather than the Python API; I'm not sure where best to report it.

Queries for Scottish areas seem to return only a single record, depending on what is requested in the structure.

from uk_covid19 import Cov19API

filters = ['areaCode=S12000049', 'areaType=utla']
structure = {
    'date': 'date',
    'newCasesBySpecimenDate': 'newCasesBySpecimenDate',
}
res = Cov19API(filters=filters, structure=structure).get_json()
print(res['length'], res['data'][-1])
# 1 {'date': '2020-08-12', 'newCasesBySpecimenDate': 6}

# Add another field not of interest, apparently always 0
structure['newCasesByPublishDate'] = 'newCasesByPublishDate'
res = Cov19API(filters=filters, structure=structure).get_json()
print(res['length'], res['data'][-1])
# 221 {'date': '2020-01-05', 'newCasesBySpecimenDate': None, 'newCasesByPublishDate': 0}

I'd expect to get the same number of records regardless of which metrics are in the structure. Is it something to do with None versus 0, perhaps?
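One way to look into the None-versus-0 question is to count, per metric, how many returned records actually carry a value. The following is a minimal sketch only: `count_non_null` is a hypothetical helper, not part of `uk_covid19`, and the sample rows are simply the two records printed above, standing in for `res['data']`.

```python
def count_non_null(records, metrics):
    """Count, per metric, how many records hold a value other than None."""
    counts = {metric: 0 for metric in metrics}
    for record in records:
        for metric in metrics:
            # A missing key is treated the same as an explicit None.
            if record.get(metric) is not None:
                counts[metric] += 1
    return counts


# Illustrative stand-in for res['data']: the two records shown in the outputs above.
sample = [
    {'date': '2020-08-12', 'newCasesBySpecimenDate': 6},
    {'date': '2020-01-05', 'newCasesBySpecimenDate': None, 'newCasesByPublishDate': 0},
]

print(count_non_null(sample, ['newCasesBySpecimenDate', 'newCasesByPublishDate']))
# {'newCasesBySpecimenDate': 1, 'newCasesByPublishDate': 1}
```

Run against the real 221-record response, this would show whether newCasesBySpecimenDate is populated for more than one date.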

xenatisch commented 4 years ago

It's actually neither. We just don't have the ...BySpecimenDate data in our database for the devolved administrations (DAs).

We have only recently started to receive the data, and it takes a while for our data team to incorporate it into the pipeline. It will be added in the coming weeks.

Hope this helps.
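In the meantime, a possible client-side workaround, based only on the behaviour reported in the first comment: request the extra metric so that every date comes back, then drop it again before use. This is a sketch under that assumption; `keep_fields` is a hypothetical helper, not part of the SDK, and the sample row is the record shown in the report.

```python
def keep_fields(records, wanted):
    """Return copies of the records restricted to the wanted keys."""
    return [{key: record.get(key) for key in wanted} for record in records]


# Illustrative stand-in for res['data'] from the query that included
# newCasesByPublishDate (the record printed in the report above).
padded = [
    {'date': '2020-01-05', 'newCasesBySpecimenDate': None, 'newCasesByPublishDate': 0},
]

trimmed = keep_fields(padded, ['date', 'newCasesBySpecimenDate'])
print(trimmed)
# [{'date': '2020-01-05', 'newCasesBySpecimenDate': None}]
```

The specimen-date values stay None until the data is actually in the pipeline; this only keeps the full date range in the result.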

GraemeRMcAllister commented 4 years ago

Hi.

Sorry to hijack the thread. The data for Scotland's council areas is recorded on a day-by-day basis but is volatile, in that only the most recent record returns any information, even though the previous days' data did exist (#6).

Has this always been the intention?

xenatisch commented 4 years ago

@GraemeRMcAllister All cases data that we receive on a daily basis are volatile, which is why they change as we process and deduplicate them from tens of millions of records every single day. The most recent records are usually the most volatile; they tend to settle down after a while.

We calculate the records by publish date (where one is not provided) from a number of other records (if they are available) to provide a level of consistency. If you need more details, it's probably worth getting in touch with our data team.