Closed by craigbloodworth 3 years ago
Looks like this has been resolved with today's data refresh; however, page 13 now fails at the utla level.
API request:
GET https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=utla&structure={"areaType": "areaType","areaName": "areaName","areaCode": "areaCode","date": "date","newCasesByPublishDate": "newCasesByPublishDate","cumCasesByPublishDate": "cumCasesByPublishDate","hospitalCases": "hospitalCases","newCasesBySpecimenDate": "newCasesBySpecimenDate","cumCasesBySpecimenDate": "cumCasesBySpecimenDate","newAdmissions": "newAdmissions","covidOccupiedMVBeds": "covidOccupiedMVBeds","newDeaths28DaysByPublishDate": "newDeaths28DaysByPublishDate","cumDeaths28DaysByPublishDate": "cumDeaths28DaysByPublishDate","newDeaths28DaysByDeathDate": "newDeaths28DaysByDeathDate","cumDeaths28DaysByDeathDate": "cumDeaths28DaysByDeathDate"}&page=13
Responds with:
{ "response": "An internal error occurred whilst processing your request, please try again. If the problem persists, please report as an issue and include your request.", "status_code": 500, "status": "Internal Server Error" }
The page 25 error is happening again today
Hi, thank you for getting in touch.
The issue is related to APIv1, not v2, so I will transfer it to the relevant repository.
May I ask why you are not using APIv2 to download an entire dataset?
Also, some of the metrics in your query don't exist for U/LTLA levels - e.g. newAdmissions is a healthcare metric and is only available for nhsRegion or nhsTrust.
If I had to guess, I'd say that your request has far too many metrics, which means that it takes a lot more time and resources to process. By the time it's done processing, you may already have exceeded the maximum time, hence the 504 timeout.
If you would like to continue with APIv1, I recommend that you submit your requests with fewer metrics. Generally speaking, performance tends to deteriorate when the number of metrics exceeds 10.
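As a rough illustration of the suggestion above, the sketch below splits a long metric list into smaller batches and builds one APIv1 URL per batch. The helper names (`chunk_metrics`, `build_url`) and the batch size of 5 are assumptions made for illustration; only the endpoint, the `filters`, `structure` and `page` parameters, and the metric names are taken from the request in this thread.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.coronavirus.data.gov.uk/v1/data"

# Fields kept in every batch so the partial results can be joined again later.
KEY_FIELDS = ["areaType", "areaName", "areaCode", "date"]

# The eleven metrics from the original request.
ALL_METRICS = [
    "newCasesByPublishDate", "cumCasesByPublishDate", "hospitalCases",
    "newCasesBySpecimenDate", "cumCasesBySpecimenDate", "newAdmissions",
    "covidOccupiedMVBeds", "newDeaths28DaysByPublishDate",
    "cumDeaths28DaysByPublishDate", "newDeaths28DaysByDeathDate",
    "cumDeaths28DaysByDeathDate",
]


def chunk_metrics(metrics, size):
    """Split the metric list into consecutive batches of at most `size` items."""
    return [metrics[i:i + size] for i in range(0, len(metrics), size)]


def build_url(area_type, metrics, page):
    """Build one request URL whose structure holds the key fields plus `metrics`."""
    structure = {name: name for name in KEY_FIELDS + list(metrics)}
    query = {
        "filters": f"areaType={area_type}",
        "structure": json.dumps(structure, separators=(",", ":")),
        "page": page,
    }
    return f"{BASE_URL}?{urlencode(query)}"


# Batch size 5 is an arbitrary choice that keeps each request under 10 fields.
batches = chunk_metrics(ALL_METRICS, size=5)
urls = [build_url("ltla", batch, page=25) for batch in batches]
for url in urls:
    print(url)
```

Each batch repeats the four key fields, so the responses can be merged on area code and date on the client side. Whether the batches paginate identically is not confirmed by anything in this thread, so page numbers should be checked per batch.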
Investigation results:
Timestamp: 2021-06-09T08:20:56.633Z
Unique request id (Operation ID): bd4edb8d3ece70479086dc028a4e68c9
Served from: UK West 02 / f1
Processed query:
GET - /api/v1/data?filters=areaType=ltla&structure=%7B%22areaType%22:%20%22areaType%22,%22areaName%22:%20%22areaName%22,%22areaCode%22:%20%22areaCode%22,%22date%22:%20%22date%22,%22newCasesByPublishDate%22:%20%22newCasesByPublishDate%22,%22cumCasesByPublishDate%22:%20%22cumCasesByPublishDate%22,%22hospitalCases%22:%20%22hospitalCases%22,%22newCasesBySpecimenDate%22:%20%22newCasesBySpecimenDate%22,%22cumCasesBySpecimenDate%22:%20%22cumCasesBySpecimenDate%22,%22newAdmissions%22:%20%22newAdmissions%22,%22covidOccupiedMVBeds%22:%20%22covidOccupiedMVBeds%22,%22newDeaths28DaysByPublishDate%22:%20%22newDeaths28DaysByPublishDate%22,%22cumDeaths28DaysByPublishDate%22:%20%22cumDeaths28DaysByPublishDate%22,%22newDeaths28DaysByDeathDate%22:%20%22newDeaths28DaysByDeathDate%22,%22cumDeaths28DaysByDeathDate%22:%20%22cumDeaths28DaysByDeathDate%22%7D&page=25&format=json
Response time: ~6.5 seconds
SELECT
    area_code AS "areaCode",
    ref.area_type AS "areaType",
    area_name AS "areaName",
    date::VARCHAR AS date,
    metric,
    CASE
        WHEN (payload ? 'value') THEN (payload -> 'value')
        ELSE payload::JSONB
    END AS value
FROM covid19.time_series_p2021_6_8_ltla AS ts
    JOIN covid19.metric_reference AS mr ON mr.id = metric_id
    JOIN covid19.release_reference AS rr ON rr.id = release_id
    JOIN covid19.area_reference AS ref ON ref.id = area_id
WHERE
    metric = ANY($1::VARCHAR[])
    AND rr.released IS TRUE
    AND area_type = $2
    AND mr.released IS TRUE
ORDER BY area_code, date DESC
LIMIT 37500 OFFSET 900000
{
    "arguments": [
        [
            "cumCasesByPublishDate",
            "newAdmissions",
            "newCasesBySpecimenDate",
            "cumCasesBySpecimenDate",
            "cumDeaths28DaysByDeathDate",
            "covidOccupiedMVBeds",
            "newDeaths28DaysByPublishDate",
            "cumDeaths28DaysByPublishDate",
            "hospitalCases",
            "newDeaths28DaysByDeathDate",
            "newCasesByPublishDate"
        ],
        "ltla"
    ]
}
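The LIMIT and OFFSET in the processed query are consistent with a fixed page size, since 900000 = (25 - 1) × 37500 for the requested page 25. A small sketch of that arithmetic follows; the formula `(page - 1) * page_size` is an assumption inferred from these two numbers, not something the service's code is shown to do, and `page_offset` is a hypothetical helper.

```python
PAGE_SIZE = 37500  # LIMIT taken from the processed query above


def page_offset(page, page_size=PAGE_SIZE):
    """Return the row offset for a 1-based page number (assumed formula)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size


offset = page_offset(25)
print(offset)
```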
could not convert string to float: 'null'
Traceback (most recent call last):
File "/home/site/wwwroot/api_v1/api.py", line 93, in api_handler
response = await get_data(
File "/home/site/wwwroot/api_v1/api_handler/database.py", line 388, in get_data
df
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/generic.py", line 5403, in pipe
return com.pipe(self, func, *args, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/common.py", line 440, in pipe
return func(obj, *args, **kwargs)
File "/home/site/wwwroot/api_v1/api_handler/database.py", line 308, in format_dtypes
return df.astype(column_types)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/generic.py", line 5859, in astype
col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/generic.py", line 5874, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/internals/managers.py", line 631, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/internals/blocks.py", line 673, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/dtypes/cast.py", line 1097, in astype_nansafe
return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: 'null'
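The traceback ends in a cast that fails because a value arrives as the string 'null' rather than as a real null. The stand-alone sketch below reproduces the same message with plain Python and shows one defensive way to handle such values before casting. The helper `to_float_or_none` is hypothetical and is not the fix that was applied to the service.

```python
def to_float_or_none(value):
    """Convert a raw payload value to float, treating null-like values as missing."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in {"", "null"}:
        return None
    return float(value)


# A bare cast of the string 'null' fails in the same way as the traceback above.
try:
    float("null")
except ValueError as exc:
    message = str(exc)
    print(message)

# Hypothetical raw values mixing numbers, a real null and the string 'null'.
raw_values = ["12", "null", 3.5, None]
cleaned = [to_float_or_none(v) for v in raw_values]
print(cleaned)
```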
@craigbloodworth, could you please confirm that the issue is resolved?
Yes this is now working. Thanks @xenatisch
I haven't switched to v2 yet, as it requires almost a full rewrite of the existing code, and many of the errors I hit tend to be partial or invalid JSON responses from the v2 API.
Happy to look into the issues re APIv2. Feel free to raise them in the APIv2 repository.
The API call
GET https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=ltla&structure={"areaType": "areaType","areaName": "areaName","areaCode": "areaCode","date": "date","newCasesByPublishDate": "newCasesByPublishDate","cumCasesByPublishDate": "cumCasesByPublishDate","hospitalCases": "hospitalCases","newCasesBySpecimenDate": "newCasesBySpecimenDate","cumCasesBySpecimenDate": "cumCasesBySpecimenDate","newAdmissions": "newAdmissions","covidOccupiedMVBeds": "covidOccupiedMVBeds","newDeaths28DaysByPublishDate": "newDeaths28DaysByPublishDate","cumDeaths28DaysByPublishDate": "cumDeaths28DaysByPublishDate","newDeaths28DaysByDeathDate": "newDeaths28DaysByDeathDate","cumDeaths28DaysByDeathDate": "cumDeaths28DaysByDeathDate"}&page=25
has been returning a 504 Gateway Timeout error for over 12 hours now. It seems to be specific to page 25. Pages 24 & 26 work just fine.
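To narrow down which pages fail, as was done by hand here for pages 24 to 26, a client could walk a range of pages and record the status code of each. This is a minimal sketch, assuming a caller-supplied `fetch_status` function (hypothetical) that returns the HTTP status for a given page. The stub stands in for real network calls and simply mimics the behaviour reported above.

```python
def find_failing_pages(fetch_status, pages):
    """Return {page: status} for every page whose status is not 200."""
    failures = {}
    for page in pages:
        status = fetch_status(page)
        if status != 200:
            failures[page] = status
    return failures


def stub_fetch_status(page):
    # Stand-in for a real HTTP call: page 25 times out, every other page succeeds.
    return 504 if page == 25 else 200


failing = find_failing_pages(stub_fetch_status, range(24, 27))
print(failing)
```

In real use, `fetch_status` would wrap an HTTP GET of the URL above with the `page` parameter substituted. Including the failing page numbers and status codes in a report makes the problem easier to reproduce.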