Open inglesp opened 7 years ago
Prescribing date: 2017_06
Problem: Check at top of bigquery_upload.py
failed, because there are no PracticeStatistics
objects from 2017_06.
Investigation:
log.json
says that the 2017_06 data was imported on 2017-06-26Steps to resolve:
Prescribing date: 2017_07
Problem: Running the July import jumped straight to updating the smoke tests.
Investigation:
TaskLog
objects for 2017_07. These have started_at
timestamps from July. This is probably a hangover from when we didn't explicitly specify the prescribing date, and the run_id
for a run was the current month not the prescribing month.Steps to resolve:
TaskLogs
for 2017_07Prescribing date: 2017_07
Problem: hscic_list_sizes
failed with Could not find any data for 2017_07
.
Investigation:
Steps to resolve:
Prescribing date: 2017_07
Problem: import_ppu_savings
failed with a couple of typos
Steps to resolve:
Prescribing date: 2017_08
Problem: hscic_prescribing
failed.
Investigation:
The command searches for the most recent data at http://content.digital.nhs.uk/searchcatalogue?q=title%3a%22presentation+level+data%22&sort=Most+recent, but this now redirects to https://digital.nhs.uk/article/191/Find-data-and-publications so nothing is found.
Steps to resolve:
Seb has found https://digital.nhs.uk/pubsearch?q=presentation+level+data&s=s which lists data up to July. As the August data has not yet been published, I've made the task a no-op for now, and will fix it to retrieve data from the new list later.
Prescribing date: 2017_08
Problem: convert_hscic_prescribing
failed with a TimeoutError
.
Investigation:
The timeout was while waiting for this query-and-insert in append_aggregated_data_to_prescribing_table
to finish. The timeout was very low (60s) from when I was testing the new BQ stuff, and I hadn't revisited it to increase it to something sensible.
Steps to resolve:
append_aggregated_data_to_prescribing_table
(it would have correctly triggered an assert), and restarted the importPrescribing date: 2017_08
Problem: import_dmd_snomed
failed with an ImportError
Steps to resolve:
Fixed in 269c10aba0df9159028e1581a30928ad5b9ac00c.
Prescribing date: 2017_08
Problem: bigquery_upload
failed with a couple of noddy errors caused by careless refactoring
Steps to resolve:
bigquery_upload.py
commented out to ensure we didn't run something twicePrescribing date: 2017_09
Problem: Data imported for 2017_09 was identical to data for 2017_08.
See #644 for discussion.
Prescribing date: 2017_09
Problem: fetch_prescribing_metadata
could not find any data about practices.
Steps to resolve:
I'd misspelt a filename. Fixed in #653.
Prescribing date: 2017_09
Problem: backup_prescribing_table
failed.
Investigation:
This is the second month that the task has run. It failed when looking at the backups created in the first month.
Steps to resolve:
Fixed in #654.
Prescribing date: 2017_09
Problem: bigquery_upload
failed when updating the BNF data.
Steps to resolve:
Fixed in #655.
Prescribing date: 2017_09
Problem: import_measures
failed with a 500 error
Investigation:
Nothing is logged at INFO level, so it was hard to work out how far import_measures
had got. But nothing had been written to the database, so it was safe to re-run the command.
>>> MeasureValue.objects.filter(month='2017-09-01').count()
0
>>> MeasureGlobal.objects.filter(month='2017-09-01').count()
0
Steps to resolve:
Re-ran the pipeline.
Prescribing date: 2017_09
Problem: import_measures
failed when running create_omnibus_lp_view()
.
Investigation:
This was because the SQL used to create the view used to include the project name, which I recently removed. All the other SQL files in the measure_sql
directory are used for normal queries and so don't require the project name, but this SQL is used to create a table with a view, and so does require the project name. I had missed this, as create_omnibus_lp_view
didn't use gcutils.bigquery.Client
.
Steps to resolve:
Fixed in #658. Hacked import_measures
to skip measures before lpzomnibus
, and re-ran the pipeline.
Prescribing date: 2017_10
Problem: import_measures
failed when making lpzomnibus
measure.
Investigation:
Basically bugs introduced in recent refactoring.
Steps to resolve:
Patch (to be applied) https://gist.github.com/sebbacon/2faed130e5eb5612fd4563a7408de55e
Prescribing date: 2017_10
Problem: BigQuery availability issues
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 365, in insert_rows_from_query
legacy=legacy
File "/webapps/openprescribing/openprescribing/gcutils/bigquery.py", line 215, in insert_rows_from_query
job.begin()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 380, in begin
method='POST', path=path, data=self._build_resource())
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/google/cloud/_http.py", line 290, in api_request
headers=headers, target_object=_target_object)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/google/cloud/_http.py", line 183, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/google/cloud/_http.py", line 212, in _do_request
url=url, method=method, headers=headers, data=data)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/google/auth/transport/requests.py", line 179, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.googleapis.com', port=443): Max retries exceeded with url: /bigquery/v2/projects/ebmdatalab/jobs (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f4d0923f450>: Failed to establish a new connection: [Errno 101] Network is unreachable',))```
Prescribing date: 2017_10
Problem: Datasets too large to sort in BQ
See #699 and #722.
Prescribing date: 2017_11
Problem: Hit IntegrityError
in import_dmd
Investigation:
Here's the traceback:
Traceback (most recent call last):
File "manage.py", line 12, in <module>
execute_from_command_line(sys.argv)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
utility.execute()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
self.execute(*args, **cmd_options)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
output = self.handle(*args, **options)
File "/webapps/openprescribing/openprescribing/pipeline/management/commands/fetch_and_import.py", line 12, in handle
run_all(kwargs['year'], kwargs['month'])
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 432, in run_all
run_task(task, year, month)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 403, in run_task
task.run(year, month, **kwargs)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 245, in run
call_command(*tokens)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 383, in call_command
return django_call_command(*args)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 119, in call_command
return command.execute(*args, **defaults)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
output = self.handle(*args, **options)
File "/webapps/openprescribing/openprescribing/dmd/management/commands/import_dmd.py", line 328, in handle
create_dmd_product()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/db/transaction.py", line 223, in __exit__
connection.commit()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 242, in commit
self._commit()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 211, in _commit
return self.connection.commit()
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/db/utils.py", line 95, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/webapps/openprescribing/.venv/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 211, in _commit
return self.connection.commit()
django.db.utils.IntegrityError: update or delete on table "dmd_product" violates foreign key constraint "dmd_tariffprice_product_id_39bb1b2e_fk_dmd_product_dmdid" on table "dmd_tariffprice"
DETAIL: Key (dmdid)=(329298009) is still referenced from table "dmd_tariffprice".
Steps to resolve:
import_dmd
and import_dmd_snomed
from list of tasks, and restarted importimport_dmd
and import_dmd_snomed
Prescribing date: 2018_01
Problem: import_dmd_snomed
failed because manually-downloaded zip file hadn't been extracted.
Steps to resolve: Unzipped zip file manually and tried again.
Prescribing date: 2018_01
Problem: import_dmd_snomed
failed because column headers in spreadsheet had changed.
Steps to resolve: Loosened column header assertions. See https://github.com/ebmdatalab/openprescribing/commit/9b3390bb0d69f9f2cf887863668924117d85635c.
Prescribing date: 2018_01
Problem: import_dmd_snomed
failed with django.db.utils.DataError: invalid input syntax for integer: "'7264811000001106"
Investigation:
'
openpyxl
to do the right thing here'
s ourselvesPrescribing date: 2018_02
Problem: import_ppu_savings
failed. This get_or_create
raised:
django.db.utils.IntegrityError: duplicate key value violates unique constraint "dmd_product_temp_pkey"
DETAIL: Key (dmdid)=(10000000000) already exists.
Investigation:
DMDProduct
with dmdid=10000000000
already exists, but its bnf_code
attribute is None
. This explains why the get_or_create
failed, as it first tries to get a DMDProduct
with dmdid=10000000000
and bnf_code='0601060D0AAA0A0'
.bnf_code
to None
for all DMDProduct
s before re-running import_dmd_snomed
which would have set the bnf_code
for any DMDProduct
s it could -- except for the two that are created by import_ppu_savings
.bnf_code
for these manually.Prescribing date: 2018_03
Problem: Fetchers whose data comes from NHS Digital failed because their URLs had changed. See https://digital.nhs.uk/services/organisation-data-service/data-downloads for more.
Steps to fix: See https://github.com/ebmdatalab/openprescribing/pull/855.
Prescribing date: 2018_04
Problem: URL for patient list size data was not as expected.
One of these things is not like the others:
Steps to fix:
Prescribing date: 2018_05
Problem: The BNF code download had already been run and this resulted in confusing error when we tried to run it again (see #941)
Steps to fix: Do nothing and move on
Prescribing date: 2018_07
Problem: The practice list size file was misnamed in the July dataset. It omits the month and year in the name, whereas filenames for previous and subsequent months all contain it.
Steps to fix:
pipeline.TaskLog
object corresponding to the failed download, change its status to success and save it.Prescribing date: 2018_08
Problem: import_dmd
failed when trying to delete a record from dmd_product
because that record was referenced in dmd_tariffprice
.
Investigation: The product in question is "Glycerol 4g suppositories" which has VPID 317609008. In the latest data, the VPID has changed to 35368211000001106. See #1126.
Steps to fix:
import_dmd
as having succeeded, and restart the pipeline.Prescribing date: 2018_08
Problem: import_adqs
failed with an exception that was caused by a bug that was fixed in #1083... except that #1083 hadn't been merged.
Steps to fix:
Note: This would have been caught had the end-to-end tests been running nightly. The weren't, so it wasn't. My fault, sorry.
Prescribing date: 2018_09
Problem: import_dmd
failed in the same way as it did last month. This is no surprise as I haven't fixed #1126.
Prescribing date: 2018_10
Problem: fetch_prescribing_metadata_addresses
and fetch_prescribing_metadata_chemicals
failed because there was no data to fetch.
Investigation:
In fact, they've not fetched anything since 2018_06, but only fail if they've not been able to fetch data for the past six months.
We expect the data to be at a URL like http://datagov.ic.nhs.uk/presentation/2017_08_August/T201708ADDR+BNFT.CSV, and as far as I'm aware, we don't know where links to the data. It doesn't appear in the search at data.gov.uk.
The data is not essential for the main parts of the site. I believe it contains details of some practices before it appears in epraccur.csv
, and details of names of chemicals by BNF prefix.
For now, I've marked the two tasks as successful, so we can continue with the rest of the import.
See https://github.com/ebmdatalab/openprescribing/issues/1291.
Prescribing date: 2018_11
Problem: https://github.com/ebmdatalab/openprescribing/issues/571#issuecomment-448749032 again.
Prescribing date: 2018_12
Problem: fetch_patient_list_size
failed because the file it was expecting to get had an unexpected name. From https://digital.nhs.uk/data-and-information/publications/statistical/patients-registered-at-a-gp-practice/december-2018 we expect to find a link to a file called gp-reg-pat-prac-quin-age-dec-18.csv, but instead it was called gp-reg-pat-prac-quin-age.csv.
Steps to fix:
Prescribing date: 2018_12
Problem: fetch_prescribing_metadata
failed because of a bug.
Steps to fix:
Notes:
Prescribing date: 2018_12
Problem: import_dmd
failed because it tried to import some old data that it had previously failed to import. See this import problem record and #1126. After fixing #1126 we've imported the latest dm+d data. The old data included VMPs whose IDs have been updated in the most recent data, and so did not include VMPs with the new ID, which are now referenced by TariffPrice
records. This caused the import of old data to raise an IntegrityError
.
Steps to fix:
Notes:
Prescribing date: 2019_01
Problem: fetch_patient_list_size
failed for same reason as it did last month
Steps to fix: As last month, downloaded file manually.
Notes: See #1577.
Prescribing date: 2019_01
Problem: fetch_prescribing_metadata_addresses
failed because the fetcher expected a .CSV
extension, but the file had a .csv
extension.
Steps to fix: Patched fetch_prescribing_metadata_addresses
to ignore the case of the filename in a regex. See #1578.
Prescribing date: 2019_01
Problem: import_measures
failed after getting a 500 response from BQ when importing the vitb measure.
Steps to fix: Since vitb is the last measure, I imported it by itself, marked the task as having succeeded, and restarted the pipeline.
Prescribing date: 2019_04
Problem: create_bq_views
failed because of a missing project name in vw__ghost_generic_measure.sql
Steps to fix: Fixed in a4cd657.
Prescribing date: 2019_04
Problem: import_measures
failed for bdzper1000
and fungal
with DataError: invalid input syntax for type json
Investigation:
bdzper1000.json
and restarted the pipeline, hoping that it was a blipInfinity
, which cannot be seralized to JSONInfinity
because the a regional team had some prescribing, but a zero list sizehandle_orphan_practices
meant that this command never did anything, which meant that a handful of closed practices were in CCGs that are now closed, which were in turn in regional teams that are now closedSteps to fix: Fixed in 3c136a1.
Prescribing date: 2019_04
Problem: import_measures
failed for ghost_generic_measure
with DataError: invalid input syntax for type json
Investigation: See #1764
Steps to fix: Hacked import_measures
so that it would ignore this measure and restarted the pipeline. We should probably fix #1764 by ignoring prescribing with zero items, and will need to reimport this measure once that's done.
Prescribing date: 2019_04
Problem: matrixstore_build
is going to take a few days to run. (Why? It didn't last time.) create_views
won't run until matrixstore_build
is finished. The concessions pages are broken because create_views
hasn't run. (#1763)
Steps to fix:
./manage.py create_views
in a screen sessioncreate_views
laterPrescribing date: 2019_05
Problem: Deadlock, twice, during drop_partition()
in import_hscic_prescribing
. In both cases, the other process was a user request for a practice homepage in ncso_spending_for_entity()
.
Steps to fix: Restarted the pipeline, and it worked the third time. This should be safe, as drop_partition()
is the first function to do any work.
Prescribing date: 2019_05
Problem: import_measures
ran, but there are no measures in the database!
Investigation: See #1889.
Prescribing date: 2019_06
Problem: The pipeline runner crashed with:
OSError: [Errno 13] Permission denied: '/tmp/numbers-checker/20190821140722/numbers.json'
Steps to fix: Deleted /tmp/numbers-checker/*/numbers.json
manually, and created last task manually with TaskLog.objects.create(year=2019, month=6, task_name="fetch_and_import", status=TaskLog.SUCCESSFUL)
.
See #2011.
Prescribing data: 2019_10
Problem: Could not download PCN data. https://files.digital.nhs.uk/assets/ods/current/ePCN.zip returns a 403.
Steps to fix: It's half four on the Friday before Christmas. I marked the task as successful and restarted the pipeline.
Update: it's now hosted at https://digital.nhs.uk/binaries/content/assets/website-assets/services/ods/data-downloads-other-nhs-organisations/epcn.zip. Import it, then reimport measures for PCNs.
Update 2: I didn't reimport measures for PCNs, but have fixed this in #2389 for the 2019_11 import.
Prescribing date: 2019_12
Problem: The format of the PCN spreadsheet has changed slightly.
Steps to fix:
Patch import_pcns.py
on the server and run again.
diff --git a/openprescribing/pipeline/management/commands/import_pcns.py b/openprescribing/pipeline/management/commands/import_pcns.py
index 5f10dddf..72bf274a 100644
--- a/openprescribing/pipeline/management/commands/import_pcns.py
+++ b/openprescribing/pipeline/management/commands/import_pcns.py
@@ -14,7 +14,7 @@ class Command(BaseCommand):
def handle(self, *args, **kwargs):
workbook = load_workbook(kwargs["filename"])
- details_sheet = workbook.get_sheet_by_name("PCN Details")
+ details_sheet = workbook.get_sheet_by_name("PCNDetails")
members_sheet = workbook.get_sheet_by_name("PCN Core Partner Details")
pcn_details = {}
for code, name in self.get_pcn_details_from_sheet(details_sheet):
@@ -50,8 +50,8 @@ class Command(BaseCommand):
rows = ([cell.value for cell in row] for row in sheet.rows)
headers = next(rows)
- PRACTICE_COL = headers.index("Partner Organisation Code")
- PCN_COL = headers.index("PCN Code")
+ PRACTICE_COL = 0
+ PCN_COL = 4
for n, row in enumerate(rows, start=2):
practice_code = row[PRACTICE_COL]
Have also emailed the ODS helpdesk:
Hi,
I'm a programmer working on OpenPrescribing, and we import ODS data monthly to help power various tools on our site.
This month our automatic importer broke because the format of the epcn.xls spreadsheet (linked to from the bottom of the GP and GP practice related data) has changed slightly, deviating from the specification in "ePCN.xlsx Specification.pdf".
- Firstly, the tab which was previously called "PCN Details" is now called "PCNDetails".
- Secondly, on the "PCN Core Partner Details" tab the text in the column headings is now separated by new lines rather than by spaces.
Both these changes are insignificant for humans reading the data, but confuse machines. If data is not machine readable, then it can't be easily reused in other tools.
Please could you try to ensure that the format of epcn.xls stays consistent?
Thanks,
Peter
Prescribing date: 2020_07
Problem: import_measures
failed with:
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 142, in handle
self.build_measures(measure_defs, start_date, end_date, verbose, options)
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 119, in build_measures
calcuation.calculate(options["bigquery_only"])
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 613, in calculate
self.calculate_practices(bigquery_only=bigquery_only)
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 631, in calculate_practices
self.write_practice_ratios_to_database()
File "/webapps/openprescribing/openprescribing/frontend/management/commands/import_measures.py", line 738, in write_practice_ratios_to_database
cursor.copy_expert(copy_str % ", ".join(MEASURE_FIELDNAMES), f)
psycopg2.errors.NumericValueOutOfRange: integer out of range
CONTEXT: COPY frontend_measurevalue, line 412988: "nimodipine,Y60,E54000016,05L,U45156,M88646,2016-04-01,0.0,2.075,0.0,0.0,"
Steps to solve: See #2847.
Prescribing date: 2020_09
Problem: hscic_list_sizes
failed with:
Traceback (most recent call last):
File "manage.py", line 26, in <module>
execute_from_command_line(sys.argv)
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/webapps/openprescribing/openprescribing/pipeline/management/commands/fetch_and_import.py", line 12, in handle
run_all(kwargs["year"], kwargs["month"])
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 413, in run_all
run_task(task, year, month)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 389, in run_task
task.run(year, month, **kwargs)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 215, in run
call_command(*tokens)
File "/webapps/openprescribing/openprescribing/pipeline/runner.py", line 376, in call_command
return django_call_command(*args)
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/__init__.py", line 148, in call_command
return command.execute(*args, **defaults)
File "/webapps/openprescribing/.venv38/lib/python3.8/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/webapps/openprescribing/openprescribing/pipeline/management/commands/hscic_list_sizes.py", line 40, in handle
source_url = tree.xpath("//a[contains(@href, '{}')]/@href".format(filename))[0]
I can reproduce locally by running ./manage.py hscic_list_sizes 2021 9
.
Steps to fix: Practice list data has been corrected and republished, so the file we're looking for wasn't listed on the site. I hacked the fetcher:
- filename = "gp-reg-pat-prac-quin-age.csv"
+ filename = "gp-reg-pat-prac-quin-age-v2.csv"
and restarted the pipeline.
This issue is for tracking problems with importing data, and recording the manual steps taken to resolve them.