ebmdatalab / openprescribing

A Django app providing a REST API and dashboards for the HSCIC's GP prescribing data
https://openprescribing.net
MIT License
97 stars 26 forks source link

Missing BNF code changes #3713

Closed richiecroker closed 1 year ago

richiecroker commented 2 years ago

The BSA provide code maps on an annual basis to map where BNF code changes. We have a list of what we have used here.

However we appear to have a number of missing years:

2016

Using the code below suggests there are no missing BNF code changes between December 2015 and January 2016 data

WITH
  bnf_details AS (
  SELECT
    bnf_name,
    bnf_code
  FROM
    `ebmdatalab.hscic.normalised_prescribing`
  WHERE
    month BETWEEN '2015-12-01'
    AND '2016-01-01'
  GROUP BY
    bnf_name,
    bnf_code)
SELECT
  bnf_name
FROM
  bnf_details
GROUP BY
  bnf_name
HAVING
  COUNT(bnf_code)>1

2019

Using similar code to above, with WHERE month BETWEEN '2018-12-01' AND '2019-01-01' also returns no data.

2021

NHSBSA have released a BNF code change map for 2021 here. Running the code above suggests that this file hasn't been imported.

2022

NHSBSA have released a BNF code change map for 2022 here. Running the code above suggests that this file hasn't been imported.

Can we import these as soon as possible to ensure that there aren't issues with data quality please. Also it's likely the 2023 map will be out in February 2023, so we need to check then for any code change maps.

evansd commented 2 years ago

I think there's going to be some quite serious software archaeology required to understand what's needed here.

The comments here suggest that the original files were supplied privately via email and than manually corrected: https://github.com/ebmdatalab/openprescribing/blob/dfd7e4f92dbac10739203027d90dafb577abdfd3/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L30-L32

But then they also suggest that 2017 is the last year they should be needed, which is obviously not the case: https://github.com/ebmdatalab/openprescribing/blob/dfd7e4f92dbac10739203027d90dafb577abdfd3/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L75-L77

There's an open issue here on the question of exactly how these changes are supposed to work:

I think it would be reasonably straightforward to generate 2021.txt and 2022.txt files from the XLSX files linked above. And then assuming that the generate_presentation_replacements command still works it should be possible to import these in the same way as the existing replacements.

I've no idea how we'd go about obtaining the missing 2016 or 2019 data though — this predates even my tenure here.

evansd commented 2 years ago

I think @inglesp and @sebbacon are going to need to go email spelunking to find out how we obtained the 2018 file from NHSBSA. (It was added by Peter on 14 Aug 2018 if that helps narrow things down.)

Possibly the same process can be used to get the missing 2019 and 2016 files.

sebbacon commented 2 years ago

Have forwarded some breadcrumbs by email

richiecroker commented 2 years ago

@evansd I'm not sure there are any 2016 and 2019 changes - there are no BNF codes with a COUNT>1 in the search that I did.

richiecroker commented 2 years ago

The 2017 changes are on the BSA website here

evansd commented 2 years ago

I'm not sure there are any 2016 and 2019 changes - there are no BNF codes with a COUNT>1 in the search that I did.

Ahhh, right — sorry! I think I totally misunderstood what you were getting at here. You had the line "we appear to have a number of missing years" followed by a list of year headings so I just assumed these were all missing years. But if I've understood correctly 2021 and 2022 are the only actually missing ones, and both of those are available for download already. Is that correct?

If that's the case then @inglesp and @sebbacon can stand down. I can just try to use the existing command with the new data and see what happens.

richiecroker commented 2 years ago

Sorry @evansd , yes I started here and worked through, which is why I said several years missing - but I don't think there were any changes in those two years.

evansd commented 2 years ago

Yes, that makes sense now. I just ended up down the wrong garden path and then confused myself!

evansd commented 2 years ago

Sigh

I've deployed the PR with the new files and first tried running the generate_presentation_replacements as the ebmbot user. That got me a permissions error:

PermissionError: [Errno 13] Permission denied: '//mnt/database/tmp/unused_codes_section_code.csv'

That file (and a few others like it) are all owned by hello so I tried running as that user. That then died with a protected foreign key error:

ProtectedError: ("Cannot delete some instances of model 'Presentation' because they are referenced through a protected foreign key: 'Presentation.replaced_by'", <QuerySet [<Presentation: 0101021B0BEAKAL: Gaviscon Advance oral suspension peppermint>, <Presentation: 0105000B0BDAFAN: Salofalk 1g gastro-resistant modified-release granules sachets>, <Presentation: 0106040M0BBAEAF: Movicol Chocolate oral powder 13.9g sachets>, <Presentation: 0206010F0BBABAA: Coro-Nitro 400micrograms/dose pump sublingual spray>, <Presentation: 0206010F0BFABAA: Nitrolingual 400micrograms/dose pump sublingual spray>, <Presentation: 0206010F0BRABAA: Glytrin 400micrograms/dose aerosol sublingual spray>, <Presentation: 0301011Y0AAAAAA: Fluticasone furoate 184micrograms/dose / Vilanterol 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0AAABAB: Fluticasone furoate 92micrograms/dose / Vilanterol 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0BBAAAA: Relvar Ellipta 184micrograms/dose / 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0BBABAB: Relvar Ellipta 92micrograms/dose / 22micrograms/dose dry powder inhaler>, <Presentation: 030902000BECEA0: Boots Catarrh Cough syrup>, <Presentation: 040101000BBADA0: Bio-Melatonin 3mg tablets>, <Presentation: 040101000BBAHA0: HealthAid Melatonin 3mg tablets>, <Presentation: 0407010T0BBAABP: Veganin tablets>, <Presentation: 0410000B0BBAKBB: Nicorette Citrus 2mg medicated chewing gum>, <Presentation: 0410000B0BBALBE: Nicorette Citrus 4mg medicated chewing gum>, <Presentation: 0410000B0BBAPBK: Nicorette Freshmint 2mg medicated chewing gum>, <Presentation: 0410000B0BBAQBL: Nicorette Freshmint 4mg medicated chewing gum>, <Presentation: 0410000B0BBARBM: Nicorette Fruitfusion 2mg medicated chewing gum>, <Presentation: 0410000B0BBASBN: Nicorette Fruitfusion 4mg medicated chewing gum>, '...(remaining elements truncated)...']>)

Will have to pick this up afresh tomorrow.

evansd commented 2 years ago

The problem turned out to be here: https://github.com/ebmdatalab/openprescribing/blob/57d217d788dc206be850f849ddd2d1d664754d92/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L114

We had some presentations where their replaced_by field referenced a presentation which was itself replaced by some other presentation. Deleting all of these together shouldn't involve an integrity error but Django's emulation of integrity constraints in Python can't handle this properly and so throws an error.

My workaround was just to perform the equivalent delete directly in the psql console and then run the command which then completed successfully.

I'm still not closing this ticket because we need to re-import (at least some of) the data. This means rebuilding the matrixstore prescribing file using the new presentation replacements. And possibly re-importing the measures if any of them are affected (@richiecroker is checking this).

evansd commented 2 years ago

I've now rebuilt the MatrixStore file using the updated BNF code replacements and deployed it.

I've also generated a list of all BNF codes used in all measures, using this script:

from collections import defaultdict
from frontend.models import Measure

bnf_codes = defaultdict(set)

for measure in Measure.objects.all():
    for c in measure.numerator_bnf_codes or ():
        bnf_codes[c].add(measure.id)
    for c in measure.denominator_bnf_codes or ():
        bnf_codes[c].add(measure.id)

for code, ids in sorted(bnf_codes.items()):
    print(f"{code},{','.join(sorted(ids))}")

@richiecroker is working on assessing the impact of the code changes on any measures and will amend and re-import the measures as needed.

inglesp commented 1 year ago

For an assessment on the impact on measures, see #3746. Several are affected, and for expediency we're reimporting all of them.

sebbacon commented 9 months ago

Upading this issue for future archaeologists

The mapping emails were originally supplied in August 2016, by a Graham Mitchell from NHSBSA.

In June 2017 we were given a contact Margaret Dockey, Prescription Information Services Manager who provided the latest spreadsheet entitled DRUG_SNOMED_BNF_20170714.xlsx. In October she provided an updated entitled Converted Snomed - BNF.XLSX