Closed richiecroker closed 1 year ago
I think there's going to be some quite serious software archaeology required to understand what's needed here.
The comments here suggest that the original files were supplied privately via email and than manually corrected: https://github.com/ebmdatalab/openprescribing/blob/dfd7e4f92dbac10739203027d90dafb577abdfd3/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L30-L32
But then they also suggest that 2017 is the last year they should be needed, which is obviously not the case: https://github.com/ebmdatalab/openprescribing/blob/dfd7e4f92dbac10739203027d90dafb577abdfd3/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L75-L77
There's an open issue here on the question of exactly how these changes are supposed to work:
I think it would be reasonably straightforward to generate 2021.txt
and 2022.txt
files from the XLSX files linked above. And then assuming that the generate_presentation_replacements
command still works it should be possible to import these in the same way as the existing replacements.
I've no idea how we'd go about obtaining the missing 2016 or 2019 data though — this predates even my tenure here.
I think @inglesp and @sebbacon are going to need to go email spelunking to find out how we obtained the 2018 file from NHSBSA. (It was added by Peter on 14 Aug 2018 if that helps narrow things down.)
Possibly the same process can be used to get the missing 2019 and 2016 files.
Have forwarded some breadcrumbs by email
@evansd I'm not sure there are any 2016 and 2019 changes - there are no BNF codes with a COUNT>1
in the search that I did.
The 2017 changes are on the BSA website here
I'm not sure there are any 2016 and 2019 changes - there are no BNF codes with a
COUNT>1
in the search that I did.
Ahhh, right — sorry! I think I totally misunderstood what you were getting at here. You had the line "we appear to have a number of missing years" followed by a list of year headings so I just assumed these were all missing years. But if I've understood correctly 2021 and 2022 are the only actually missing ones, and both of those are available for download already. Is that correct?
If that's the case then @inglesp and @sebbacon can stand down. I can just try to use the existing command with the new data and see what happens.
Sorry @evansd , yes I started here and worked through, which is why I said several years missing - but I don't think there were any changes in those two years.
Yes, that makes sense now. I just ended up down the wrong garden path and then confused myself!
Sigh
I've deployed the PR with the new files and first tried running the generate_presentation_replacements
as the ebmbot
user. That got me a permissions error:
PermissionError: [Errno 13] Permission denied: '//mnt/database/tmp/unused_codes_section_code.csv'
That file (and a few others like it) are all owned by hello
so I tried running as that user. That then died with a protected foreign key error:
ProtectedError: ("Cannot delete some instances of model 'Presentation' because they are referenced through a protected foreign key: 'Presentation.replaced_by'", <QuerySet [<Presentation: 0101021B0BEAKAL: Gaviscon Advance oral suspension peppermint>, <Presentation: 0105000B0BDAFAN: Salofalk 1g gastro-resistant modified-release granules sachets>, <Presentation: 0106040M0BBAEAF: Movicol Chocolate oral powder 13.9g sachets>, <Presentation: 0206010F0BBABAA: Coro-Nitro 400micrograms/dose pump sublingual spray>, <Presentation: 0206010F0BFABAA: Nitrolingual 400micrograms/dose pump sublingual spray>, <Presentation: 0206010F0BRABAA: Glytrin 400micrograms/dose aerosol sublingual spray>, <Presentation: 0301011Y0AAAAAA: Fluticasone furoate 184micrograms/dose / Vilanterol 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0AAABAB: Fluticasone furoate 92micrograms/dose / Vilanterol 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0BBAAAA: Relvar Ellipta 184micrograms/dose / 22micrograms/dose dry powder inhaler>, <Presentation: 0301011Y0BBABAB: Relvar Ellipta 92micrograms/dose / 22micrograms/dose dry powder inhaler>, <Presentation: 030902000BECEA0: Boots Catarrh Cough syrup>, <Presentation: 040101000BBADA0: Bio-Melatonin 3mg tablets>, <Presentation: 040101000BBAHA0: HealthAid Melatonin 3mg tablets>, <Presentation: 0407010T0BBAABP: Veganin tablets>, <Presentation: 0410000B0BBAKBB: Nicorette Citrus 2mg medicated chewing gum>, <Presentation: 0410000B0BBALBE: Nicorette Citrus 4mg medicated chewing gum>, <Presentation: 0410000B0BBAPBK: Nicorette Freshmint 2mg medicated chewing gum>, <Presentation: 0410000B0BBAQBL: Nicorette Freshmint 4mg medicated chewing gum>, <Presentation: 0410000B0BBARBM: Nicorette Fruitfusion 2mg medicated chewing gum>, <Presentation: 0410000B0BBASBN: Nicorette Fruitfusion 4mg medicated chewing gum>, '...(remaining elements truncated)...']>)
Will have to pick this up afresh tomorrow.
The problem turned out to be here: https://github.com/ebmdatalab/openprescribing/blob/57d217d788dc206be850f849ddd2d1d664754d92/openprescribing/frontend/management/commands/generate_presentation_replacements.py#L114
We had some presentations where their replaced_by
field referenced a presentation which was itself replaced by some other presentation. Deleting all of these together shouldn't involve an integrity error but Django's emulation of integrity constraints in Python can't handle this properly and so throws an error.
My workaround was just to perform the equivalent delete directly in the psql
console and then run the command which then completed successfully.
I'm still not closing this ticket because we need to re-import (at least some of) the data. This means rebuilding the matrixstore prescribing file using the new presentation replacements. And possibly re-importing the measures if any of them are affected (@richiecroker is checking this).
I've now rebuilt the MatrixStore file using the updated BNF code replacements and deployed it.
I've also generated a list of all BNF codes used in all measures, using this script:
from collections import defaultdict
from frontend.models import Measure
bnf_codes = defaultdict(set)
for measure in Measure.objects.all():
for c in measure.numerator_bnf_codes or ():
bnf_codes[c].add(measure.id)
for c in measure.denominator_bnf_codes or ():
bnf_codes[c].add(measure.id)
for code, ids in sorted(bnf_codes.items()):
print(f"{code},{','.join(sorted(ids))}")
@richiecroker is working on assessing the impact of the code changes on any measures and will amend and re-import the measures as needed.
For an assessment on the impact on measures, see #3746. Several are affected, and for expediency we're reimporting all of them.
Upading this issue for future archaeologists
The mapping emails were originally supplied in August 2016, by a Graham Mitchell from NHSBSA.
In June 2017 we were given a contact Margaret Dockey, Prescription Information Services Manager who provided the latest spreadsheet entitled DRUG_SNOMED_BNF_20170714.xlsx
. In October she provided an updated entitled Converted Snomed - BNF.XLSX
The BSA provide code maps on an annual basis to map where BNF code changes. We have a list of what we have used here.
However we appear to have a number of missing years:
2016
Using the code below suggests there are no missing BNF code changes between December 2015 and January 2016 data
2019
Using similar code to above, with
WHERE month BETWEEN '2018-12-01' AND '2019-01-01'
also returns no data.2021
NHSBSA have released a BNF code change map for 2021 here. Running the code above suggests that this file hasn't been imported.
2022
NHSBSA have released a BNF code change map for 2022 here. Running the code above suggests that this file hasn't been imported.
Can we import these as soon as possible to ensure that there aren't issues with data quality please. Also it's likely the 2023 map will be out in February 2023, so we need to check then for any code change maps.