Open HelenCEBM opened 7 years ago
Thanks @HelenCEBM, this is a great analysis.
What is shows is that our current presentation
dataset is simply wrong. I've gone and checked the logic, and it is buggy. It attempts to infer the active_quantity
by scanning the BNF description for numbers preceding the adq_unit
.
First, there is a bug in that it stops at the decimal point (this is why active_quantity
in your Mebeverine & ispaghula example is 3
).
Second, the unit is sometimes blank (for things like "sachets"), so the logic of looking for a number in the description breaks down.
Conclusion: there are definitely errors in the current ADQs as computed in presentation
and we should replace that with the new data. I think we should just drop it completely.
The question then arises about what gaps we'll be left with. I believe at the moment we only use ADQs in two measures (KTT10 and KTT11) (but we should check with Rich and Ben) which I expect will be covered in the new data - the BNF codes for these are:
WHERE ((bnf_code='0501130R0BBAAAD') OR (bnf_code='0501130R0BGAAAG') OR (bnf_code='0501130R0BCAAAA') OR (bnf_code='0501130R0AAAGAG') OR (bnf_code='0501130R0AAAAAA') OR (bnf_code='0501130R0AAADAD') OR (bnf_code='0501015P0AAABAB') OR (bnf_code='0501015P0BBABAB') OR (bnf_code='0501080W0AAAEAE'))
and
WHERE (p.bnf_code LIKE '0501030P0%')
It would also be interesting to see if the current calculations are currently wrong for any of these.
Then, as you say, we can do a gap analysis against the trends data.
It seems that wherever an ADQ value was available it has been applied in the new dataset. Hence the only gaps in our calculated ADQs were for items not prescribed in September's data.
Checking the BNF codes for KTT10 and 11 from Seb's post it turns out that the old and new ADQs match. (Note, however, some codes may be missed as this only applies to those which appeared in September's data).
These items probably worked fine via the old method because the active quantity in the bnf description
is a whole number, so the code would have extracted the correct amount.
For KTT11, the new method adds a previously unknown ADQ, though the item in question only has a small amount of prescribing.
SELECT a.*,
d.name as bnf_description,
b.ADQ,
c.ADQ AS Generic_ADQ,
c.bnf_description AS generic_description
FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs b ON a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs c ON a.generic_presentation = c.bnf_code LEFT JOIN ebmdatalab.hscic.presentation d ON a.bnf_code = d.bnf_code
WHERE a.is_generic = 0 AND b.ADQ IS NULL -- Branded has no ADQ AND c.ADQ IS NOT NULL -- Generic equivalent has ADQ AND SUBSTR(a.bnf_code,1,2) < '20'
This pulls out all branded products lacking ADQs where the generic equivalent has a known ADQ.
Or, working the other way around, we can find all branded products **with** an ADQ where the generic equivalent has **no** ADQ.
WHERE a.is_generic = 0 -- Branded only AND c.ADQ IS NULL -- Generic has no ADQ AND b.ADQ is not null -- Branded has an ADQ AND SUBSTR(a.bnf_code,1,2) < '20'
This gives us
- 28 ADQs for **generics** based upon the **branded** version
- 704 ADQs for **brands** based upon **generic** equivalents. This leaves 8,049 non-generic items without ADQs (excluding chapters 20+ and everything that doesn't exist in our main dataset).
We can check how many ADQs are missing compared to how much each drug is prescribed.
SELECT SUBSTR(a.bnf_code,1,2) AS Chapter, BNF.Description AS Chapter_Name, a.bnf_code, p.name AS BNF_desc, a.is_generic, a.items, a.quantity, b.ADQ, a.quantity*b.ADQ AS ADQ_usage FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a -- this contains total prescribed quantities since start of data LEFT JOIN ebmdatalab.hscic.compiled_ADQs_HC b on a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.hscic.bnf_vertical bnf ON SUBSTR(a.bnf_code,1,2) = bnf.code LEFT JOIN ebmdatalab.hscic.presentation p ON a.bnf_code = p.bnf_code ORDER BY bnf_code
For chapters 1-5 and 10, this gives us:
![image](https://cloud.githubusercontent.com/assets/24392156/21361348/89503e44-c6db-11e6-94a5-2686be82dfc1.png)
So our ADQs cover e.g. only 51% of the BNF codes in the cardiovascular chapter, but these accounts for 90% of the total quantity of prescribing in this chapter.
There are 462 chemicals that have an ADQ listed in ePACT.
Using a lookup in Excel, it appears that there is only one gap compared to the list of chemicals with ADQs. This may be because the presentation which has an ADQ hasn't been prescribed (and the presentation that has been prescribed has no ADQ).
Back Calculation of ADQ from New Prescribing Dataset
ADQ usage
values are supplied in the new dataset, but only 2 years is currently available. However, we can use the data to carry out a back-calculation of ADQs which we can then apply to all of our prescribing data.presentation
dataset (Note - conversely, some may be provided inpresentation
but not in the new dataset - still need to capture these).bnf description
for every item.Proxy ADQ
which represents the ADQ per prescribed quantity (this may be tablets/ml/items etc).proxy ADQ
is equivalent to thepercent of adq
which is incompletely supplied in thepresentation
dataset (except where units are different).presentation
dataset which are investigated in the examples below.How is the
ADQ_Usage
field calculated and how can we calculate ADQs from it?ADQ Usage
represents the following calculation:Items*Quantity*ADQ
.items
andquantity
.total quantity
, which is the sum of all thequantity*items
for that presentation, whereas the newer set has the number of items for each different quantity.quantity
.items
field alone does not tell us how many doses/packets/bottles are prescribed, only how many times the item appeared on prescription forms.Example of a conflict (Sachets) - Mebeverine & ispaghula (0102000X0AAAAAA)
According to NHS Digital the ADQ for this item is two sachets:
Back calculation using the new prescribing dataset gives us
proxy ADQ
of0.5
, as expected; i.e. the listed item at the given dose size (one sachet / 3.5g) is half of what one would expect to take per day.Note - the
quantity
prescribed is number of sachets:However, comparing this with the
presentation
dataset (last four columns below), although theadq
is correct at2
, theactive quantity
is3.0
. Although this sounds like it could be 3 sachets, actually the code deriving this will have taken the3
from3.5g
, hence is meaningless. From this thepercent of adq
is calculated as1.5
, which appears correct but since the3
is meaningless, this is not a helpful value:So in this case, the ADQ found in the
presentation
data is not useful and the back calculation is a better method of obtaining an ADQ we can use.Solution Example - Dicycloverine (0102000J0AAAAAA)
This drug is in solution of
10mg/5ml
, where thequantity
will be given in ml but the active ingredient (and also the ADQ) is in mg. Above, the ADQ frompresentation
is technically correct, because one dose as listed in thebnf description
is indeed one third of a daily quantity. However, this is not useful, because number ofitems
is simply the number of bottles, not single doses, and thequantity
will always as some volume in ml.The number of
items
/ bottles can never translate to any ADQ because they can be different sizes (hence different number of doses). E.g. 100ml / 120ml: Here, 1 item of 120ml represents 240mg of the active ingredient, hence240/30 = 8
ADQs. Therefore, provided we usequantity
(multiplied byitems
) then our proxy ADQ is correct.Tablets Example - Co-Phenotrope (0104020H0AAAAAA)
The ADQ for this drug is 8 tablets: The
active quantity
appears to be 2 tablets according topresentation
, but as in first example, this will have been derived from the number appearing before the decimal point of the2.5mg
, so this (and hence also thepercent of adq
) is meaningless:In our data,
quantity
is given in number of tablets (not packets) so we will be able to make use of theProxy ADQ
in relation to the quantity prescribed.Next steps:
presentation
but not in the new dataset (how do we decide if they relate to prescribed quantities or not? If they haven't been prescribed in a whole month then do we need them anyway?)