ebmdatalab / prescribing-queries

Documentation about common queries against the prescribing dataset
GNU General Public License v3.0
3 stars 3 forks source link

How can we use ADQs #31

Open HelenCEBM opened 7 years ago

HelenCEBM commented 7 years ago

Back Calculation of ADQ from New Prescribing Dataset

ADQ usage values are supplied in the new dataset, but only 2 years is currently available. However, we can use the data to carry out a back-calculation of ADQs which we can then apply to all of our prescribing data.

How is the ADQ_Usage field calculated and how can we calculate ADQs from it?

Example of a conflict (Sachets) - Mebeverine & ispaghula (0102000X0AAAAAA)

According to NHS Digital the ADQ for this item is two sachets: image

Back calculation using the new prescribing dataset gives us proxy ADQ of 0.5, as expected; i.e. the listed item at the given dose size (one sachet / 3.5g) is half of what one would expect to take per day.

Note - the quantity prescribed is number of sachets: image

However, comparing this with the presentation dataset (last four columns below), although the adq is correct at 2, the active quantity is 3.0. Although this sounds like it could be 3 sachets, actually the code deriving this will have taken the 3 from 3.5g, hence is meaningless. From this the percent of adq is calculated as 1.5, which appears correct but since the 3 is meaningless, this is not a helpful value: image

So in this case, the ADQ found in the presentation data is not useful and the back calculation is a better method of obtaining an ADQ we can use.

Solution Example - Dicycloverine (0102000J0AAAAAA)

This drug is in solution of 10mg/5ml, where the quantity will be given in ml but the active ingredient (and also the ADQ) is in mg. image Above, the ADQ from presentation is technically correct, because one dose as listed in the bnf description is indeed one third of a daily quantity. However, this is not useful, because number of items is simply the number of bottles, not single doses, and the quantity will always as some volume in ml.

The number of items/ bottles can never translate to any ADQ because they can be different sizes (hence different number of doses). E.g. 100ml / 120ml: image Here, 1 item of 120ml represents 240mg of the active ingredient, hence 240/30 = 8 ADQs. Therefore, provided we use quantity (multiplied by items) then our proxy ADQ is correct.

Tablets Example - Co-Phenotrope (0104020H0AAAAAA)

The ADQ for this drug is 8 tablets: image The active quantity appears to be 2 tablets according to presentation, but as in first example, this will have been derived from the number appearing before the decimal point of the 2.5mg, so this (and hence also the percent of adq) is meaningless: image

In our data, quantity is given in number of tablets (not packets) so we will be able to make use of the Proxy ADQ in relation to the quantity prescribed.

Next steps:

sebbacon commented 7 years ago

Thanks @HelenCEBM, this is a great analysis.

What is shows is that our current presentation dataset is simply wrong. I've gone and checked the logic, and it is buggy. It attempts to infer the active_quantity by scanning the BNF description for numbers preceding the adq_unit.

First, there is a bug in that it stops at the decimal point (this is why active_quantity in your Mebeverine & ispaghula example is 3).

Second, the unit is sometimes blank (for things like "sachets"), so the logic of looking for a number in the description breaks down.

Conclusion: there are definitely errors in the current ADQs as computed in presentation and we should replace that with the new data. I think we should just drop it completely.

The question then arises about what gaps we'll be left with. I believe at the moment we only use ADQs in two measures (KTT10 and KTT11) (but we should check with Rich and Ben) which I expect will be covered in the new data - the BNF codes for these are:

WHERE ((bnf_code='0501130R0BBAAAD')  OR (bnf_code='0501130R0BGAAAG')  OR (bnf_code='0501130R0BCAAAA')  OR (bnf_code='0501130R0AAAGAG')  OR (bnf_code='0501130R0AAAAAA')  OR (bnf_code='0501130R0AAADAD')  OR (bnf_code='0501015P0AAABAB')  OR (bnf_code='0501015P0BBABAB')  OR (bnf_code='0501080W0AAAEAE'))

and

WHERE (p.bnf_code LIKE '0501030P0%')

It would also be interesting to see if the current calculations are currently wrong for any of these.

Then, as you say, we can do a gap analysis against the trends data.

HelenCEBM commented 7 years ago

Gaps

It seems that wherever an ADQ value was available it has been applied in the new dataset. Hence the only gaps in our calculated ADQs were for items not prescribed in September's data.

image

Measures

Checking the BNF codes for KTT10 and 11 from Seb's post it turns out that the old and new ADQs match. (Note, however, some codes may be missed as this only applies to those which appeared in September's data).

image

These items probably worked fine via the old method because the active quantity in the bnf description is a whole number, so the code would have extracted the correct amount.

For KTT11, the new method adds a previously unknown ADQ, though the item in question only has a small amount of prescribing. image

HelenCEBM commented 7 years ago

Gaps (continued)

FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs b ON a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs c ON a.generic_presentation = c.bnf_code LEFT JOIN ebmdatalab.hscic.presentation d ON a.bnf_code = d.bnf_code

WHERE a.is_generic = 0 AND b.ADQ IS NULL -- Branded has no ADQ AND c.ADQ IS NOT NULL -- Generic equivalent has ADQ AND SUBSTR(a.bnf_code,1,2) < '20'

This pulls out all branded products lacking ADQs where the generic equivalent has a known ADQ. 
Or, working the other way around, we can find all branded products **with** an ADQ where the generic equivalent has **no** ADQ.

WHERE a.is_generic = 0 -- Branded only AND c.ADQ IS NULL -- Generic has no ADQ AND b.ADQ is not null -- Branded has an ADQ AND SUBSTR(a.bnf_code,1,2) < '20'


This gives us 
- 28 ADQs for **generics** based upon the **branded** version
- 704 ADQs for **brands** based upon **generic** equivalents. This leaves 8,049 non-generic items without ADQs (excluding chapters 20+ and everything that doesn't exist in our main dataset).

We can check how many ADQs are missing compared to how much each drug is prescribed. 

SELECT SUBSTR(a.bnf_code,1,2) AS Chapter, BNF.Description AS Chapter_Name, a.bnf_code, p.name AS BNF_desc, a.is_generic, a.items, a.quantity, b.ADQ, a.quantity*b.ADQ AS ADQ_usage FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a -- this contains total prescribed quantities since start of data LEFT JOIN ebmdatalab.hscic.compiled_ADQs_HC b on a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.hscic.bnf_vertical bnf ON SUBSTR(a.bnf_code,1,2) = bnf.code LEFT JOIN ebmdatalab.hscic.presentation p ON a.bnf_code = p.bnf_code ORDER BY bnf_code


For chapters 1-5 and 10, this gives us:
![image](https://cloud.githubusercontent.com/assets/24392156/21361348/89503e44-c6db-11e6-94a5-2686be82dfc1.png)

So our ADQs cover e.g. only 51% of the BNF codes in the cardiovascular chapter, but these accounts for 90% of the total quantity of prescribing in this chapter.

There are 462 chemicals that have an ADQ listed in ePACT. 
Using a lookup in Excel, it appears that there is only one gap compared to the list of chemicals with ADQs. This may be because the presentation which has an ADQ hasn't been prescribed (and the presentation that has been prescribed has no ADQ).