How can we use ADQs - Githubissues

Back Calculation of ADQ from New Prescribing Dataset

ADQ usage values are supplied in the new dataset, but only 2 years is currently available. However, we can use the data to carry out a back-calculation of ADQs which we can then apply to all of our prescribing data.

This fills in some ADQs that are missing from the presentation dataset (Note - conversely, some may be provided in presentation but not in the new dataset - still need to capture these).
The dose size is already taken into account in the dataset, so we can avoid having to extract the dose size from the bnf description for every item.
Instead we calculate a (possibly more useful) Proxy ADQ which represents the ADQ per prescribed quantity (this may be tablets/ml/items etc).
The actual ADQ may be given in different units to the prescribing method, see the examples below.
This proxy ADQ is equivalent to the percent of adq which is incompletely supplied in the presentation dataset (except where units are different).
This method produces some conflicts with the presentation dataset which are investigated in the examples below.

How is the `ADQ_Usage` field calculated and how can we calculate ADQs from it?

ADQ Usage represents the following calculation: Items*Quantity*ADQ.
So we can calculate an ADQ simply by dividing this field by both items and quantity.
Note: the new set of data is more granular than the old - the existing HSCIC dataset only has a total quantity, which is the sum of all the quantity*items for that presentation, whereas the newer set has the number of items for each different quantity.
So, when we apply the ADQs to our existing dataset, it will only be meaningful if we use quantity.
Remember - the items field alone does not tell us how many doses/packets/bottles are prescribed, only how many times the item appeared on prescription forms.

Example of a conflict (Sachets) - Mebeverine & ispaghula (0102000X0AAAAAA)

According to NHS Digital the ADQ for this item is two sachets:

Back calculation using the new prescribing dataset gives us proxy ADQ of 0.5, as expected; i.e. the listed item at the given dose size (one sachet / 3.5g) is half of what one would expect to take per day.

Note - the quantity prescribed is number of sachets:

However, comparing this with the presentation dataset (last four columns below), although the adq is correct at 2, the active quantity is 3.0. Although this sounds like it could be 3 sachets, actually the code deriving this will have taken the 3 from 3.5g, hence is meaningless. From this the percent of adq is calculated as 1.5, which appears correct but since the 3 is meaningless, this is not a helpful value:

So in this case, the ADQ found in the presentation data is not useful and the back calculation is a better method of obtaining an ADQ we can use.

Solution Example - Dicycloverine (0102000J0AAAAAA)

This drug is in solution of 10mg/5ml, where the quantity will be given in ml but the active ingredient (and also the ADQ) is in mg. Above, the ADQ from presentation is technically correct, because one dose as listed in the bnf description is indeed one third of a daily quantity. However, this is not useful, because number of items is simply the number of bottles, not single doses, and the quantity will always as some volume in ml.

The number of items/ bottles can never translate to any ADQ because they can be different sizes (hence different number of doses). E.g. 100ml / 120ml: Here, 1 item of 120ml represents 240mg of the active ingredient, hence 240/30 = 8 ADQs. Therefore, provided we use quantity (multiplied by items) then our proxy ADQ is correct.

Tablets Example - Co-Phenotrope (0104020H0AAAAAA)

The ADQ for this drug is 8 tablets: The active quantity appears to be 2 tablets according to presentation, but as in first example, this will have been derived from the number appearing before the decimal point of the 2.5mg, so this (and hence also the percent of adq) is meaningless:

In our data, quantity is given in number of tablets (not packets) so we will be able to make use of the Proxy ADQ in relation to the quantity prescribed.

Next steps:

[ ] Decide whether we want to capture ADQs provided in presentation but not in the new dataset (how do we decide if they relate to prescribed quantities or not? If they haven't been prescribed in a whole month then do we need them anyway?)
[ ] Link to trends data. How many gaps do we have?

Thanks @HelenCEBM, this is a great analysis.

What is shows is that our current presentation dataset is simply wrong. I've gone and checked the logic, and it is buggy. It attempts to infer the active_quantity by scanning the BNF description for numbers preceding the adq_unit.

First, there is a bug in that it stops at the decimal point (this is why active_quantity in your Mebeverine & ispaghula example is 3).

Second, the unit is sometimes blank (for things like "sachets"), so the logic of looking for a number in the description breaks down.

Conclusion: there are definitely errors in the current ADQs as computed in presentation and we should replace that with the new data. I think we should just drop it completely.

The question then arises about what gaps we'll be left with. I believe at the moment we only use ADQs in two measures (KTT10 and KTT11) (but we should check with Rich and Ben) which I expect will be covered in the new data - the BNF codes for these are:

WHERE ((bnf_code='0501130R0BBAAAD')  OR (bnf_code='0501130R0BGAAAG')  OR (bnf_code='0501130R0BCAAAA')  OR (bnf_code='0501130R0AAAGAG')  OR (bnf_code='0501130R0AAAAAA')  OR (bnf_code='0501130R0AAADAD')  OR (bnf_code='0501015P0AAABAB')  OR (bnf_code='0501015P0BBABAB')  OR (bnf_code='0501080W0AAAEAE'))

and

WHERE (p.bnf_code LIKE '0501030P0%')

It would also be interesting to see if the current calculations are currently wrong for any of these.

Then, as you say, we can do a gap analysis against the trends data.

Gaps

It seems that wherever an ADQ value was available it has been applied in the new dataset. Hence the only gaps in our calculated ADQs were for items not prescribed in September's data.

[ ] However, it would be useful to obtain the previous months/years data in order to ensure there are no gaps in previous data.

Measures

Checking the BNF codes for KTT10 and 11 from Seb's post it turns out that the old and new ADQs match. (Note, however, some codes may be missed as this only applies to those which appeared in September's data).

These items probably worked fine via the old method because the active quantity in the bnf description is a whole number, so the code would have extracted the correct amount.

For KTT11, the new method adds a previously unknown ADQ, though the item in question only has a small amount of prescribing.

Gaps (continued)

From the above, we have ADQs for 4,032 different BNF codes in our compiled list (calculated ADQs plus extras from Rich's system). We can also fill in gaps for some BNF codes based upon generic equivalents:
```
SELECT  a.*,
d.name as bnf_description,
b.ADQ,
c.ADQ AS Generic_ADQ,
c.bnf_description AS generic_description
```

FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs b ON a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.tmp_eu.compiled_ADQs c ON a.generic_presentation = c.bnf_code LEFT JOIN ebmdatalab.hscic.presentation d ON a.bnf_code = d.bnf_code

WHERE a.is_generic = 0 AND b.ADQ IS NULL -- Branded has no ADQ AND c.ADQ IS NOT NULL -- Generic equivalent has ADQ AND SUBSTR(a.bnf_code,1,2) < '20'

This pulls out all branded products lacking ADQs where the generic equivalent has a known ADQ. 
Or, working the other way around, we can find all branded products **with** an ADQ where the generic equivalent has **no** ADQ.

WHERE a.is_generic = 0 -- Branded only AND c.ADQ IS NULL -- Generic has no ADQ AND b.ADQ is not null -- Branded has an ADQ AND SUBSTR(a.bnf_code,1,2) < '20'


This gives us 
- 28 ADQs for **generics** based upon the **branded** version
- 704 ADQs for **brands** based upon **generic** equivalents. This leaves 8,049 non-generic items without ADQs (excluding chapters 20+ and everything that doesn't exist in our main dataset).

We can check how many ADQs are missing compared to how much each drug is prescribed.

SELECT SUBSTR(a.bnf_code,1,2) AS Chapter, BNF.Description AS Chapter_Name, a.bnf_code, p.name AS BNF_desc, a.is_generic, a.items, a.quantity, b.ADQ, a.quantity*b.ADQ AS ADQ_usage FROM ebmdatalab.aggregated_data.all_prescribed_BNFs_UpToSept2016 a -- this contains total prescribed quantities since start of data LEFT JOIN ebmdatalab.hscic.compiled_ADQs_HC b on a.bnf_code = b.bnf_code LEFT JOIN ebmdatalab.hscic.bnf_vertical bnf ON SUBSTR(a.bnf_code,1,2) = bnf.code LEFT JOIN ebmdatalab.hscic.presentation p ON a.bnf_code = p.bnf_code ORDER BY bnf_code


For chapters 1-5 and 10, this gives us:
![image](https://cloud.githubusercontent.com/assets/24392156/21361348/89503e44-c6db-11e6-94a5-2686be82dfc1.png)

So our ADQs cover e.g. only 51% of the BNF codes in the cardiovascular chapter, but these accounts for 90% of the total quantity of prescribing in this chapter.

There are 462 chemicals that have an ADQ listed in ePACT. 
Using a lookup in Excel, it appears that there is only one gap compared to the list of chemicals with ADQs. This may be because the presentation which has an ADQ hasn't been prescribed (and the presentation that has been prescribed has no ADQ).

ebmdatalab / prescribing-queries

How can we use ADQs #31

Back Calculation of ADQ from New Prescribing Dataset

How is the `ADQ_Usage` field calculated and how can we calculate ADQs from it?

Example of a conflict (Sachets) - Mebeverine & ispaghula (0102000X0AAAAAA)

Solution Example - Dicycloverine (0102000J0AAAAAA)

Tablets Example - Co-Phenotrope (0104020H0AAAAAA)

Next steps:

Gaps

Measures

Gaps (continued)

ebmdatalab / prescribing-queries

How can we use ADQs #31

Back Calculation of ADQ from New Prescribing Dataset

How is the ADQ_Usage field calculated and how can we calculate ADQs from it?

Example of a conflict (Sachets) - Mebeverine & ispaghula (0102000X0AAAAAA)

Solution Example - Dicycloverine (0102000J0AAAAAA)

Tablets Example - Co-Phenotrope (0104020H0AAAAAA)

Next steps:

Gaps

Measures

Gaps (continued)

How is the `ADQ_Usage` field calculated and how can we calculate ADQs from it?