Closed sebbacon closed 7 years ago
A potential method for testing for multimodality is using the Mclust function in R which tries to fit one or more normal distributions to the data, and should tell us how many "clusters" we have. It is described here: http://stats.stackexchange.com/questions/138223/how-to-test-if-my-distribution-is-multimodal
As another option is to try the Shapiro-Wilks test for normality (i.e. test for single peak)... https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test It should give us a p-value, where low indicates not-normal - would have to see what level of significance suits our data. Seems to be popular, provided n < 5000. Here is what appears to be instructions for doing this in R (though I'm not familiar with R!) http://stat.ethz.ch/R-manual/R-devel/library/stats/html/shapiro.test.html
Type 1: To map AMPP: Use the BNF - dm+d map provided by the BSA, with the dm+d AMPP information available from here: https://isd.hscic.gov.uk/trud3/user/guest/group/0/pack/6/subpack/24/releases
(comment combined with ones below)
Here's the link to the Google Sheet for respiratory system drugs.
Here is some SQL (created with @HelenCEBM - thanks Helen) to identify where there are outliers in the type of situation we have found ourselves in with the glycopyrronium. It looks at the ratio between the NIC/(quantity*items)
for both MIN(quantity)
and MAX(quantity)
. For something like glycopyrronium, this ratio should be 30, due to the difference between calculating per "inhaler" and per dose. This works in this SQL.
It also identifies whether it is a branded preparation, and if not, whether it is Cat A, C or M in the Drug Tariff, or whether it is NP8.
WITH
maxmin AS (
SELECT
bnf_code,
MAX(Quantity) AS maxQ,
MIN(Quantity) AS minQ
FROM
ebmdatalab.hscic.NEW_prescribing
WHERE
bnf_code NOT LIKE '0905%'
AND bnf_code NOT LIKE '0904%'
AND bnf_code NOT LIKE '0910%'
AND bnf_code NOT LIKE '19%'
AND bnf_code NOT LIKE '18%'
AND bnf_code NOT LIKE '1315%'
AND bnf_code NOT LIKE '120101020%'
AND bnf_code NOT LIKE '20%'
AND bnf_code NOT LIKE '21%'
AND bnf_code NOT LIKE '22%'
AND bnf_code NOT LIKE '23%'
GROUP BY
bnf_code)
SELECT
a.bnf_code,
pMAX.bnf_description,
IFNULL(TAR.category,
(IF(SUBSTR(a.bnf_code,10,2)='AA',
'NP8',
"Brand"))) AS DT_category,
a.maxQ AS max_quantity,
SUM(pMAX.items) AS max_items,
SUM(pMAX.quantity*pMAX.items) AS max_QI,
SUM (pMAX.NIC) AS max_NIC,
IEEE_DIVIDE(SUM(pMAX.NIC),SUM(pMAX.quantity*pMAX.items)) AS max_NICperQI,
a.minQ AS min_quantity,
SUM(pMIN.items) AS min_items,
SUM(pMIN.quantity*pMIN.items) AS min_QI,
SUM (pMIN.NIC) AS min_NIC,
IEEE_DIVIDE(SUM(pMIN.NIC),SUM(pMIN.quantity*pMIN.items)) AS min_NICperQI,
ROUND(IEEE_DIVIDE( IEEE_DIVIDE(SUM(pMAX.NIC),SUM(pMAX.quantity*pMAX.items)), IEEE_DIVIDE(SUM(pMIN.NIC),SUM(pMIN.quantity*pMIN.items))),2) AS MinMaxratio
FROM
maxmin a
LEFT JOIN
ebmdatalab.hscic.tariff TAR
ON
TAR.bnf_code = a.bnf_code
LEFT JOIN
ebmdatalab.hscic.NEW_prescribing pMAX
ON
pMAX.bnf_code = a.bnf_code
AND pMAX.quantity = a.MaxQ
LEFT JOIN
ebmdatalab.hscic.NEW_prescribing pMIN
ON
pMIN.bnf_code = a.bnf_code
AND pMIN.quantity = a.MinQ
GROUP BY
bnf_code,
bnf_description,
max_quantity,
min_quantity,
DT_category
ORDER BY
MinMaxratio DESC
And here's the link to the Google Sheet
How is NEW_prescribing
defined?
It's September data downloaded from the NHSBSA, which includes the additional granularity of grouped by quantity.
Just need to be careful about the language we use. ePACT was around long before DM+D. I wouldn't describe quantity as packs etc as "wrong", as the dose is in the ePACT descriptor (e.g. 200d). As a user of ePACT, it's far more useful to me to see inhalers etc data in packs, rather than in doses!
See also issue #16
Background
We are interested in comparing the price-per-quantity between different clinically comparable products, so we can advise practices and CCGs on potential cost savings. (In the prescribing dataset, the
quantity
recorded is the one reimbursed to the dispenser, which may be slightly different to that written by the GP).For example. Tramadol Hydrochrolde 300mg tablets are available in several brands, which range greatly in price per pill.
When a GP prescribes generically (e.g. Tramadol Hydrochloride 300mg tablets), the pharmacist can choose which presentation to dispense. This is not necessarily the cheapest. Sometimes it will be whatever the pharmacist has in stock. However, there is an incentive for the pharmacist to dispense whatever they can make most profit on. In extreme cases there are loopholes allowing arbitrarily expensive brands (which have been created specifically for price-gouging) to be dispensed.
This shows the price-per-quantity of the three brands of Tramadol which were dispensed by name in Sept 2016 (click to enlarge):
Note that if we plot all Tramadol 300mg tabs (rather than specific brands), we get a much larger spread with a single massive outlier on the right. This is because of one absurdly-priced brand which (unsurprisingly) has never been explicitly requested by a GP (which means we can't tell which brand it is):
We would normally expect a single product (e.g. Tradorec) to achieve something close to a normal distribution of prices-per-pill (with high kurtosis). If the specific product is in the Tariff, we would expect it hardly to deviate from the Tariff price. Here is Tradorec (one of the three named brands mentioned above) -
N
is small, so it's not a gaussian distribution, but its variance is small and clearly fairly normal.This property allows us to make assertions about possible savings by switching product.
However, we have found at least three types of product listing where the data does not show unimodal distributions of price-per-quantity for a single product.
Type 1: Products available in more than one pack size.
The most common type is products which are available in more than one pack size. Unfortunately these are not coded as different presentations in our data. For example, CoaguChek XS PT Strips have a bimodal distribution of price-per-quantity:
Using the two modes, we can categorise all prescriptions as belonging to the left or right side, and work out their "quantity per item" ("item" corresponds to a single prescription, so can be taken as a marker of the number of prescription events that gave rise to the quantity dispensed in the month in question).
This suggests that the product probably comes in packs of 24 and 48:
A google images search for the product name confirms the hypothesis.
It is possible (using dm+d) to identify pack sizes, but we can't know from the data how many of each size were dispensed except when the total quantity can only be factorized using the pack sizes in one way.
Typically the variation in per-pill price between pack sizes will be negligible, but for items with lots of prescribing it will add up to apparently significant possible cost savings, about which the prescriber can do nothing.
Type 2: Inconsistent ways of recording quantity
When a GP prescribes 20 pills, this could be made up of one 20-pill pack, or half a 40-pill pack, and so on. In any case, the
quantity
should always be recorded as 20. The unit of measure can be anything - it is typically something liketablet
orcapsule
, but can be as esoteric asfilm
orreefer
. The unit of measure is defined in dm+d.However, in some cases the
quantity
is incorrectly recorded as a multiple of pack sizes rather than the correct, smaller unit of measure.As far as we know, this is particularly common for special containers such as inhalers (ex:
0302000K0____AU
), transdermal patches, nasal sprays, cartridges for epipens and similar products (see #12). However, it is usually consistently wrong, i.e. they are consistently prescribed by device or pack, rather than dose (as they should be, according to dm+d). This consistency means we can continue to use such products for price-per-dose comparisons, on the assumption that only one pack size is being used - one that appears to hold for special containers that get their own unique BNF code for pack size.An exception to this "consistently wrong" rule is Glycopyrronium Bronchodilators (
0301020S0____AA
). These are inhaler devices that come with 30 capsules. The official unit of measure is the capsule, so a prescription for a single device should be recorded with a quantity of30
(capsules). However, some GPs and/or prescription software incorrectly records these with a quantity of1
(device) - apparently this is when prescribed by brand (Seebri Breezhaler) rather than generically.The other exceptions we've found so far (all special containers) are tracked in #12.
Such data must be considered unreliable and so should be dropped whenever we find it.
Type 3: Products with genuine internal price variability
What should we do in each case?
Assuming we can automatically detect these cases somehow, what is the correct course of action?
Type 1 (multiple pack sizes)
In this case possible savings would be achieved by using a different pack size. As GPs can't specify pack sizes, this kind of cost saving is not actionable, even if it is meaningful.
Most likely we should just include these, marking things that come in multiple pack sizes and adding a health warning so people can use their own judgement about it.
Type 2 (inconsistent recording of sizes)
In the example of the inhaler above, we know this cannot be dispensed in units other than 30s, so we could correct the data automatically to some extent; though we can never be sure if a quantity which is a multiple of 30 is a single pack, or 30 packs.
The analysis below suggests this isn't a hugely common problem, and where it is, the incorrect values are likely to lie beyond the last centile at which we're measuring possible improvements.
Where we know something is consistently subject to large variation due to this problem, we can explicitly exclude it from the analysis (currently only Glycopyrronium Inhalers).
Type 3 (genuine variability)
We should highlight NP8 medicines, specials and imports clearly on our data to help inform users as to their possible meaning.