sebbacon commented 7 years ago

Background

We are interested in comparing the price-per-quantity between different clinically comparable products, so we can advise practices and CCGs on potential cost savings. (In the prescribing dataset, the quantity recorded is the one reimbursed to the dispenser, which may be slightly different to that written by the GP).

For example. Tramadol Hydrochrolde 300mg tablets are available in several brands, which range greatly in price per pill.

When a GP prescribes generically (e.g. Tramadol Hydrochloride 300mg tablets), the pharmacist can choose which presentation to dispense. This is not necessarily the cheapest. Sometimes it will be whatever the pharmacist has in stock. However, there is an incentive for the pharmacist to dispense whatever they can make most profit on. In extreme cases there are loopholes allowing arbitrarily expensive brands (which have been created specifically for price-gouging) to be dispensed.

This shows the price-per-quantity of the three brands of Tramadol which were dispensed by name in Sept 2016 (click to enlarge):

Note that if we plot all Tramadol 300mg tabs (rather than specific brands), we get a much larger spread with a single massive outlier on the right. This is because of one absurdly-priced brand which (unsurprisingly) has never been explicitly requested by a GP (which means we can't tell which brand it is):

We would normally expect a single product (e.g. Tradorec) to achieve something close to a normal distribution of prices-per-pill (with high kurtosis). If the specific product is in the Tariff, we would expect it hardly to deviate from the Tariff price. Here is Tradorec (one of the three named brands mentioned above) - N is small, so it's not a gaussian distribution, but its variance is small and clearly fairly normal.

This property allows us to make assertions about possible savings by switching product.

However, we have found at least three types of product listing where the data does not show unimodal distributions of price-per-quantity for a single product.

Type 1: Products available in more than one pack size.

The most common type is products which are available in more than one pack size. Unfortunately these are not coded as different presentations in our data. For example, CoaguChek XS PT Strips have a bimodal distribution of price-per-quantity:

Using the two modes, we can categorise all prescriptions as belonging to the left or right side, and work out their "quantity per item" ("item" corresponds to a single prescription, so can be taken as a marker of the number of prescription events that gave rise to the quantity dispensed in the month in question).

SELECT
  SUM(items) AS items,
  SUM(quantity) AS quantity,
  SUM(quantity)/SUM(items) AS qpi,
  IF(net_cost/quantity < 2.85, 'low', 'high') AS category
FROM
  ebmdatalab.hscic.prescribing
WHERE
  bnf_code = '0208020W0BBABA0'
  AND month = TIMESTAMP('2016-09-01')
GROUP BY
  category

This suggests that the product probably comes in packs of 24 and 48:

A google images search for the product name confirms the hypothesis.

It is possible (using dm+d) to identify pack sizes, but we can't know from the data how many of each size were dispensed except when the total quantity can only be factorized using the pack sizes in one way.

Typically the variation in per-pill price between pack sizes will be negligible, but for items with lots of prescribing it will add up to apparently significant possible cost savings, about which the prescriber can do nothing.

Type 2: Inconsistent ways of recording quantity

When a GP prescribes 20 pills, this could be made up of one 20-pill pack, or half a 40-pill pack, and so on. In any case, the quantity should always be recorded as 20. The unit of measure can be anything - it is typically something like tablet or capsule, but can be as esoteric as film or reefer. The unit of measure is defined in dm+d.

However, in some cases the quantity is incorrectly recorded as a multiple of pack sizes rather than the correct, smaller unit of measure.

As far as we know, this is particularly common for special containers such as inhalers (ex: 0302000K0____AU), transdermal patches, nasal sprays, cartridges for epipens and similar products (see #12). However, it is usually consistently wrong, i.e. they are consistently prescribed by device or pack, rather than dose (as they should be, according to dm+d). This consistency means we can continue to use such products for price-per-dose comparisons, on the assumption that only one pack size is being used - one that appears to hold for special containers that get their own unique BNF code for pack size.

An exception to this "consistently wrong" rule is Glycopyrronium Bronchodilators (0301020S0____AA). These are inhaler devices that come with 30 capsules. The official unit of measure is the capsule, so a prescription for a single device should be recorded with a quantity of 30 (capsules). However, some GPs and/or prescription software incorrectly records these with a quantity of 1 (device) - apparently this is when prescribed by brand (Seebri Breezhaler) rather than generically.

The other exceptions we've found so far (all special containers) are tracked in #12.

Such data must be considered unreliable and so should be dropped whenever we find it.

Type 3: Products with genuine internal price variability

Things outside the Tariff (NP8) can theoretically be invoiced at any price. Common examples include:
- Unlicensed products with availability restrictions. There are drugs which are either imported or are "specials", the prices of which should be set in VIIIB of the Tarifff. Where they are not set (NP8) the prices may vary wildly due to different import routes, etc.
- Licensed products which are not in the Tariff. It is very hard to distinguish between savings from different pack sizes, or other true variance, in these cases. The top examples are:
- Imuvac flu vaccine (1404000H0BNAAAF, £1.6 million spending in Sept 2016). Comes in packs of 1 or 10.
- Cutaneous creams: a handful of these have a high spending (Aveeno, E45, Sudocrem and Balneum Plus - about £1.7m per month). It's hard to tell if their variation is due to different pack sizes, or something else.
Products in the tariff where broken bulk applies. This allows dispensers to round up the amount charged to the nearest pack size. We've not analysed this but it's unlikely to represent a significant amount of money.
NP8 products subject to price gouging. Some unscrupulous companies take advantage of the fact NP8 medicine prices are unregulated to create artificially expensive products, the profit from which they share with dispensers in return for them being chosen. All examples of this that we've seen do not have high variance within a given brand, but it is worth considering that the company may choose the make such a product available at several price points depending on the arrangement they have with the dispenser.

What should we do in each case?

Assuming we can automatically detect these cases somehow, what is the correct course of action?

Type 1 (multiple pack sizes)

In this case possible savings would be achieved by using a different pack size. As GPs can't specify pack sizes, this kind of cost saving is not actionable, even if it is meaningful.

Most likely we should just include these, marking things that come in multiple pack sizes and adding a health warning so people can use their own judgement about it.

Type 2 (inconsistent recording of sizes)

In the example of the inhaler above, we know this cannot be dispensed in units other than 30s, so we could correct the data automatically to some extent; though we can never be sure if a quantity which is a multiple of 30 is a single pack, or 30 packs.

The analysis below suggests this isn't a hugely common problem, and where it is, the incorrect values are likely to lie beyond the last centile at which we're measuring possible improvements.

Where we know something is consistently subject to large variation due to this problem, we can explicitly exclude it from the analysis (currently only Glycopyrronium Inhalers).

Type 3 (genuine variability)

We should highlight NP8 medicines, specials and imports clearly on our data to help inform users as to their possible meaning.

HelenCEBM commented 7 years ago

A potential method for testing for multimodality is using the Mclust function in R which tries to fit one or more normal distributions to the data, and should tell us how many "clusters" we have. It is described here: http://stats.stackexchange.com/questions/138223/how-to-test-if-my-distribution-is-multimodal

As another option is to try the Shapiro-Wilks test for normality (i.e. test for single peak)... https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test It should give us a p-value, where low indicates not-normal - would have to see what level of significance suits our data. Seems to be popular, provided n < 5000. Here is what appears to be instructions for doing this in R (though I'm not familiar with R!) http://stat.ethz.ch/R-manual/R-devel/library/stats/html/shapiro.test.html

lmfrench commented 7 years ago

Type 1: To map AMPP: Use the BNF - dm+d map provided by the BSA, with the dm+d AMPP information available from here: https://isd.hscic.gov.uk/trud3/user/guest/group/0/pack/6/subpack/24/releases

richiecroker commented 7 years ago

(comment combined with ones below)

richiecroker commented 7 years ago

Here's the link to the Google Sheet for respiratory system drugs.

richiecroker commented 7 years ago

Here is some SQL (created with @HelenCEBM - thanks Helen) to identify where there are outliers in the type of situation we have found ourselves in with the glycopyrronium. It looks at the ratio between the NIC/(quantity*items) for both MIN(quantity) and MAX(quantity). For something like glycopyrronium, this ratio should be 30, due to the difference between calculating per "inhaler" and per dose. This works in this SQL.

It also identifies whether it is a branded preparation, and if not, whether it is Cat A, C or M in the Drug Tariff, or whether it is NP8.

WITH
  maxmin AS (
  SELECT
    bnf_code,
    MAX(Quantity) AS maxQ,
    MIN(Quantity) AS minQ
  FROM
    ebmdatalab.hscic.NEW_prescribing
  WHERE
    bnf_code NOT LIKE '0905%'
    AND bnf_code NOT LIKE '0904%'
    AND bnf_code NOT LIKE '0910%'
    AND bnf_code NOT LIKE '19%'
    AND bnf_code NOT LIKE '18%'
    AND bnf_code NOT LIKE '1315%'
    AND bnf_code NOT LIKE '120101020%'
    AND bnf_code NOT LIKE '20%'
    AND bnf_code NOT LIKE '21%'
    AND bnf_code NOT LIKE '22%'
    AND bnf_code NOT LIKE '23%'
  GROUP BY
    bnf_code)
SELECT
  a.bnf_code,
  pMAX.bnf_description,
  IFNULL(TAR.category,
    (IF(SUBSTR(a.bnf_code,10,2)='AA',
        'NP8',
        "Brand"))) AS DT_category,
  a.maxQ AS max_quantity,
  SUM(pMAX.items) AS max_items,
  SUM(pMAX.quantity*pMAX.items) AS max_QI,
  SUM (pMAX.NIC) AS max_NIC,
  IEEE_DIVIDE(SUM(pMAX.NIC),SUM(pMAX.quantity*pMAX.items)) AS max_NICperQI,
  a.minQ AS min_quantity,
  SUM(pMIN.items) AS min_items,
  SUM(pMIN.quantity*pMIN.items) AS min_QI,
  SUM (pMIN.NIC) AS min_NIC,
  IEEE_DIVIDE(SUM(pMIN.NIC),SUM(pMIN.quantity*pMIN.items)) AS min_NICperQI,
  ROUND(IEEE_DIVIDE( IEEE_DIVIDE(SUM(pMAX.NIC),SUM(pMAX.quantity*pMAX.items)), IEEE_DIVIDE(SUM(pMIN.NIC),SUM(pMIN.quantity*pMIN.items))),2) AS MinMaxratio
FROM
  maxmin a
LEFT JOIN
  ebmdatalab.hscic.tariff TAR
ON
  TAR.bnf_code = a.bnf_code
LEFT JOIN
  ebmdatalab.hscic.NEW_prescribing pMAX
ON
  pMAX.bnf_code = a.bnf_code
  AND pMAX.quantity = a.MaxQ
LEFT JOIN
  ebmdatalab.hscic.NEW_prescribing pMIN
ON
  pMIN.bnf_code = a.bnf_code
  AND pMIN.quantity = a.MinQ
GROUP BY
  bnf_code,
  bnf_description,
  max_quantity,
  min_quantity,
  DT_category
ORDER BY
  MinMaxratio DESC

And here's the link to the Google Sheet

sebbacon commented 7 years ago

How is NEW_prescribing defined?

richiecroker commented 7 years ago

It's September data downloaded from the NHSBSA, which includes the additional granularity of grouped by quantity.

richiecroker commented 7 years ago

Just need to be careful about the language we use. ePACT was around long before DM+D. I wouldn't describe quantity as packs etc as "wrong", as the dose is in the ePACT descriptor (e.g. 200d). As a user of ePACT, it's far more useful to me to see inhalers etc data in packs, rather than in doses!

sebbacon commented 7 years ago

ebmdatalab / price-per-dose

Handle products with highly variable quantity-per-item within one brand #2

Background

Type 1: Products available in more than one pack size.

Type 2: Inconsistent ways of recording quantity

Type 3: Products with genuine internal price variability

What should we do in each case?

Type 1 (multiple pack sizes)

Type 2 (inconsistent recording of sizes)

Type 3 (genuine variability)