ebmdatalab / price-per-dose

2 stars 1 forks source link

Price-per-dose variation across chemicals #33

Open HelenCEBM opened 7 years ago

HelenCEBM commented 7 years ago

Price-per-dose work thus far picks out drugs where presentations and their generic/branded equivalents, or different (comparable) formulations vary in price. However, there are cases where drugs need to be compared across whole chemicals in order to identify high-cost items, such as Dextromethorphan Hydrobromide Oral Suspension, which has no direct equivalent by the criteria above, but there are similar products of the same chemical which could be substituted. image

Of course not all presentations identified this way will have realistic alternatives available to prescribe but practices may nonetheless wish to look into these excessive costs to consider their options for dealing with them.

SQL Part 1

a) Items will be matched based on chemical and formulation, because it is probably not useful to compare widely different formulations, e.g. tablets and liquids or creams, but we do not want to restrict to identical formulations only. Split the presentation name and group up similar formulations, based upon #8. Also calculate cost per quantity.

WITH 
a0 AS (
SELECT *, 
  SUBSTR(bnf_code,1,9) AS chemical_code,
  CASE  WHEN RTRIM(mid_string) IN ('Vag Crm','Vag Gel','Oily Crm','Crm','Gel','Lot','Gel','Soln','Gel Sach','Oint') THEN 'Creams etc'    --this will only compare identical formulations --  PROBABLY WANT TO INCLUDE EQUIVALENT FORMULATIONS --
        WHEN RTRIM(mid_string) IN ('Tab','Cap','Orodisper Tab','Tab Chble','Tab E/C','Tab G/R','Val Tab','Cap E/C','Tab G/R') THEN 'Tabs Caps etc'
        WHEN RTRIM(mid_string) IN ('Liq Spec','Oral Soln','Sod Oral Soln','Elix','Liq','Oral Susp','Linct','Susp','Soln','Syr','Elix','Mix','Oral Liq') THEN 'Liquids'
        ELSE RTRIM(mid_string) END AS formulation,
        IEEE_DIVIDE(actual_cost, quantity) as cost_per_Quantity
  FROM (
        SELECT pct, practice, bnf_code, items, net_cost, actual_cost, quantity,
            REGEXP_EXTRACT(bnf_name, r'([^_]+)_') AS before_underscore,
            REGEXP_EXTRACT(bnf_name, r'[^_]+_([a-zA-Z \/-]+).*$') AS mid_string,
            REGEXP_EXTRACT(bnf_name, r'[^_]+_.*?([a-zA-Z \/-]+)?$') AS remainder     
            FROM ebmdatalab.hscic.prescribing_temp 
            WHERE SUBSTR(bnf_code,1,2) < '20'
    )
    ),

b) Calculate variations in price per dose for each chemical-formulation (across all practices).

SELECT
  b.chemical,
  p.formulation,
  SUBSTR(bnf_code,1,9) AS chemical_code,
  COUNT(*) AS count,
  sum(quantity) AS quantity,
  IEEE_DIVIDE(stddev_pop(IEEE_DIVIDE(actual_cost, quantity)), avg(IEEE_DIVIDE(actual_cost,quantity))) as coefficient_of_deviation,    
  MAX(IEEE_DIVIDE(actual_cost, quantity)) - MIN(IEEE_DIVIDE(actual_cost, quantity)) AS delta,
  MAX(IEEE_DIVIDE(actual_cost, quantity)) AS Max_cost,
  MIN(IEEE_DIVIDE(actual_cost, quantity)) AS Min_cost

FROM a0 AS p
  LEFT JOIN (SELECT DISTINCT chemical, chemical_code FROM ebmdatalab.hscic.bnf) b ON SUBSTR(p.bnf_code,1,9) = b.chemical_code
  WHERE
    SUBSTR(bnf_code,1,2) < '18'
    AND quantity >0
GROUP BY  chemical,  chemical_code,  formulation
ORDER BY
  chemical DESC 
  ),

c) Practice-level details for each chemical

p AS   (
  SELECT chemical_code, formulation, practice, sum(actual_cost) AS actual_cost, sum(quantity) as quantity, cost_per_quantity 
    FROM  a0
    GROUP BY chemical_code, formulation, cost_per_quantity, practice
   )

d) List all chemical-presentation details, count how many practices are prescribing towards the high end of the price range for each (then limit this to <10 practices, as we are aiming to find unusual behaviours/costs), along with what quantity of items this relates to and how much it costs. Total cost per chemical/formulation is limited to <£50 to exclude items that are very low priced.

SELECT  
  a.chemical,
  a.chemical_code,
  a.formulation,
  a.count,
  a.coefficient_of_deviation,
  a.delta,
  a.min_cost,
  a.max_cost,
  IEEE_DIVIDE(a.max_cost,a.min_cost) AS max_min_ratio,
  sum(p.quantity) AS Quant_at_max_cost,
  sum(p.actual_cost) AS Spend_at_max_cost,
  sum(p.quantity)*a.min_cost AS lowest_poss_cost,
  count(distinct p.practice) AS pracs_prescribing_at_max_cost
 FROM a
 LEFT JOIN  p 
       ON a.chemical_code = p.chemical_code AND COALESCE(a.formulation,'OTHER') = COALESCE(p.formulation,'OTHER') AND p.cost_per_Quantity >= a.Max_cost-(delta*0.1)

  WHERE IEEE_DIVIDE(a.delta,a.min_cost) >10

  GROUP BY 
    chemical,
    chemical_code,
    formulation,
    count,
    coefficient_of_deviation,
    delta,
    min_cost,
    max_cost

  HAVING pracs_prescribing_at_max_cost < 10
  AND Spend_at_max_cost > 50

  ORDER BY max_min_ratio desc

Save results as table

SQL Part 2

Part 1 has identified chemicals of interest and summarised the variation in price per dose. Next we extract details for each chemical-formulation to show exactly which presentations are high-priced and which practices are involved.

a) Recreate formulation groups as above (Part 1a) create temp table a0).

b) Find details of lowest cost items. Join to practice details (use inner join to select only practices of type 4).

SELECT b.chemical, b.chemical_code, b.formulation, coefficient_of_deviation, delta, min_cost, max_cost, 
    max_min_ratio, 
    Quant_at_max_cost, 
    Spend_at_max_cost, 
    lowest_poss_cost,
    pracs_prescribing_at_max_cost, 
    a0.bnf_code, a0.bnf_name, a0.pct, a0.practice, c.name AS practice_name, a0.cost_per_Quantity, 
    a0.actual_cost, a0.quantity, a0.items, 
    a0.quantity*min_cost AS lowest_poss_cost_prac
FROM a0
    INNER JOIN ebmdatalab.tmp_eu.high_ppq_chemical_level b ON a0.chemical_code = b.chemical_code 
       AND a0.formulation = b.formulation AND a0.cost_per_Quantity >= b.Max_cost-(delta*0.1)
    LEFT JOIN  ebmdatalab.hscic.practices C ON C.code = a0.practice 
ORDER BY Spend_at_max_cost DESC, cost_per_quantity DESC

SQL Part 3

We also need to find details of the lowest cost items for each chemical so that these alternative products can be shown as potential alternatives to the high priced ones.

a) Recreate formulation groups as above (temp table a0).

b) Find details of lowest cost items.

SELECT b.chemical, b.chemical_code, b.formulation, coefficient_of_deviation, delta, min_cost, 
    a0.bnf_code, a0.bnf_name,  avg(a0.cost_per_Quantity) AS avg_min_cost_per_Quantity, sum(a0.quantity) AS quantity_near_min_cost, sum(a0.items) as items_near_min_cost
    FROM a0
    inner join ebmdatalab.tmp_eu.high_ppq_chemical_level b on a0.chemical_code = b.chemical_code and a0.formulation = b.formulation AND a0.cost_per_Quantity <= b.Min_cost +(delta*0.1)
    --LEFT JOIN  ebmdatalab.hscic.practices C ON C.code = a0.practice 
    GROUP BY chemical, chemical_code, formulation, coefficient_of_deviation, delta, min_cost, bnf_code, bnf_name
    ORDER BY chemical