ebmdatalab / price-per-dose

2 stars 1 forks source link

Document the process of defining allowable formulation swaps #11

Closed sebbacon closed 7 years ago

sebbacon commented 7 years ago

Issue #8 introduces the idea of equivalent formulation swaps. It resulted in the following spreadsheet which is consulted in the price-per-quantity calculations.

We should document how the spreadsheet can be reproduced so it can be updated in the future.

https://docs.google.com/spreadsheets/d/1usBWtho-Cm_coZkUfSwJ1RJSynkc17EoUN4SZl8czKs/edit#gid=0


Updated by @evansd on 2020-04-27 The original, now unused, spreadsheet link is below: https://docs.google.com/spreadsheets/d/1SvMGCKrmqsNkZYuGW18Sf0wTluXyV4bhyZQaVLcO41c/edit#gid=1799396915)

sebbacon commented 7 years ago

The following updated SQL includes all the data from the spreadsheet.

To create updated formulation swaps, import the results into a google sheet, and follow the guidelines described by Rich to add a Y/N column "Really equivalent?" column. Then update the code for generating price-per-pill to query that column in that spreadsheet.

Rich's guidelines:

WITH
  presentations AS (
  SELECT
    presentation_code,
    presentation,
    REGEXP_EXTRACT(presentation, r'([^_]+)_') AS before_underscore,
    REGEXP_EXTRACT(presentation, r'[^_]+_(.*?)(?:[0-9]+.*)?$') AS after_underscore,
    REGEXP_EXTRACT(presentation, r'[^_]+_.*?([0-9]+.*)?$') AS remainder
  FROM
    ebmdatalab.hscic.bnf
  WHERE
    REGEXP_EXTRACT(presentation, r'([^_]+)_') NOT LIKE "%Gppe%"
    AND SUBSTR(presentation_code,10,2) = 'AA'
    AND SUBSTR(presentation_code,1,2) < '20' )
SELECT
  alts.*,
  SUM(sept.quantity) AS Sept_Quantity,
  SUM(sept.net_cost) AS Sept_NetCost,
  SUM(sept_alt.net_cost) AS Sept_AltNetCost
FROM (
  SELECT
    a.presentation_code AS code,
    a.presentation,
    a.after_underscore AS formulation,
    b.presentation_code AS alt_code,
    b.presentation AS alt_presentation,
    b.after_underscore AS alt_formulation
  FROM
    presentations a
  INNER JOIN
    presentations b
  ON
    a.before_underscore = b.before_underscore
    AND a.remainder= b.remainder
    AND a.after_underscore <> b.after_underscore
  ORDER BY
    code ) alts
INNER JOIN
  ebmdatalab.tmp_eu.prescribing_sept sept
ON
  alts.code = sept.bnf_code
INNER JOIN
  ebmdatalab.tmp_eu.prescribing_sept sept_alt
ON
  alts.alt_code = sept_alt.bnf_code
GROUP BY
  code,
  presentation,
  formulation,
  alt_code,
  alt_presentation,
  alt_formulation
ORDER BY
  code
  1. Make a spreadsheet using the SQL outlined in #8
  2. Add quantity of alternate formulations
  3. Remove codes with no active prescribing
  4. Manually identify those which we consider real alternatives
sebbacon commented 7 years ago

Note that the current spreadsheet also: