robinyjpark opened this issue 2 years ago
Nice work.
A deeper dive into some of these missing DDDs has shown some curious examples...
Only 2 of 14 injectable vancomycin VMPs (identified by ATC == J01XA01; see the WHOCC page for this code) have a DDD.
The ones with a DDD figure are
And those without are
In the DDD table, only the former two have a BNF code (5010700). Not sure if this is related.
The missing DDD for the latter set looks like an omission in TRUD. Thoughts @brianmackenna @richiecroker @orlamac ?
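A minimal sketch of this check in pandas, assuming the DDD table is loaded as a DataFrame with hypothetical columns `vmp_id`, `atc_code`, `route`, `bnf_code` and `ddd` (the real TRUD column names may differ). The toy rows are made up and only illustrate the filtering and the BNF/DDD cross-tabulation:

```python
import pandas as pd

# Toy stand-in for the DDD table; column names and values are hypothetical.
ddd_table = pd.DataFrame(
    {
        "vmp_id": ["v1", "v2", "v3", "v4", "v5"],
        "atc_code": ["J01XA01", "J01XA01", "J01XA01", "J01XA01", "J01CA04"],
        "route": ["injection", "injection", "injection", "injection", "oral"],
        "bnf_code": ["5010700", "5010700", None, None, None],
        "ddd": [2.0, 2.0, None, None, 1.5],
    }
)

# Restrict to injectable vancomycin (ATC J01XA01).
vanc_inj = ddd_table[
    (ddd_table["atc_code"] == "J01XA01") & (ddd_table["route"] == "injection")
]
with_ddd = int(vanc_inj["ddd"].notna().sum())
print(f"{with_ddd} of {len(vanc_inj)} VMPs have a DDD")  # prints "2 of 4 VMPs have a DDD"

# Cross-tabulate presence of a BNF code against presence of a DDD.
# Rows are has_bnf (False, True); columns are has_ddd (False, True).
table = pd.crosstab(
    vanc_inj["bnf_code"].notna().rename("has_bnf"),
    vanc_inj["ddd"].notna().rename("has_ddd"),
)
print(table)
```

If every row falls on the diagonal of the cross-tab, BNF code and DDD presence coincide exactly, which is the pattern described above; it does not by itself show which one causes the other.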
Further to my previous comment, I've done a slightly deeper dive into this issue along with a nice UpSet plot of the column population rates in the DDD dataset: https://github.com/ebmdatalab/open-nhs-hospital-use-data/blob/ddd_data_quality/notebooks/antibiotics/ddd_quality.ipynb
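An UpSet plot of population rates is built from the combinations of populated columns per row. A minimal pandas sketch of the underlying counts, assuming hypothetical column names and toy rows (the plotting step itself is left out):

```python
import pandas as pd

# Toy DDD rows; column names and values are hypothetical.
ddd_table = pd.DataFrame(
    {
        "atc_code": ["J01XA01", "J01XA01", None, "J01CA04"],
        "bnf_code": ["5010700", None, None, "5010700"],
        "ddd": [2.0, None, None, 1.5],
    }
)

# Per-column population rate: share of rows with a non-null value.
population_rate = ddd_table.notna().mean()

# Count each combination of populated columns; these are the
# intersections an UpSet plot draws as bars.
patterns = ddd_table.notna().value_counts()

print(population_rate)
print(patterns)
```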
I started exploring the merged SCMD and DDD data to track the completeness of each field, flag products with a missing DDD, and count how many hospitals appear in the data each month. The notebook can be viewed here.
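The three steps can be sketched in pandas as follows. This is a hedged illustration, not the notebook's code: the column names (`year_month`, `ods_code`, `vmp_id`, `quantity`, `ddd`) and the toy rows are assumptions.

```python
import pandas as pd

# Toy SCMD rows; column names and values are hypothetical.
scmd = pd.DataFrame(
    {
        "year_month": ["2019-01", "2019-01", "2019-02", "2019-02"],
        "ods_code": ["RA1", "RB2", "RA1", "RA1"],
        "vmp_id": ["v1", "v2", "v1", "v3"],
        "quantity": [10, 5, 8, 2],
    }
)
# Toy DDD lookup keyed on vmp_id.
ddd = pd.DataFrame({"vmp_id": ["v1", "v2"], "ddd": [2.0, None]})

# Left join keeps every SCMD row, even when no DDD row matches.
merged = scmd.merge(ddd, on="vmp_id", how="left")

# Completeness of each field after the merge (share of non-null values).
completeness = merged.notna().mean()

# Distinct products with no usable DDD: either no match or a null DDD value.
missing_ddd_products = sorted(merged.loc[merged["ddd"].isna(), "vmp_id"].unique())

# Number of distinct hospitals reporting in each month.
hospitals_per_month = merged.groupby("year_month")["ods_code"].nunique()

print(completeness)
print(missing_ddd_products)  # prints ['v2', 'v3']
print(hospitals_per_month)
```

A left join is used so that products absent from the DDD table still show up as missing rather than silently disappearing from the merged data.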
I used `ProfileReport` from `pandas_profiling` to automatically generate summaries of each column and check for completeness. I chose this tool because it seems to support automatically generating expectations that can be fed into Great Expectations to test whether the field values are sensible (documentation; more investigation to be done). The notebook also contains a distinct list of products with missing DDD (6,756 products), and a table and interactive plot showing the number of hospitals per month.
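To illustrate what "testing whether field values are sensible" could mean, here is a hand-rolled sketch in plain pandas. It is not the Great Expectations API; the check names, column names and rules are hypothetical examples.

```python
import pandas as pd

# Toy merged rows; column names and values are hypothetical.
merged = pd.DataFrame(
    {
        "vmp_id": ["v1", "v2", "v3"],
        "year_month": ["2019-01", "2019-02", "2019/03"],
        "ddd": [2.0, None, 1.0],
    }
)


def check_expectations(df: pd.DataFrame) -> dict[str, bool]:
    """Hand-rolled checks in the spirit of Great Expectations (not its API)."""
    return {
        # Every row should identify a product.
        "vmp_id_not_null": bool(df["vmp_id"].notna().all()),
        # Months should look like YYYY-MM with a valid month number.
        "year_month_format": bool(
            df["year_month"].str.fullmatch(r"\d{4}-(?:0[1-9]|1[0-2])").all()
        ),
        # Where a DDD is present, it should be strictly positive.
        "ddd_positive_when_present": bool((df["ddd"].dropna() > 0).all()),
    }


results = check_expectations(merged)
failed = [name for name, passed in results.items() if not passed]
print(failed)  # prints ['year_month_format']
```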