ebmdatalab / openpath-dash

Experimental Dash version of openpathology browser

Plymouth QA: spike in Nov 16 and Mar 17 #128

Closed sebbacon closed 4 years ago

sebbacon commented 4 years ago

Similar to #93, which is now fixed...

image

sebbacon commented 4 years ago

The effect is reduced if you use patients as the denominator:

image

sebbacon commented 4 years ago

Grouping by test, this appears to be accounted for by two groups of tests: some (mainly FBC-related?) ending with a peak in Mar 17, others (mainly ALT-related?) with a peak in Nov 16.

image

sebbacon commented 4 years ago

Drilling down to a single test (IGG), we can see the same pattern. Almost all of the higher values on the left-hand side are even numbers, suggesting double-counting again.
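
The "mostly even numbers" observation can be checked rather than eyeballed. A minimal sketch, assuming the monthly counts are available as a list of integers; the function name `even_share` and the sample values are hypothetical and are not taken from the Plymouth data:

```python
def even_share(values):
    """Return the fraction of values that are even (0.0 for an empty list)."""
    values = [int(v) for v in values]
    if not values:
        return 0.0
    return sum(v % 2 == 0 for v in values) / len(values)


# Hypothetical counts for illustration only.
suspect = [24, 12, 6, 48, 30, 18]   # a period where every count is even
normal = [25, 26, 21, 25, 12, 27]   # a period with a mix of odd and even counts

suspect_share = even_share(suspect)
normal_share = even_share(normal)
print(suspect_share, normal_share)
```

Roughly half of all counts are even by chance, so only a share close to 1.0 over many values points towards doubling.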

image

sebbacon commented 4 years ago

Next step: re-run the Plymouth data generation without dropping the intermediate files, for further inspection.

sebbacon commented 4 years ago

Looking at the raw source data for IGG:

~/Code/openpath-pipeline/data/Plymouth/tmp$ grep L82030 igg.csv | awk -F, '{print $9}'| cut -b-7 | uniq -c
     25 2016-10
     26 2016-11
     21 2016-12
     25 2017-01
     12 2017-02
     27 2017-03

This compares with 24, 27, 12, 12, 6, 9 in our heatmap data.
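
For repeat checks, the same per-month count can be sketched in Python. This assumes, as the awk command does, that the date is the ninth comma-separated field; the function name `monthly_counts` and the sample rows are hypothetical. Unlike `uniq -c`, the counter does not need rows for the same month to be adjacent, and unlike `grep` it matches the practice code against whole fields only.

```python
import csv
import io
from collections import Counter


def monthly_counts(csv_file, practice="L82030", date_col=8):
    """Count rows per YYYY-MM for one practice code.

    date_col is 0-indexed, so 8 corresponds to awk's $9.
    """
    counts = Counter()
    for row in csv.reader(csv_file):
        if practice in row and len(row) > date_col:
            counts[row[date_col][:7]] += 1
    return dict(sorted(counts.items()))


# Hypothetical rows with the practice code in the fifth field and the date in the ninth.
sample = io.StringIO(
    "p1,x,x,x,L82030,x,x,x,2016-10-03\n"
    "p2,x,x,x,L82030,x,x,x,2016-11-02\n"
    "p3,x,x,x,L82030,x,x,x,2016-10-17\n"
    "p4,x,x,x,L99999,x,x,x,2016-11-05\n"
)
result = monthly_counts(sample)
print(result)
```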

sebbacon commented 4 years ago

The data being used by my local dash app seems to include repeats again:

$ grep IGG plymouth_processed.csv  | grep L82030 | grep 2016-1
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
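
Repeats like these can be listed mechanically instead of by reading the grep output. A minimal sketch using only the standard library; `duplicated_rows` is a hypothetical helper, and the three sample lines are copied from the output above:

```python
from collections import Counter


def duplicated_rows(lines):
    """Return {row: occurrences} for every non-blank row seen more than once."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {row: n for row, n in counts.items() if n > 1}


# Three rows from the grep output above; the first two are identical.
sample = [
    "11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0",
    "11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0",
    "11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0",
]
dupes = duplicated_rows(sample)
for row, n in dupes.items():
    print(n, row)
```

Passing an open file handle for plymouth_processed.csv instead of `sample` would report every fully identical row in the file.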

sebbacon commented 4 years ago

Looking at the raw data for November 2016 and doing the numbers in my head, I get 15 within range, 1-5 under range, 1-5 over range, and 1-5 non-numeric results - which would give us a total of 24. The numbers in the previous comment would give 27.
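
The 27 can be reproduced from the rows pasted in the previous comment. This sketch assumes the second CSV column is the per-row count (it is 3 on every pasted row); that reading of the column is an assumption, not something confirmed in this thread. The result categories are transcribed from the pasted rows.

```python
from collections import defaultdict

# (month, result category) for each row in the pasted grep output.
rows = (
    [("2016-10", c) for c in (-1, 0, 1, 2, -1, 0, 1, 2)]
    + [("2016-11", c) for c in (-1, 0, 2, -1, 0, 1, -1, 0, 1)]
    + [("2016-12", c) for c in (-1, 0, 1)]
)
COUNT_PER_ROW = 3  # assumption: the second column of each row is its count

totals = defaultdict(int)
for month, _category in rows:
    totals[month] += COUNT_PER_ROW
totals = dict(totals)
print(totals)
```

Under that assumption, October and November come out at 24 and 27, the first two heatmap values quoted earlier.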

sebbacon commented 4 years ago

image

Problem now in Dec 2018 / Jan 2019 - test numbers at least doubling in Plymouth

sebbacon commented 4 years ago

Looking at WBC in Plymouth only, discrepant counts are:

Month | Raw count | Converted count | Count visible in chart
Oct 2018 | 21281
Nov 2018 | 19707
Dec 2018 | 15885 | 15571 | 15446
Jan 2019 | 21279 | 20983 | 41494

Yet our local copy goes from 15446 in Dec to 41494 in Jan.
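
A quick way to see where the inflation enters is to take the ratio of the chart count to the converted count for each month. A minimal sketch using only the Dec 2018 and Jan 2019 figures quoted above; the variable names are illustrative:

```python
# (raw, converted, chart) counts for WBC in Plymouth, from the figures above.
counts = {
    "Dec 2018": (15885, 15571, 15446),
    "Jan 2019": (21279, 20983, 41494),
}

ratios = {
    month: round(chart / converted, 2)
    for month, (_raw, converted, chart) in counts.items()
}
print(ratios)
```

A ratio near 1 means the chart agrees with the converted data; a ratio near 2 is what counting every row twice would produce.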

image

The doubling has already happened by the time the "plymouth_processed" file has been generated (after flask process_file plymouth data_csvs/_anonymised_plymouth.csv), and before that, by the time _anonymised_plymouth.csv has been generated, and even earlier, by the time the merged dataframe is created.

So: the reason for the doubling was that I had nulled one of the merged_at fields, so it got appended twice to the running anonymised file :(
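
To illustrate the failure mode, here is a toy model of a merge step guarded by a merged_at timestamp. It is not the pipeline's actual code: `merge_pending`, the dictionary layout, and the sample rows are all hypothetical. It only shows how clearing the timestamp makes the same rows get appended a second time.

```python
from datetime import datetime, timezone


def merge_pending(sources, combined):
    """Append every source that has no merged_at timestamp, then stamp it."""
    for source in sources:
        if source["merged_at"] is None:
            combined.extend(source["rows"])
            source["merged_at"] = datetime.now(timezone.utc)
    return combined


# Hypothetical source file with two rows, not yet merged.
sources = [{"name": "jan_2019", "rows": ["r1", "r2"], "merged_at": None}]
combined = merge_pending(sources, [])

# Re-running is a no-op while merged_at is set ...
merge_pending(sources, combined)
after_rerun = len(combined)

# ... but nulling the field causes the same rows to be appended again.
sources[0]["merged_at"] = None
merge_pending(sources, combined)
after_reset = len(combined)
print(after_rerun, after_reset)
```

In this model, a guard based on the content already present in the combined file, rather than on the timestamp alone, would have made the second append harmless.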