Plymouth QA: spike in Nov 16 and Mar 17

sebbacon commented 4 years ago

Similar to #93, which is now fixed...

sebbacon commented 4 years ago

The effect is reduced if you use patients as the denominator.

sebbacon commented 4 years ago

Grouping by test, this appears to be accounted for by two groups of tests: some (mainly FBC-related?) ending with a peak in Mar 17, others (mainly ALT-related?) with a peak in Nov 16.

sebbacon commented 4 years ago

Drilling down to a single test (IGG), we can see the same pattern. Among the higher values on the left-hand side, the counts are even numbers in almost every case, suggesting double-counting again.

sebbacon commented 4 years ago

Next step: re-run plymouth data generation without dropping the intermediate files, for further inspection

sebbacon commented 4 years ago

Looking at the raw source data for IGG:

~/Code/openpath-pipeline/data/Plymouth/tmp$ grep L82030 igg.csv | awk -F, '{print $9}'| cut -b-7 | uniq -c
     25 2016-10
     26 2016-11
     21 2016-12
     25 2017-01
     12 2017-02
     27 2017-03

This compares with 24, 27, 12, 12, 6, 9 in our heatmap data.
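
For anyone who wants to repeat this count without the shell pipeline, here is a minimal Python sketch of the same steps. It assumes, as the awk command does, that the date is the ninth comma-separated field and that no field contains a quoted comma; the function name `monthly_counts` is made up for illustration and is not part of the pipeline.

```python
from itertools import groupby


def monthly_counts(lines, practice="L82030", date_field=8):
    """Rough equivalent of:
    grep PRACTICE | awk -F, '{print $9}' | cut -b-7 | uniq -c
    """
    months = []
    for line in lines:
        # grep matches the practice code anywhere in the line
        if practice not in line:
            continue
        # awk -F, splits on every comma; $9 is index 8 (empty if missing)
        fields = line.rstrip("\n").split(",")
        value = fields[date_field] if len(fields) > date_field else ""
        # cut -b-7 keeps the first seven bytes, i.e. YYYY-MM
        months.append(value[:7])
    # uniq -c counts consecutive runs, so groupby (not a global tally)
    return [(month, sum(1 for _ in run)) for month, run in groupby(months)]
```

Passing an open file handle for igg.csv as `lines` should give the same month/count pairs as the shell output above, provided the file is sorted by date the way `uniq -c` implicitly assumes.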

sebbacon commented 4 years ago

The data being used by my local dash app seems to include repeats again:

$ grep IGG plymouth_processed.csv  | grep L82030 | grep 2016-1
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
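
A quick way to confirm the repeats is to tally identical rows. The sketch below uses the eight 2016-10 rows from the grep output above. It assumes the second column is the per-row count that feeds the heatmap; that is a guess from the data, not something confirmed from the schema, and the helper names are hypothetical.

```python
from collections import Counter

# The eight 2016-10 rows for L82030, copied from the grep output above.
ROWS = """\
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
""".splitlines()


def repeated_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    return {row: n for row, n in Counter(rows).items() if n > 1}


def total_by_month(rows, count_field=1, date_field=4):
    """Sum the assumed count column per YYYY-MM."""
    totals = Counter()
    for row in rows:
        fields = row.split(",")
        totals[fields[date_field][:7]] += int(fields[count_field])
    return dict(totals)


print(len(repeated_rows(ROWS)))   # number of distinct rows that repeat
print(total_by_month(ROWS))       # {'2016-10': 24}
# Dropping exact duplicates (order preserved) halves the October total.
print(total_by_month(list(dict.fromkeys(ROWS))))  # {'2016-10': 12}
```

Under that column assumption, the October total of 24 matches the first heatmap figure quoted earlier, which is consistent with the heatmap being built from the repeated rows.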

sebbacon commented 4 years ago

Looking at the raw data for November 2016 and doing the numbers in my head, I get 15 within range, 1-5 under range, 1-5 over range, and 1-5 non-numeric results. Counting each 1-5 band as 3, that would give us a total of 24. The numbers in the previous comment would give 27.

sebbacon commented 4 years ago

Problem now in Dec 2018 / Jan 2019 - test numbers at least doubling in Plymouth

sebbacon commented 4 years ago

Looking at WBC in Plymouth only, discrepant counts are:

Month	Raw count	Converted count	Count visible in chart
Oct 2018	21281
Nov 2018	19707
Dec 2018	15885	15571	15446
Jan 2019	21279	20983	41494

Yet our local copy goes from 15446 in Dec to 41494 in Jan.
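
To put a number on the discrepancy, the chart figure can be divided by the converted count for each month. This is plain arithmetic on the table above; the variable names are only for illustration.

```python
# Figures copied from the WBC table above (Plymouth only).
converted = {"Dec 2018": 15571, "Jan 2019": 20983}
charted = {"Dec 2018": 15446, "Jan 2019": 41494}

# Ratio of what the chart shows to what the conversion step produced.
ratios = {month: round(charted[month] / converted[month], 2) for month in converted}
print(ratios)  # {'Dec 2018': 0.99, 'Jan 2019': 1.98}
```

December is roughly in line with its converted count, while January is close to twice its converted count.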

The inflation has already happened by the time the "plymouth_processed" file has been generated (after flask process_file plymouth data_csvs/_anonymised_plymouth.csv), and before that, by the time _anonymised_plymouth.csv itself has been generated. It is also already there by the time the merged dataframe is created.

So: the reason for the doubling was that I had nulled one of the merged_at fields, so it got appended twice to the running anonymised file :(
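
One way to make this kind of mistake harder to repeat is to have the append step refuse a source it has already merged, keyed on something that cannot be nulled by accident. The sketch below is not the pipeline's actual code and does not use its merged_at field; `append_once`, `already_merged` and the file name in the usage are all hypothetical, and it only illustrates an idempotent append.

```python
def append_once(running_rows, new_rows, already_merged, source_name):
    """Append new_rows to running_rows unless source_name was merged before.

    Returns True if the rows were appended, False if the source was skipped.
    """
    if source_name in already_merged:
        return False
    running_rows.extend(new_rows)
    already_merged.add(source_name)
    return True


running, merged = [], set()
append_once(running, ["row-a", "row-b"], merged, "example_2019_01.csv")
# A second attempt with the same source name is a no-op.
append_once(running, ["row-a", "row-b"], merged, "example_2019_01.csv")
print(len(running))  # 2
```

A cheaper complementary check would be to fail the run if the running anonymised file ever contains exact duplicate rows.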

ebmdatalab / openpath-dash

Plymouth QA: spike in Nov 16 and Mar 17 #128