The effect is reduced if you use patients as the denominator:
Grouping by test, this appears to be accounted for by two groups of tests: some (mainly FBC-related?) ending with a peak in Mar 17, others (mainly ALT-related?) with a peak in Nov 16.
Drilling down to a single test (IGG), we can see the same pattern. In almost every case among the left-hand (higher) values, the values are even numbers, suggesting double-counting again.
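A quick way to test that impression, as a minimal sketch: if counts are doubled, far more than the ~50% of monthly values you'd expect by chance should be even. The file name and `count` column below are hypothetical stand-ins for an export of the chart data, not files from the pipeline.

```python
# Minimal sketch: check what share of monthly counts are even.
# "heatmap_data.csv" and the "count" column are hypothetical names for
# an export of the chart data, not part of the pipeline.
import pandas as pd

counts = pd.read_csv("heatmap_data.csv")["count"].dropna().astype(int)
even_share = (counts % 2 == 0).mean()
print(f"{even_share:.0%} of monthly counts are even")  # ~50% expected if nothing is doubled
```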
Next step: re-run plymouth data generation without dropping the intermediate files, for further inspection
Looking at the raw source data for IGG:
```
~/Code/openpath-pipeline/data/Plymouth/tmp$ grep L82030 igg.csv | awk -F, '{print $9}'| cut -b-7 | uniq -c
25 2016-10
26 2016-11
21 2016-12
25 2017-01
12 2017-02
27 2017-03
```
This compares with 24, 27, 12, 12, 6, 9 in our heatmap data.
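(For reference, the same tally can be reproduced in pandas. This is a sketch only: it assumes `igg.csv` has no header row, that the date sits in column 9 as in the awk command, and that the practice code lives in a fixed column, which is a guess since grep matched anywhere in the line.)

```python
# Minimal sketch mirroring the shell pipeline above.
# DATE_COL = 8 corresponds to $9 in awk; PRACTICE_COL is a guess,
# since the grep above matched anywhere in the line.
import pandas as pd

PRACTICE_COL = 3
DATE_COL = 8

raw = pd.read_csv("igg.csv", header=None, dtype=str)
launceston = raw[raw[PRACTICE_COL] == "L82030"]
print(launceston[DATE_COL].str[:7].value_counts().sort_index())
# should reproduce the uniq -c counts: 25, 26, 21, 25, 12, 27
```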
The data being used by my local dash app seems to include repeats again:
```
$ grep IGG plymouth_processed.csv | grep L82030 | grep 2016-1
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-10-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,2,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-11-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,-1,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,0,IGG,17668.0
11N,3,2.0,plymouth,2016-12-01,L82030,LAUNCESTON MEDICAL CENTRE,1,IGG,17668.0
```
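Those repeats can be confirmed mechanically. A sketch, assuming `plymouth_processed.csv` has no header row and guessing the column names from the rows above: once the data is aggregated per (month, practice, test, result category), no key should appear more than once.

```python
# Minimal sketch: count how many times each aggregation key appears.
# Column names are guesses from the rows above; col2/col3 are unknown.
import pandas as pd

cols = ["ccg", "col2", "col3", "lab", "month", "practice_id",
        "practice_name", "result_category", "test_code", "list_size"]
df = pd.read_csv("plymouth_processed.csv", header=None, names=cols, dtype=str)

repeats = (
    df.groupby(["month", "practice_id", "test_code", "result_category"])
      .size()
      .reset_index(name="n")
)
print(repeats[repeats["n"] > 1])  # any n > 1 means the same row was appended more than once
```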
Looking at the raw data for November 2016 and doing the numbers in my head, I get 15 within range, 1-5 under range, 1-5 over range, and 1-5 non-numeric results - which would give us a total of 24. The numbers in the previous comment would give 27.
Problem now in Dec 2018 / Jan 2019 - test numbers at least doubling in Plymouth
Looking at WBC in Plymouth only, the discrepant counts are:
Month | Raw count | Converted count | Count visible in chart |
---|---|---|---|
Oct 2018 | 21281 | ||
Nov 2018 | 19707 | ||
Dec 2018 | 15885 | 15571 | 15446 |
Jan 2019 | 21279 | 20983 | 41494 |
Yet the chart in our local copy goes from 15446 in Dec to 41494 in Jan.
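One way to pin down which stage introduces the jump is to run the same monthly count against each intermediate file. A sketch only: the paths, column names and header handling are assumptions and will need adjusting per file.

```python
# Minimal sketch: the same monthly WBC tally run against successive
# pipeline outputs, to see at which stage Jan 2019 jumps.
import pandas as pd

def monthly_counts(path, month_col, test_col, test="WBC", **read_kwargs):
    df = pd.read_csv(path, dtype=str, **read_kwargs)
    subset = df[df[test_col] == test]
    # NB: for already-aggregated files you may need to sum a count column
    # instead of counting rows.
    return subset[month_col].str[:7].value_counts().sort_index()

for path in ["data_csvs/_anonymised_plymouth.csv", "plymouth_processed.csv"]:
    print(path)
    print(monthly_counts(path, month_col="month", test_col="test_code"))
```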
The tripling has happened by the time the `plymouth_processed` file has been generated (after running `flask process_file plymouth data_csvs/_anonymised_plymouth.csv`). Before that, it is already present by the time `_anonymised_plymouth.csv` has been generated, and by the time the `merged` dataframe is created.
So: the reason for the doubling was that I had nulled one of the `merged_at` fields, so it got appended twice to the running anonymised file :(
Similar to #93, which is now fixed...
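For the record, the kind of `merged_at` bookkeeping implied here (append only rows whose `merged_at` is unset, and stamp them once written) looks roughly like this. This is a sketch of the idea only, not the pipeline's actual code or schema.

```python
# Minimal sketch of merged_at bookkeeping: flush only never-written rows
# to the running anonymised file, then stamp them so a repeat call (or a
# re-run after a crash) is a no-op rather than a second append.
import pandas as pd

def append_unmerged(merged: pd.DataFrame, anonymised_path: str) -> pd.DataFrame:
    to_append = merged[merged["merged_at"].isna()]
    if to_append.empty:
        return merged

    to_append.drop(columns=["merged_at"]).to_csv(
        anonymised_path, mode="a", header=False, index=False
    )
    # Stamp what we just wrote so a repeat call cannot append it again.
    merged = merged.copy()
    merged.loc[to_append.index, "merged_at"] = pd.Timestamp.now()
    return merged
```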