Closed mqli322 closed 6 years ago
@mqli322 @AmandaEyer qc_outliers.sql is the logic for how outliers are currently flagged - it's flagging the top 10 records with the greatest units_net count, but it seems that 10 is too small and not capturing these other jobs. @AmandaEyer Should I up the count to 20? There are 18 records where units_net = 99
or change the logic so that it flags all jobs where the units_net is greater than a certain value
both jobs had 9999 in u_init and u_prop. do we care about flagging outliers where the underlying data is clearly wrong, even though there's no impact on u_net?
I think that if we know that a record is wrong, we should flag it as an outlier or correct it, whether or not there's an impact?
Yes, I think top 20 is good for outlier checking. Those require manual checks then, correct?
Okay so right now in qc_outliers we're only pulling the top 20 and bottom 20 units_net
So should we also include the top 20 and bottom 20 from u_init and u_prop?
No. I checked and I don't think that add value. Perhaps just mark as outlier any record where u_init or u_prop is 9999?
I rather not because that is so specific. How about we just leave qc_outliers.sql as is Where an outlier is -- 1) top 20 and bottom 20 for units_net -- 2) units_net_complete is 50+ units greater than units proposed (NBs only)
Okay.
So, what do we do with the 9999 records. Just "correct" them in the corrections sheet, maybe?
for 1) that is the logic for checking IF they are valid outliers, not actually marking them as outliers, correct? Because out of the 20 that Jackie and Hannah identified as potential outliers, only 4 records turned out to actually be outliers. They still did a manual check after the first outlier flag.
Re 9999 I can include the top 20 and bottom 20 from u_init and u_prop Or I can do nothing and you can find these outliers on your own and add them to the dcp_attributes.csv
For outliers, yes these are outputted to qc_outliers.csv for your review
Okay, please include only the TOP 20 from u_init and u_prop
okay
Sorry I changed my opinion! wrote too quickly.
... okay so what would you like?
the TOP 20 from u_init and u_prop
okay
thanks!
Added logic in qc_outliers to take top 20 and bottom 20 from units_init and units_prop
Ah didn't get that. Okay only flagging top 20 from u_init and u_prop, but not the bottom 20.
Where units = 9999