Closed by PriyaDoIT 7 years ago
Thanks. 104 days of false positives is quite high. That is, the model is wrong nearly as often as it is right. There is a lot of political cost to spraying, so we need to think about ways to minimize this, perhaps even at the cost of adding more false negatives.
I reran the model based on the business rule of "no sprays after Labor Day".
I'm calling "sprays" dates on which a single trap came back positive two weeks in a row. I simply deleted all the rows after Labor Day before putting the data into the model, then retrained it.
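The "delete rows after Labor Day, then retrain" step can be sketched like this. This is a minimal illustration only: the tuples and field layout are hypothetical stand-ins for the project's real trap data.

```python
from datetime import date

# Hypothetical trap results as (collection date, trap ID, positive flag);
# the real data and its column names live in the project's feature pipeline.
rows = [
    (date(2016, 8, 26), "T001", 1),
    (date(2016, 9, 2), "T001", 1),
    (date(2016, 9, 9), "T002", 0),
    (date(2016, 9, 16), "T002", 1),
]

# Labor Day 2016 fell on September 5; drop every row after that date
# before training, per the "no sprays after Labor Day" business rule.
LABOR_DAY_2016 = date(2016, 9, 5)
train_rows = [r for r in rows if r[0] <= LABOR_DAY_2016]

print(len(train_rows))
```

The model is then refit on `train_rows` only; the post-Labor-Day rows never enter training.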
After taking this approach, the AUC is 96% for 2016, in which there were 108 sprays, but there are still several false positives. Here's how the season would have unfolded with a cutoff of 20%:
```
date N FALSE_POS TRUE_POS TOT_POS
2016-06-10 27 0 0 0
2016-06-17 62 1 0 1
2016-06-24 61 0 0 0
2016-07-01 54 0 0 0
2016-07-08 47 0 0 0
2016-07-15 55 3 2 5
2016-07-22 52 2 1 3
2016-07-29 66 2 4 6
2016-08-05 70 6 26 32
2016-08-12 67 9 29 38
2016-08-19 72 18 28 46
2016-08-26 61 24 11 35
2016-09-02 64 11 6 17
TOTAL 758 76 107 183
```
The full confusion matrix results:
```
r true_pos true_neg false_neg false_pos sensitivity specificity recall precision fmeasure
0.000 108 0 0 650 1.000000000 0.0000000 1.000000000 0.1424802 0.24942263
0.025 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.050 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.075 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.100 108 570 0 80 1.000000000 0.8769231 1.000000000 0.5744681 0.72972973
0.125 108 570 0 80 1.000000000 0.8769231 1.000000000 0.5744681 0.72972973
0.150 108 572 0 78 1.000000000 0.8800000 1.000000000 0.5806452 0.73469388
0.175 107 573 1 77 0.990740741 0.8815385 0.990740741 0.5815217 0.73287671
0.200 107 574 1 76 0.990740741 0.8830769 0.990740741 0.5846995 0.73539519
0.225 107 579 1 71 0.990740741 0.8907692 0.990740741 0.6011236 0.74825175
0.250 106 582 2 68 0.981481481 0.8953846 0.981481481 0.6091954 0.75177305
0.275 105 586 3 64 0.972222222 0.9015385 0.972222222 0.6213018 0.75812274
0.300 101 590 7 60 0.935185185 0.9076923 0.935185185 0.6273292 0.75092937
0.325 99 592 9 58 0.916666667 0.9107692 0.916666667 0.6305732 0.74716981
0.350 97 599 11 51 0.898148148 0.9215385 0.898148148 0.6554054 0.75781250
0.375 92 607 16 43 0.851851852 0.9338462 0.851851852 0.6814815 0.75720165
0.400 86 611 22 39 0.796296296 0.9400000 0.796296296 0.6880000 0.73819742
0.425 80 614 28 36 0.740740741 0.9446154 0.740740741 0.6896552 0.71428571
0.450 74 619 34 31 0.685185185 0.9523077 0.685185185 0.7047619 0.69483568
0.475 63 621 45 29 0.583333333 0.9553846 0.583333333 0.6847826 0.63000000
0.500 57 624 51 26 0.527777778 0.9600000 0.527777778 0.6867470 0.59685864
0.525 50 628 58 22 0.462962963 0.9661538 0.462962963 0.6944444 0.55555556
0.550 42 631 66 19 0.388888889 0.9707692 0.388888889 0.6885246 0.49704142
0.575 40 633 68 17 0.370370370 0.9738462 0.370370370 0.7017544 0.48484848
0.600 37 634 71 16 0.342592593 0.9753846 0.342592593 0.6981132 0.45962733
0.625 31 636 77 14 0.287037037 0.9784615 0.287037037 0.6888889 0.40522876
0.650 27 638 81 12 0.250000000 0.9815385 0.250000000 0.6923077 0.36734694
0.675 22 640 86 10 0.203703704 0.9846154 0.203703704 0.6875000 0.31428571
0.700 12 643 96 7 0.111111111 0.9892308 0.111111111 0.6315789 0.18897638
0.725 7 646 101 4 0.064814815 0.9938462 0.064814815 0.6363636 0.11764706
0.750 6 648 102 2 0.055555556 0.9969231 0.055555556 0.7500000 0.10344828
0.775 1 650 107 0 0.009259259 1.0000000 0.009259259 1.0000000 0.01834862
0.800 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.825 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.850 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.875 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.900 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.925 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.950 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.975 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
1.000 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
```
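A table like this can be generated by sweeping the cutoff and tallying the confusion-matrix cells at each value. Below is a minimal pure-Python sketch; the toy `y_true`/`y_score` vectors stand in for the model's out-of-sample probabilities, and the column order mirrors the table above (note that recall is the same quantity as sensitivity, which is why the two columns match).

```python
def sweep_thresholds(y_true, y_score, thresholds):
    """Confusion-matrix metrics at each probability cutoff,
    mirroring the columns in the table above."""
    rows = []
    for r in thresholds:
        pred = [1 if s >= r else 0 for s in y_score]
        tp = sum(p == 1 and t == 1 for p, t in zip(pred, y_true))
        tn = sum(p == 0 and t == 0 for p, t in zip(pred, y_true))
        fn = sum(p == 0 and t == 1 for p, t in zip(pred, y_true))
        fp = sum(p == 1 and t == 0 for p, t in zip(pred, y_true))
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        prec = tp / (tp + fp) if (tp + fp) else float("nan")
        # F-measure is undefined (NaN) when there are no true positives,
        # matching the NaN rows at high cutoffs above.
        f1 = 2 * prec * sens / (prec + sens) if tp else float("nan")
        rows.append((r, tp, tn, fn, fp, sens, spec, sens, prec, f1))
    return rows

# Toy scores for six observations, three of them truly positive.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]
for row in sweep_thresholds(y_true, y_score, [0.25, 0.50]):
    print(row)
```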
@tomschenkjr I'm going to edit the previous comment to match (i.e. include totals and the full confusion matrix).
@tomschenkjr Actually, I'm not going to edit the original, because the results changed slightly (probably because I changed the feature calculations). So I'm posting the previous result below. It is the same as above, but includes sprays after Labor Day:
```
date N FALSE_POS TRUE_POS TOT_POS
2016-06-10 27 0 0 0
2016-06-17 62 1 0 1
2016-06-24 61 0 0 0
2016-07-01 54 0 0 0
2016-07-08 47 0 0 0
2016-07-15 55 3 2 5
2016-07-22 52 2 1 3
2016-07-29 66 2 4 6
2016-08-05 70 6 26 32
2016-08-12 67 9 29 38
2016-08-19 72 18 28 46
2016-08-26 61 24 11 35
2016-09-02 64 13 6 19
2016-09-09 68 10 7 17
2016-09-16 63 12 5 17
2016-09-23 66 6 0 6
2016-09-30 58 0 0 0
TOTAL 1013 106 119 225
```
Full confusion matrix:
```
r true_pos true_neg false_neg false_pos sensitivity specificity recall precision fmeasure
0.000 120 0 0 893 1.00000000 0.0000000 1.00000000 0.1184600 0.2118270
0.025 120 770 0 123 1.00000000 0.8622620 1.00000000 0.4938272 0.6611570
0.050 120 770 0 123 1.00000000 0.8622620 1.00000000 0.4938272 0.6611570
0.075 120 773 0 120 1.00000000 0.8656215 1.00000000 0.5000000 0.6666667
0.100 120 775 0 118 1.00000000 0.8678611 1.00000000 0.5042017 0.6703911
0.125 120 775 0 118 1.00000000 0.8678611 1.00000000 0.5042017 0.6703911
0.150 120 780 0 113 1.00000000 0.8734602 1.00000000 0.5150215 0.6798867
0.175 120 785 0 108 1.00000000 0.8790594 1.00000000 0.5263158 0.6896552
0.200 119 787 1 106 0.99166667 0.8812990 0.99166667 0.5288889 0.6898551
0.225 118 797 2 96 0.98333333 0.8924972 0.98333333 0.5514019 0.7065868
0.250 117 802 3 91 0.97500000 0.8980963 0.97500000 0.5625000 0.7134146
0.275 117 808 3 85 0.97500000 0.9048152 0.97500000 0.5792079 0.7267081
0.300 111 812 9 81 0.92500000 0.9092945 0.92500000 0.5781250 0.7115385
0.325 109 819 11 74 0.90833333 0.9171333 0.90833333 0.5956284 0.7194719
0.350 104 828 16 65 0.86666667 0.9272116 0.86666667 0.6153846 0.7197232
0.375 97 833 23 60 0.80833333 0.9328108 0.80833333 0.6178344 0.7003610
0.400 93 844 27 49 0.77500000 0.9451288 0.77500000 0.6549296 0.7099237
0.425 85 849 35 44 0.70833333 0.9507279 0.70833333 0.6589147 0.6827309
0.450 77 854 43 39 0.64166667 0.9563270 0.64166667 0.6637931 0.6525424
0.475 70 857 50 36 0.58333333 0.9596865 0.58333333 0.6603774 0.6194690
0.500 55 866 65 27 0.45833333 0.9697648 0.45833333 0.6707317 0.5445545
0.525 50 869 70 24 0.41666667 0.9731243 0.41666667 0.6756757 0.5154639
0.550 44 872 76 21 0.36666667 0.9764838 0.36666667 0.6769231 0.4756757
0.575 37 875 83 18 0.30833333 0.9798432 0.30833333 0.6727273 0.4228571
0.600 34 878 86 15 0.28333333 0.9832027 0.28333333 0.6938776 0.4023669
0.625 30 879 90 14 0.25000000 0.9843225 0.25000000 0.6818182 0.3658537
0.650 25 881 95 12 0.20833333 0.9865622 0.20833333 0.6756757 0.3184713
0.675 18 886 102 7 0.15000000 0.9921613 0.15000000 0.7200000 0.2482759
0.700 12 889 108 4 0.10000000 0.9955207 0.10000000 0.7500000 0.1764706
0.725 7 891 113 2 0.05833333 0.9977604 0.05833333 0.7777778 0.1085271
0.750 3 892 117 1 0.02500000 0.9988802 0.02500000 0.7500000 0.0483871
0.775 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.800 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.825 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.850 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.875 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.900 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.925 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.950 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.975 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
1.000 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
```
So in our email thread you suggested using a higher cutoff value; e.g. 50%. I think this is a good idea.
Also, I'm sorry I didn't include the full confusion matrix results originally. I obviously calculated them, but I thought it was too verbose; it turns out we need the detail. Also, by the way, I thought you meant "confusion matrix for one choice of cutoff", but really I think we want to see a variety of cutoffs. Can you confirm?
@geneorama - yeah, the full variety of cutoffs is useful here.
I think the principal question is to ask the client what their threshold of pain is for false positives. That is, how many are too many?
Are you working on any other strategies for model performance at the moment?
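One way to frame that "threshold of pain" quantitatively is to assign per-error costs and pick the cutoff that minimizes the weighted total. This is a sketch only: the cost values are hypothetical placeholders that only the client could supply, and the example rows are transcribed from the first confusion matrix in this thread.

```python
def best_cutoff(metric_rows, cost_fp=1.0, cost_fn=3.0):
    """Return the threshold with the lowest weighted error cost.

    metric_rows: iterable of (r, tp, tn, fn, fp) tuples, as in the
    confusion-matrix tables above. The default cost ratio (a missed
    positive hurting 3x as much as a needless spray) is an assumption.
    """
    return min(metric_rows, key=lambda row: cost_fp * row[4] + cost_fn * row[3])[0]

# A few rows transcribed from the first confusion matrix in this thread:
rows = [
    (0.250, 106, 582, 2, 68),
    (0.375, 92, 607, 16, 43),
    (0.500, 57, 624, 51, 26),
]
print(best_cutoff(rows))
```

Raising `cost_fp` relative to `cost_fn` pushes the chosen cutoff higher, which is exactly the "spraying has political cost" trade-off discussed above.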
@tomschenkjr Not working on other model-performance strategies right now. I was looking at the variables, and my biggest concern at the moment is that the performance looks too good to be true.
@geneorama - is this done yet?
I'm happy to say that "the model is right about 90% of the time for predicting the presence of WNV 1 week in advance" or "this model gives us about a 1 week lead time on WNV spray decisions".
Come by my office now.
Based on 2016, which was the out-of-sample season, the latest trained model correctly identified 98% of the positive cases (118/120). The model's positive predictions were actually correct 57% of the time (118/208). Most of the false positives occur later in the season.
The 25% threshold used above represented the best balance of the confusion-matrix elements given past years. In some years a lower threshold would have been more appropriate; in others, a higher one. Thresholds between 20% and 40% looked reasonable at different times, so this number is quite volatile.
NOTE: It's quite plausible that the sprays work and that they are themselves a source of false positives. If we predict that you should spray, but you sprayed last week and knocked down the mosquito population, then maybe the only reason this week is negative is the effect of that spray. We have not analyzed the effectiveness of spraying or its effect on the model.
```
date N TRUE_POS TRUE_NEG FALSE_NEG FALSE_POS TOTAL_ACTUAL_POS TOTAL_PRED_POS
2016-06-10 27 0 27 0 0 0 0
2016-06-17 62 0 61 0 1 0 1
2016-06-24 61 0 61 0 0 0 0
2016-07-01 54 0 54 0 0 0 0
2016-07-08 47 0 47 0 0 0 0
2016-07-15 55 2 50 0 3 2 5
2016-07-22 52 1 49 0 2 1 3
2016-07-29 66 4 60 0 2 4 6
2016-08-05 70 26 38 0 6 26 32
2016-08-12 67 29 29 0 9 29 38
2016-08-19 72 28 26 0 18 28 46
2016-08-26 61 11 31 0 19 11 30
2016-09-02 64 6 46 1 11 7 17
2016-09-09 68 6 53 1 8 7 14
2016-09-16 63 5 53 0 5 5 10
2016-09-23 66 0 60 0 6 0 6
2016-09-30 58 0 58 0 0 0 0
TOTAL 1013 118 803 2 90 120 208
```
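The weekly breakdown above is just a per-date tally of the four confusion-matrix cells. A minimal sketch of that aggregation, with a hypothetical row format standing in for the real per-trap predictions:

```python
from collections import defaultdict

# Hypothetical per-trap rows: (collection date, actual, predicted).
results = [
    ("2016-08-05", 1, 1),
    ("2016-08-05", 0, 1),
    ("2016-08-12", 1, 1),
    ("2016-08-12", 1, 0),
]

def weekly_summary(rows):
    """Tally N and the four confusion-matrix cells per collection date."""
    out = defaultdict(lambda: {"N": 0, "TP": 0, "TN": 0, "FN": 0, "FP": 0})
    cell_for = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FN", (0, 1): "FP"}
    for d, actual, pred in rows:
        out[d]["N"] += 1
        out[d][cell_for[(actual, pred)]] += 1
    return dict(out)

for d, counts in sorted(weekly_summary(results).items()):
    print(d, counts)
```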
@tomschenkjr Same thing but with a 40% threshold:
Based on 2016, which was the out-of-sample season, the latest trained model correctly identified 78% of the positive cases (93/120). The model's positive predictions were actually correct 65% of the time (93/143).
```
date N TRUE_POS TRUE_NEG FALSE_NEG FALSE_POS TOTAL_ACTUAL_POS TOTAL_PRED_POS
2016-06-10 27 0 27 0 0 0 0
2016-06-17 62 0 62 0 0 0 0
2016-06-24 61 0 61 0 0 0 0
2016-07-01 54 0 54 0 0 0 0
2016-07-08 47 0 47 0 0 0 0
2016-07-15 55 1 53 1 0 2 1
2016-07-22 52 1 50 0 1 1 2
2016-07-29 66 4 60 0 2 4 6
2016-08-05 70 16 43 10 1 26 17
2016-08-12 67 27 29 2 9 29 36
2016-08-19 72 28 26 0 18 28 46
2016-08-26 61 8 44 3 6 11 14
2016-09-02 64 4 52 3 5 7 9
2016-09-09 68 3 59 4 2 7 5
2016-09-16 63 1 56 4 2 5 3
2016-09-23 66 0 62 0 4 0 4
2016-09-30 58 0 58 0 0 0 0
TOTAL 1013 93 843 27 50 120 143
```
I was going to say that the F-score is at its max at 25%, but actually it's pretty flat from about 25% to 37.5%. So I was inspired to try a higher cutoff just below 40%, but the results were not much better.
The threshold of 37.5% gives us a model that correctly identified 78% of the positive cases (93/120). The model's positive predictions were actually correct 65% of the time (93/143).
Made an error; see the update below.
Can you swing by to discuss?
Not sure where the previous numbers came from; I made a mistake. I added these measures to the evaluation code, so I know these results are consistent.
It happens that we have a bump at 40%, so 39% is a better story. The percent captured is 78% (94/120), and you're right 65% of the time (94/144).
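As a sanity check on those headline numbers, recall and precision come straight from the counts, and the F-score is their harmonic mean, F = 2PR/(P+R). The counts below are the ones quoted in this thread for the 39% cutoff:

```python
# 94 true positives, out of 120 actual positives and 144 predicted positives.
tp, actual_pos, pred_pos = 94, 120, 144

recall = tp / actual_pos          # fraction of real positives captured
precision = tp / pred_pos         # fraction of predictions that were right
f_score = 2 * precision * recall / (precision + recall)

print(round(recall, 2), round(precision, 2), round(f_score, 2))
```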
This is what the tradeoff looks like:
Fantastic, I think we're discovering the optimal point. Closing this issue and milestone, but I will be reopening #25 to create the new links. Please see the new comments there for more.
BTW, this is the precision/recall graph! I've got that graph in the diagnostics for future reference.
Under the new model (I'm calling it m3 for tracking purposes) I'm getting really good fits. There are almost no false negatives (i.e. it picks up nearly every case), so the only things to communicate are the number of positives and the number of false positives.
I also think that if we use an aggressive cutoff, we don't really need to get into the confusing cutoff-choice discussion.
Anyway, I think that we can start by showing the total cases that we're trying to predict:
Then I'd talk about the prediction: We used 2008 to 2015 to build the model. In 2016 we would have predicted that you should spray on each of the 116 days a week before you got the second positive result. We also would have predicted that you should spray on 104 days that turned out to be negative.
As additional detail, in 2012 most of the false positives came late in the season; I'm not sure if that is worth communicating.