Closed by PriyaDoIT 7 years ago
Thanks. 104 days of false positives is quite high. That is, the model is wrong nearly as often as it is right. There is a lot of political cost to spraying, so we need to think about ways to minimize this, perhaps even at the cost of adding more false negatives.
I reran the model based on the business rule of "no sprays after Labor Day".
I'm calling "sprays" dates on which a single trap came back positive two weeks in a row. I simply deleted all the rows after Labor Day before putting the data into the model, then retrained it.
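The "delete rows after Labor Day, then retrain" step can be sketched like this. This is a minimal illustration only: the tuples and field layout are hypothetical stand-ins for the project's real trap data.

```python
from datetime import date

# Hypothetical trap results as (collection date, trap ID, positive flag);
# the real data and its column names live in the project's feature pipeline.
rows = [
    (date(2016, 8, 26), "T001", 1),
    (date(2016, 9, 2), "T001", 1),
    (date(2016, 9, 9), "T002", 0),
    (date(2016, 9, 16), "T002", 1),
]

# Labor Day 2016 fell on September 5; drop every row after that date
# before training, per the "no sprays after Labor Day" business rule.
LABOR_DAY_2016 = date(2016, 9, 5)
train_rows = [r for r in rows if r[0] <= LABOR_DAY_2016]

print(len(train_rows))
```

The model is then refit on `train_rows` only; the post-Labor-Day rows never enter training.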
After taking this approach, the AUC is 96% for 2016, in which there were 108 sprays, but there are still several false positives. Here's how the season would have unfolded with a cutoff of 20%:
```
date N FALSE_POS TRUE_POS TOT_POS
2016-06-10 27 0 0 0
2016-06-17 62 1 0 1
2016-06-24 61 0 0 0
2016-07-01 54 0 0 0
2016-07-08 47 0 0 0
2016-07-15 55 3 2 5
2016-07-22 52 2 1 3
2016-07-29 66 2 4 6
2016-08-05 70 6 26 32
2016-08-12 67 9 29 38
2016-08-19 72 18 28 46
2016-08-26 61 24 11 35
2016-09-02 64 11 6 17
TOTAL 758 76 107 183
```
The full confusion matrix results:
```
r true_pos true_neg false_neg false_pos sensitivity specificity recall precision fmeasure
0.000 108 0 0 650 1.000000000 0.0000000 1.000000000 0.1424802 0.24942263
0.025 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.050 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.075 108 568 0 82 1.000000000 0.8738462 1.000000000 0.5684211 0.72483221
0.100 108 570 0 80 1.000000000 0.8769231 1.000000000 0.5744681 0.72972973
0.125 108 570 0 80 1.000000000 0.8769231 1.000000000 0.5744681 0.72972973
0.150 108 572 0 78 1.000000000 0.8800000 1.000000000 0.5806452 0.73469388
0.175 107 573 1 77 0.990740741 0.8815385 0.990740741 0.5815217 0.73287671
0.200 107 574 1 76 0.990740741 0.8830769 0.990740741 0.5846995 0.73539519
0.225 107 579 1 71 0.990740741 0.8907692 0.990740741 0.6011236 0.74825175
0.250 106 582 2 68 0.981481481 0.8953846 0.981481481 0.6091954 0.75177305
0.275 105 586 3 64 0.972222222 0.9015385 0.972222222 0.6213018 0.75812274
0.300 101 590 7 60 0.935185185 0.9076923 0.935185185 0.6273292 0.75092937
0.325 99 592 9 58 0.916666667 0.9107692 0.916666667 0.6305732 0.74716981
0.350 97 599 11 51 0.898148148 0.9215385 0.898148148 0.6554054 0.75781250
0.375 92 607 16 43 0.851851852 0.9338462 0.851851852 0.6814815 0.75720165
0.400 86 611 22 39 0.796296296 0.9400000 0.796296296 0.6880000 0.73819742
0.425 80 614 28 36 0.740740741 0.9446154 0.740740741 0.6896552 0.71428571
0.450 74 619 34 31 0.685185185 0.9523077 0.685185185 0.7047619 0.69483568
0.475 63 621 45 29 0.583333333 0.9553846 0.583333333 0.6847826 0.63000000
0.500 57 624 51 26 0.527777778 0.9600000 0.527777778 0.6867470 0.59685864
0.525 50 628 58 22 0.462962963 0.9661538 0.462962963 0.6944444 0.55555556
0.550 42 631 66 19 0.388888889 0.9707692 0.388888889 0.6885246 0.49704142
0.575 40 633 68 17 0.370370370 0.9738462 0.370370370 0.7017544 0.48484848
0.600 37 634 71 16 0.342592593 0.9753846 0.342592593 0.6981132 0.45962733
0.625 31 636 77 14 0.287037037 0.9784615 0.287037037 0.6888889 0.40522876
0.650 27 638 81 12 0.250000000 0.9815385 0.250000000 0.6923077 0.36734694
0.675 22 640 86 10 0.203703704 0.9846154 0.203703704 0.6875000 0.31428571
0.700 12 643 96 7 0.111111111 0.9892308 0.111111111 0.6315789 0.18897638
0.725 7 646 101 4 0.064814815 0.9938462 0.064814815 0.6363636 0.11764706
0.750 6 648 102 2 0.055555556 0.9969231 0.055555556 0.7500000 0.10344828
0.775 1 650 107 0 0.009259259 1.0000000 0.009259259 1.0000000 0.01834862
0.800 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.825 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.850 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.875 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.900 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.925 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.950 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
0.975 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
1.000 0 650 108 0 0.000000000 1.0000000 0.000000000 NaN NaN
```
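A table like this can be generated by sweeping the cutoff and tallying the confusion-matrix cells at each value. Below is a minimal pure-Python sketch; the toy `y_true`/`y_score` vectors stand in for the model's out-of-sample probabilities, and the column order mirrors the table above (note that recall is the same quantity as sensitivity, which is why the two columns match).

```python
def sweep_thresholds(y_true, y_score, thresholds):
    """Confusion-matrix metrics at each probability cutoff,
    mirroring the columns in the table above."""
    rows = []
    for r in thresholds:
        pred = [1 if s >= r else 0 for s in y_score]
        tp = sum(p == 1 and t == 1 for p, t in zip(pred, y_true))
        tn = sum(p == 0 and t == 0 for p, t in zip(pred, y_true))
        fn = sum(p == 0 and t == 1 for p, t in zip(pred, y_true))
        fp = sum(p == 1 and t == 0 for p, t in zip(pred, y_true))
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        prec = tp / (tp + fp) if (tp + fp) else float("nan")
        # F-measure is undefined (NaN) when there are no true positives,
        # matching the NaN rows at high cutoffs above.
        f1 = 2 * prec * sens / (prec + sens) if tp else float("nan")
        rows.append((r, tp, tn, fn, fp, sens, spec, sens, prec, f1))
    return rows

# Toy scores for six observations, three of them truly positive.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]
for row in sweep_thresholds(y_true, y_score, [0.25, 0.50]):
    print(row)
```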
@tomschenkjr I'm going to edit the previous comment to match (i.e. include totals and the full confusion matrix).
@tomschenkjr Actually, I'm not going to edit the original, because the results changed slightly (probably because I changed the feature calculations). So I'm posting the previous result below. It is the same as above, but includes sprays after Labor Day:
```
date N FALSE_POS TRUE_POS TOT_POS
2016-06-10 27 0 0 0
2016-06-17 62 1 0 1
2016-06-24 61 0 0 0
2016-07-01 54 0 0 0
2016-07-08 47 0 0 0
2016-07-15 55 3 2 5
2016-07-22 52 2 1 3
2016-07-29 66 2 4 6
2016-08-05 70 6 26 32
2016-08-12 67 9 29 38
2016-08-19 72 18 28 46
2016-08-26 61 24 11 35
2016-09-02 64 13 6 19
2016-09-09 68 10 7 17
2016-09-16 63 12 5 17
2016-09-23 66 6 0 6
2016-09-30 58 0 0 0
TOTAL 1013 106 119 225
```
Full confusion matrix:
```
r true_pos true_neg false_neg false_pos sensitivity specificity recall precision fmeasure
0.000 120 0 0 893 1.00000000 0.0000000 1.00000000 0.1184600 0.2118270
0.025 120 770 0 123 1.00000000 0.8622620 1.00000000 0.4938272 0.6611570
0.050 120 770 0 123 1.00000000 0.8622620 1.00000000 0.4938272 0.6611570
0.075 120 773 0 120 1.00000000 0.8656215 1.00000000 0.5000000 0.6666667
0.100 120 775 0 118 1.00000000 0.8678611 1.00000000 0.5042017 0.6703911
0.125 120 775 0 118 1.00000000 0.8678611 1.00000000 0.5042017 0.6703911
0.150 120 780 0 113 1.00000000 0.8734602 1.00000000 0.5150215 0.6798867
0.175 120 785 0 108 1.00000000 0.8790594 1.00000000 0.5263158 0.6896552
0.200 119 787 1 106 0.99166667 0.8812990 0.99166667 0.5288889 0.6898551
0.225 118 797 2 96 0.98333333 0.8924972 0.98333333 0.5514019 0.7065868
0.250 117 802 3 91 0.97500000 0.8980963 0.97500000 0.5625000 0.7134146
0.275 117 808 3 85 0.97500000 0.9048152 0.97500000 0.5792079 0.7267081
0.300 111 812 9 81 0.92500000 0.9092945 0.92500000 0.5781250 0.7115385
0.325 109 819 11 74 0.90833333 0.9171333 0.90833333 0.5956284 0.7194719
0.350 104 828 16 65 0.86666667 0.9272116 0.86666667 0.6153846 0.7197232
0.375 97 833 23 60 0.80833333 0.9328108 0.80833333 0.6178344 0.7003610
0.400 93 844 27 49 0.77500000 0.9451288 0.77500000 0.6549296 0.7099237
0.425 85 849 35 44 0.70833333 0.9507279 0.70833333 0.6589147 0.6827309
0.450 77 854 43 39 0.64166667 0.9563270 0.64166667 0.6637931 0.6525424
0.475 70 857 50 36 0.58333333 0.9596865 0.58333333 0.6603774 0.6194690
0.500 55 866 65 27 0.45833333 0.9697648 0.45833333 0.6707317 0.5445545
0.525 50 869 70 24 0.41666667 0.9731243 0.41666667 0.6756757 0.5154639
0.550 44 872 76 21 0.36666667 0.9764838 0.36666667 0.6769231 0.4756757
0.575 37 875 83 18 0.30833333 0.9798432 0.30833333 0.6727273 0.4228571
0.600 34 878 86 15 0.28333333 0.9832027 0.28333333 0.6938776 0.4023669
0.625 30 879 90 14 0.25000000 0.9843225 0.25000000 0.6818182 0.3658537
0.650 25 881 95 12 0.20833333 0.9865622 0.20833333 0.6756757 0.3184713
0.675 18 886 102 7 0.15000000 0.9921613 0.15000000 0.7200000 0.2482759
0.700 12 889 108 4 0.10000000 0.9955207 0.10000000 0.7500000 0.1764706
0.725 7 891 113 2 0.05833333 0.9977604 0.05833333 0.7777778 0.1085271
0.750 3 892 117 1 0.02500000 0.9988802 0.02500000 0.7500000 0.0483871
0.775 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.800 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.825 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.850 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.875 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.900 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.925 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.950 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
0.975 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
1.000 0 893 120 0 0.00000000 1.0000000 0.00000000 NaN NaN
```
So in our email thread you suggested using a higher cutoff value; e.g. 50%. I think this is a good idea.
Also, I'm sorry I didn't include the full confusion matrix results originally. I obviously calculated them, but I thought it was too verbose; it turns out we need the detail. Also, by the way, I thought you meant "confusion matrix for one choice of cutoff", but really I think we want to see a variety of cutoffs. Can you confirm?
@geneorama - yeah, the full variety of cutoffs is useful here.
I think the principal question is to ask the client what their threshold of pain is for false positives. That is, how many are too many?
Are you working on any other strategies for model performance at the moment?
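One way to frame that "threshold of pain" quantitatively is to assign per-error costs and pick the cutoff that minimizes the weighted total. This is a sketch only: the cost values are hypothetical placeholders that only the client could supply, and the example rows are transcribed from the first confusion matrix in this thread.

```python
def best_cutoff(metric_rows, cost_fp=1.0, cost_fn=3.0):
    """Return the threshold with the lowest weighted error cost.

    metric_rows: iterable of (r, tp, tn, fn, fp) tuples, as in the
    confusion-matrix tables above. The default cost ratio (a missed
    positive hurting 3x as much as a needless spray) is an assumption.
    """
    return min(metric_rows, key=lambda row: cost_fp * row[4] + cost_fn * row[3])[0]

# A few rows transcribed from the first confusion matrix in this thread:
rows = [
    (0.250, 106, 582, 2, 68),
    (0.375, 92, 607, 16, 43),
    (0.500, 57, 624, 51, 26),
]
print(best_cutoff(rows))
```

Raising `cost_fp` relative to `cost_fn` pushes the chosen cutoff higher, which is exactly the "spraying has political cost" trade-off discussed above.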
@tomschenkjr Not working on other model-performance strategies right now. I was looking at the variables, and my biggest concern at the moment is that the performance looks too good to be true.
@geneorama - is this done yet?
I'm happy to say that "the model is right about 90% of the time for predicting the presence of WNV 1 week in advance" or "this model gives us about a 1 week lead time on WNV spray decisions".
Come by my office now.
Based on 2016, which was the out-of-sample season, the latest trained model correctly identified 98% of the positive cases (118/120). The model's positive predictions were actually correct 57% of the time (118/208). Most of the false positives occur later in the season.
The 25% threshold used above represented the best balance of the confusion-matrix elements given past years. In some years a lower threshold would have been more appropriate; in others, a higher one. Thresholds between 20% and 40% looked reasonable at different times, so this number is quite volatile.
NOTE: It's quite plausible that the sprays work and that they are themselves a source of false positives. If we predict that you should spray, but you sprayed last week and knocked down the mosquito population, then maybe the only reason this week is negative is the effect of that spray. We have not analyzed the effectiveness of spraying or its effect on the model.
```
date N TRUE_POS TRUE_NEG FALSE_NEG FALSE_POS TOTAL_ACTUAL_POS TOTAL_PRED_POS
2016-06-10 27 0 27 0 0 0 0
2016-06-17 62 0 61 0 1 0 1
2016-06-24 61 0 61 0 0 0 0
2016-07-01 54 0 54 0 0 0 0
2016-07-08 47 0 47 0 0 0 0
2016-07-15 55 2 50 0 3 2 5
2016-07-22 52 1 49 0 2 1 3
2016-07-29 66 4 60 0 2 4 6
2016-08-05 70 26 38 0 6 26 32
2016-08-12 67 29 29 0 9 29 38
2016-08-19 72 28 26 0 18 28 46
2016-08-26 61 11 31 0 19 11 30
2016-09-02 64 6 46 1 11 7 17
2016-09-09 68 6 53 1 8 7 14
2016-09-16 63 5 53 0 5 5 10
2016-09-23 66 0 60 0 6 0 6
2016-09-30 58 0 58 0 0 0 0
TOTAL 1013 118 803 2 90 120 208
```
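The weekly breakdown above is just a per-date tally of the four confusion-matrix cells. A minimal sketch of that aggregation, with a hypothetical row format standing in for the real per-trap predictions:

```python
from collections import defaultdict

# Hypothetical per-trap rows: (collection date, actual, predicted).
results = [
    ("2016-08-05", 1, 1),
    ("2016-08-05", 0, 1),
    ("2016-08-12", 1, 1),
    ("2016-08-12", 1, 0),
]

def weekly_summary(rows):
    """Tally N and the four confusion-matrix cells per collection date."""
    out = defaultdict(lambda: {"N": 0, "TP": 0, "TN": 0, "FN": 0, "FP": 0})
    cell_for = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FN", (0, 1): "FP"}
    for d, actual, pred in rows:
        out[d]["N"] += 1
        out[d][cell_for[(actual, pred)]] += 1
    return dict(out)

for d, counts in sorted(weekly_summary(results).items()):
    print(d, counts)
```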
@tomschenkjr Same thing but with a 40% threshold:
Based on 2016, which was the out-of-sample season, the latest trained model correctly identified 78% of the positive cases (93/120). The model's positive predictions were actually correct 65% of the time (93/143).
```
date N TRUE_POS TRUE_NEG FALSE_NEG FALSE_POS TOTAL_ACTUAL_POS TOTAL_PRED_POS
2016-06-10 27 0 27 0 0 0 0
2016-06-17 62 0 62 0 0 0 0
2016-06-24 61 0 61 0 0 0 0
2016-07-01 54 0 54 0 0 0 0
2016-07-08 47 0 47 0 0 0 0
2016-07-15 55 1 53 1 0 2 1
2016-07-22 52 1 50 0 1 1 2
2016-07-29 66 4 60 0 2 4 6
2016-08-05 70 16 43 10 1 26 17
2016-08-12 67 27 29 2 9 29 36
2016-08-19 72 28 26 0 18 28 46
2016-08-26 61 8 44 3 6 11 14
2016-09-02 64 4 52 3 5 7 9
2016-09-09 68 3 59 4 2 7 5
2016-09-16 63 1 56 4 2 5 3
2016-09-23 66 0 62 0 4 0 4
2016-09-30 58 0 58 0 0 0 0
TOTAL 1013 93 843 27 50 120 143
```
I was going to say that the F-score is at its max at 25%, but actually it's pretty flat from about 25% to 37.5%. So I was inspired to try a higher cutoff just below 40%, but the results were not much better.
The threshold of 37.5% gives us a model that correctly identified 78% of the positive cases (93/120). The model's positive predictions were actually correct 65% of the time (93/143).
Made an error; see the update below.
Can you swing by to discuss?
Not sure where the previous numbers came from; I made a mistake. I added these measures to the evaluation code, so I know these results are consistent.
It happens that we have a bump at 40%, so 39% is a better story. The percent captured is 78% (94/120), and you're right 65% of the time (94/144).
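As a sanity check on those headline numbers, recall and precision come straight from the counts, and the F-score is their harmonic mean, F = 2PR/(P+R). The counts below are the ones quoted in this thread for the 39% cutoff:

```python
# 94 true positives, out of 120 actual positives and 144 predicted positives.
tp, actual_pos, pred_pos = 94, 120, 144

recall = tp / actual_pos          # fraction of real positives captured
precision = tp / pred_pos         # fraction of predictions that were right
f_score = 2 * precision * recall / (precision + recall)

print(round(recall, 2), round(precision, 2), round(f_score, 2))
```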
This is what the tradeoff looks like:
Fantastic, I think we're discovering the optimal point. Closing this issue and milestone, but I will be reopening #25 to create the new links. Please see the new comments there for more.
BTW, this is the precision/recall graph! I've got that graph in the diagnostics for future reference.
Under the new model (I'm calling it m3 for tracking purposes) I'm getting really good fits. There are almost no false negatives (i.e. it picks up nearly every case), so the only things to communicate are the number of positives and the number of false positives.
I also think that if we use an aggressive cutoff, we don't really need to get into the confusing cutoff-choice discussion.
Anyway, I think that we can start by showing the total cases that we're trying to predict:
Then I'd talk about the prediction: We used 2008 to 2015 to build the model. In 2016 we would have predicted that you should spray on each of the 116 days a week before you got the second positive result. We also would have predicted that you should spray on 104 days that turned out to be negative.
As additional detail, in 2012 most of the false positives came late in the season; I'm not sure if that is worth communicating.