DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

DiscriminationThreshold seems to compute the minimum number of thresholds across trials, not max #697

Open aronszanto opened 5 years ago

aronszanto commented 5 years ago

Presumably the following code should find the max number of thresholds found across all the trials so as not to ignore the values in the densest threshold list across all trials:

# Compute maximum number of uniform thresholds across all trials
n_thresholds = np.array([len(t['thresholds']) for t in trials]).min()
self.thresholds_ = np.linspace(0.0, 1.0, num=n_thresholds)

But this finds the minimum. So we may be losing information by bisecting with the sparsest number of thresholds rather than the densest.
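To make the discrepancy concrete, here is a minimal illustration of the reported behavior (the shape of the `trials` structure is assumed from the quoted snippet): using `.min()` locks the uniform grid to the coarsest trial's resolution, whereas the comment implies the densest.

```python
import numpy as np

# Hypothetical trials mimicking the structure used in threshold.py:
# each trial records the thresholds it observed.
trials = [
    {"thresholds": np.linspace(0.0, 1.0, 4)},  # 4 thresholds
    {"thresholds": np.linspace(0.0, 1.0, 3)},  # 3 thresholds
    {"thresholds": np.linspace(0.0, 1.0, 5)},  # 5 thresholds
]

counts = np.array([len(t["thresholds"]) for t in trials])
n_min = counts.min()  # 3 -> current code yields thresholds_ = [0.0, 0.5, 1.0]
n_max = counts.max()  # 5 -> what the comment suggests: 5 uniform thresholds
print(n_min, n_max)
```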

lwgray commented 5 years ago

@aronszanto Thanks for bringing this to our attention. For us to move forward, we require more information. You appear to be examining lines 230 - 232 in the threshold.py file. https://github.com/DistrictDataLabs/yellowbrick/blob/f8f96d20e69b4b9f86e4f986fdb9edb54448eea9/yellowbrick/classifier/threshold.py#L230-L232

Could you give an example of what you're getting for the thresholds? What do you expect to get?

aronszanto commented 5 years ago

So consider the following thresholds, with one metric's scores between them, for each of 3 CV folds (or three trials with 1 CV fold each). (With apologies for formatting.)

[0]---.1----.25--------1 = .5 - .9 - .4

[0]-.05----------------1 = .4 - .8

[0]--.1----.2-----.6---.8--1 = .3 - .95 - .7 - .6 - .5

Currently, n_thresholds = np.array([4, 3, 5]).min() = 3, which means that self.thresholds_ = [0, .5, 1]. Evaluating the median scores at 0, .5, and 1 gives .4, .7, and .5, so we would conclude that the best threshold is .5.

However, had we evaluated with more granularity (e.g., with n_thresholds = np.array([4, 3, 5]).max() = 5), we would evaluate median scores at [0, .25, .5, .75, 1], get scores of .4, .8, .7, .6, .5, and find that the best threshold is .25.

Even that still isn't enough, because it throws away important information captured in thresholds spaced more narrowly than the uniform spacing induced by linspace. I think a better approach is to set self.thresholds_ to np.sort(np.concatenate([t['thresholds'] for t in trials])), which ensures that you evaluate each trial's scores at every observed threshold, throwing away no information. Were we to do this, we would find that the optimal threshold is actually .2, with a median score of .9. Note that this score would never be obtained unless we evaluated at a threshold between .1 and .2, which wouldn't happen under the linspace paradigm until n_thresholds reached at least 6. Since there's no way to know whether an optimal threshold range exists until you evaluate them all (and it's a fairly efficient task anyway), we should consider doing exactly that.
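A sketch of that proposal, using the three example trials above (hypothetical code; the `trials` variable names mirror the snippet quoted from threshold.py). Taking the union of all thresholds observed across trials, sorted and deduplicated, preserves every trial's granularity:

```python
import numpy as np

# The three example trials from the diagram above.
trials = [
    {"thresholds": np.array([0.0, 0.1, 0.25, 1.0])},
    {"thresholds": np.array([0.0, 0.05, 1.0])},
    {"thresholds": np.array([0.0, 0.1, 0.2, 0.6, 0.8, 1.0])},
]

# np.unique concatenates, sorts, and deduplicates in one step, so no
# trial's thresholds are discarded.
thresholds_ = np.unique(np.concatenate([t["thresholds"] for t in trials]))
print(thresholds_)
# [0.   0.05 0.1  0.2  0.25 0.6  0.8  1.  ]
```

Using np.unique rather than a bare np.sort also avoids evaluating duplicate thresholds (0 and 1 appear in every trial).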

bbengfort commented 5 years ago

@aronszanto I'm really sorry we've fallen behind on this issue - we've been slammed with PRs and requests, but it's definitely important to us! Would you be interested in opening a PR to fix this issue by any chance?

rebeccabilbro commented 5 years ago

Bump @aronszanto! Any interest in submitting a PR to fix the bug? We're a pretty small team over here and would be most appreciative!

aronszanto commented 5 years ago

Hey all, thanks for bumping. I discovered this and then wrote a fix for my company, which means that whether I can open source it is up to the lawyers. Working on this :)

bbengfort commented 5 years ago

@aronszanto glad to hear that you're using Yellowbrick in your work! Do you mind if I ask where you work? (We're just interested in how our users are using the library). Feel free to send me a DM on twitter (@bbengfort) if you want to chat about it!

If we can be of any help with your company lawyers, please let us know! Yellowbrick is licensed under a permissive Apache 2 open source license which means you can absolutely use it for commercial use and make modifications to the software. All we ask is that you include our copyright, license, and any notices along with your fork of the code. If you make any major changes to the code, we'd also ask that we were notified.

Hopefully your managers will agree with me that contributing to open source projects like YB enhances the value of your work and will let you open a PR with the fix you described. If they don't, we understand - we are all volunteers doing this in our spare time alongside full-time jobs, which lets us contribute more or less to other open source projects as well!