GenoML / genoml2

GenoML (genoml2) is an open-source Python package: an automated machine learning (autoML) platform for genomics data.
Apache License 2.0

Test outputs do not match - ROC curve and performance metrics are outputting different results #13

Closed by m-makarious 4 years ago

m-makarious commented 4 years ago

Please make sure that this is a bug.

System information:

Describe the current behavior: When testing a model on an unseen dataset (after munging, training, and re-training on shared features), the reported results do not match between *.testedModel_allSamples_performanceMetrics.csv and *.testedModel_allSamples_ROC.png (see the screenshot below for an example).

Describe the expected behavior: ...they should match!

Code to reproduce the issue: Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Going through the training, harmonizing, and testing steps outlined in the README will reproduce the issue. An image is attached. Thanks for reporting this, @h-leonard!

[screenshot attached showing the mismatched outputs]

Other Information / Logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

mikeDTI commented 4 years ago

Have you ever seen this happen with train?

m-makarious commented 4 years ago

Hey Mike, there was a similar-but-not-identical issue where the algorithm nominated as best in train was not the one whose ROC curve was being output.

That issue is closed now, but you can find more information (or report any additional issues) here: https://github.com/GenoML/genoml2/issues/9

I think there are a lot of similarities between the two issues, though!

mikeDTI commented 4 years ago

I think the bug is that some algorithms are calculating AUC using predict instead of predict_proba in a few instances. I was only able to reproduce the issue twice: once with a major ~5% difference using SGDClassifier and once with a less-than-0.5% difference using LinearDiscriminantAnalysis, which in the latter case could have been a rounding error. I'll keep digging, but if you could look into this as well @m-makarious that would be great, thanks!!!
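For illustration, here is a minimal standalone sketch (synthetic data, not GenoML source) of why that matters: AUC computed from hard class calls (predict) versus from continuous scores can differ by several percent.

```python
# Sketch of the suspected bug: hard 0/1 predictions collapse the ranking
# information that the ROC curve needs, so the AUC shifts.
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SGDClassifier(random_state=0).fit(X_train, y_train)

# AUC from hard class labels (what predict() returns).
auc_from_labels = metrics.roc_auc_score(y_test, clf.predict(X_test))
# AUC from continuous scores; SGDClassifier's default hinge loss has no
# predict_proba, so the decision_function margins are used here instead.
auc_from_scores = metrics.roc_auc_score(y_test, clf.decision_function(X_test))

print(f"AUC from predict():           {auc_from_labels:.3f}")
print(f"AUC from decision_function(): {auc_from_scores:.3f}")
```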

mikeDTI commented 4 years ago

We are defining the AUC in two different ways. I prefer using the sklearn metrics default. See the lines with rocauc = metrics.roc_auc_score(self.y_test, test_predictions) from sklearn instead of roc_auc = metrics.auc(fpr, tpr). Want me to redo the plots sticking to metrics.roc_auc_score? Luckily, I think this will solve the problem!
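For reference, a small sketch (synthetic data, not GenoML source) showing that the two formulations agree when they are fed the same continuous scores; a mismatch therefore has to come from the inputs (predict vs. predict_proba), not from the formula itself.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Continuous class-1 probabilities from any fitted classifier.
scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, _ = metrics.roc_curve(y_test, scores)
print(metrics.auc(fpr, tpr))                  # trapezoidal area under the plotted ROC curve
print(metrics.roc_auc_score(y_test, scores))  # direct AUC; same value for the same scores
```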

m-makarious commented 4 years ago

Sounds good to me! Let me know how that works out ☺️

mikeDTI commented 4 years ago

roc_auc = auc(fpr, tpr) needs to be changed to roc_auc = metrics.roc_auc_score(self.y_test, test_predictions)

m-makarious commented 4 years ago

Changed the test and train scripts to use the sklearn metrics default (roc_auc = metrics.roc_auc_score(self.y_test, test_predictions) in place of roc_auc = auc(fpr, tpr)), but this, at least on my end, did not resolve the inconsistency in reporting between the ROC curve and the performance metrics generated by the test script.

Will keep investigating!

mikeDTI commented 4 years ago

Weird. Let me know what you find. I’m doing some batch testing to see.

m-makarious commented 4 years ago

Perhaps an embarrassingly simple fix, but the following changes have been implemented:

Issue should be fixed now - but let me know if you run into additional issues @h-leonard !
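The exact code changes are not listed in this thread, but the general shape of a fix along these lines is to compute the test-set scores once and derive both the ROC plot and the performance-metrics CSV from those same values. A rough sketch under that assumption (function and variable names are illustrative, not GenoML's):

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import metrics

def report_test_results(model, X_test, y_test, prefix):
    """Write the ROC plot and the performance-metrics CSV from one set of scores."""
    # One set of continuous scores, reused everywhere below (assumes the
    # fitted model supports predict_proba).
    scores = model.predict_proba(X_test)[:, 1]
    labels = model.predict(X_test)

    fpr, tpr, _ = metrics.roc_curve(y_test, scores)
    roc_auc = metrics.roc_auc_score(y_test, scores)

    # ROC plot annotated with the same AUC that goes into the CSV.
    plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.savefig(f"{prefix}_ROC.png")
    plt.close()

    # Performance-metrics table built from the identical roc_auc value.
    pd.DataFrame([{
        "AUC": roc_auc,
        "Accuracy": metrics.accuracy_score(y_test, labels),
        "Balanced_Accuracy": metrics.balanced_accuracy_score(y_test, labels),
    }]).to_csv(f"{prefix}_performanceMetrics.csv", index=False)
```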


Screenshot of (finally) consistent reporting: [screenshot attached]

mikeDTI commented 4 years ago

Yo! Great work!