GenoML / genoml2

GenoML (genoml2) is an open-source Python package: an automated machine learning (autoML) platform for genomics data.
Apache License 2.0

Test outputs do not match - ROC curve and performance metrics are outputting different results #13

Closed by m-makarious 4 years ago

m-makarious commented 4 years ago

Please make sure that this is a bug.

System information:

Describe the current behavior: When testing a model on an unseen dataset (after munging, training, and re-training on shared features), the reported results do not match between *.testedModel_allSamples_performanceMetrics.csv and *.testedModel_allSamples_ROC.png (see the screenshot below for an example).

Describe the expected behavior: ...they should match!

Code to reproduce the issue: Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Going through the training, harmonizing, and testing steps outlined in the README will reproduce the issue. An image is attached. Thanks for reporting this, @h-leonard!

[screenshot attached showing the mismatched outputs]

Other Information / Logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

mikeDTI commented 4 years ago

Have you ever seen this happen with train?

m-makarious commented 4 years ago

Hey Mike, there was a similar-but-not-identical issue where the algorithm nominated as best in train was not the one whose ROC curve was being output.

That issue is closed now, but you can find more information (or report any additional issues) here: https://github.com/GenoML/genoml2/issues/9

I think there are a lot of similarities between the two issues, though!

mikeDTI commented 4 years ago

I think the bug is that some algorithms are calculating AUC using predict instead of predict_proba in a few instances. I was only able to reproduce the issue twice: once with a major ~5% difference using SGDClassifier and once with a less-than-0.5% difference using LinearDiscriminantAnalysis, which in the latter case could have been a rounding error. I'll keep digging, but if you could look into this as well @m-makarious that would be great, thanks!!!
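For illustration, here is a minimal standalone sketch (synthetic data, not GenoML source) of why that matters: AUC computed from hard class calls (predict) versus from continuous scores can differ by several percent.

```python
# Sketch of the suspected bug: hard 0/1 predictions collapse the ranking
# information that the ROC curve needs, so the AUC shifts.
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SGDClassifier(random_state=0).fit(X_train, y_train)

# AUC from hard class labels (what predict() returns).
auc_from_labels = metrics.roc_auc_score(y_test, clf.predict(X_test))
# AUC from continuous scores; SGDClassifier's default hinge loss has no
# predict_proba, so the decision_function margins are used here instead.
auc_from_scores = metrics.roc_auc_score(y_test, clf.decision_function(X_test))

print(f"AUC from predict():           {auc_from_labels:.3f}")
print(f"AUC from decision_function(): {auc_from_scores:.3f}")
```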

mikeDTI commented 4 years ago

We are defining the AUC in two different ways. I prefer using the sklearn metrics default. See the lines with rocauc = metrics.roc_auc_score(self.y_test, test_predictions) from sklearn instead of roc_auc = metrics.auc(fpr, tpr). Want me to redo the plots sticking to metrics.roc_auc_score? Luckily, I think this will solve the problem!
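For reference, a small sketch (synthetic data, not GenoML source) showing that the two formulations agree when they are fed the same continuous scores; a mismatch therefore has to come from the inputs (predict vs. predict_proba), not from the formula itself.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Continuous class-1 probabilities from any fitted classifier.
scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, _ = metrics.roc_curve(y_test, scores)
print(metrics.auc(fpr, tpr))                  # trapezoidal area under the plotted ROC curve
print(metrics.roc_auc_score(y_test, scores))  # direct AUC; same value for the same scores
```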

m-makarious commented 4 years ago

Sounds good to me! Let me know how that works out ☺️

mikeDTI commented 4 years ago

roc_auc = auc(fpr, tpr) needs to be changed to roc_auc = metrics.roc_auc_score(self.y_test, test_predictions)

m-makarious commented 4 years ago

Changed the test and train scripts to use the sklearn metrics default (roc_auc = metrics.roc_auc_score(self.y_test, test_predictions) in place of roc_auc = auc(fpr, tpr)), but this, at least on my end, did not resolve the inconsistency in reporting between the ROC curve and the performance metrics generated by the test script.

Will keep investigating!

mikeDTI commented 4 years ago

Weird. Let me know what you find. I’m doing some batch testing to see.

m-makarious commented 4 years ago

Perhaps an embarrassingly simple fix, but the following changes have been implemented:

Issue should be fixed now - but let me know if you run into additional issues @h-leonard !
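The exact code changes are not listed in this thread, but the general shape of a fix along these lines is to compute the test-set scores once and derive both the ROC plot and the performance-metrics CSV from those same values. A rough sketch under that assumption (function and variable names are illustrative, not GenoML's):

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import metrics

def report_test_results(model, X_test, y_test, prefix):
    """Write the ROC plot and the performance-metrics CSV from one set of scores."""
    # One set of continuous scores, reused everywhere below (assumes the
    # fitted model supports predict_proba).
    scores = model.predict_proba(X_test)[:, 1]
    labels = model.predict(X_test)

    fpr, tpr, _ = metrics.roc_curve(y_test, scores)
    roc_auc = metrics.roc_auc_score(y_test, scores)

    # ROC plot annotated with the same AUC that goes into the CSV.
    plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.savefig(f"{prefix}_ROC.png")
    plt.close()

    # Performance-metrics table built from the identical roc_auc value.
    pd.DataFrame([{
        "AUC": roc_auc,
        "Accuracy": metrics.accuracy_score(y_test, labels),
        "Balanced_Accuracy": metrics.balanced_accuracy_score(y_test, labels),
    }]).to_csv(f"{prefix}_performanceMetrics.csv", index=False)
```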


Screenshot of (finally) consistent reporting: [screenshot attached]

mikeDTI commented 4 years ago

Yo! Great work!