NannyML / nannyml

nannyml: post-deployment data science in python
https://www.nannyml.com/
Apache License 2.0
1.97k stars, 139 forks

Fix handling single class in chunk for CBPE #384

Closed: michael-nml closed this pull request 6 months ago.

michael-nml commented 6 months ago

This PR fixes an error when calculating the business value, confusion matrix and specificity metrics for binary classification problems in which a chunk contains only one class.

Previously this would fail with:

```
nannyml.exceptions.CalculatorException: failed while fitting nannyml.performance_estimation.confidence_based.cbpe.CBPE. not enough values to unpack (expected 4, got 1)
```

This happens because the `sklearn.metrics.confusion_matrix` function that NannyML uses internally infers the set of classes from the values present in its input (the union of `y_true` and `y_pred`). If only a single class is present, it returns a 1x1 matrix. Flattening that matrix yields 1 value instead of the 4 (TN, FP, FN, TP) expected for a binary classification problem. This PR resolves the issue by explicitly passing the expected classes through the `labels` argument. These classes are currently hard-coded as `[0, 1]`, but we may want to derive them from the input if/when we support string-based classes for binary classification.
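The snippet below is a standalone illustration that calls scikit-learn directly. It is not NannyML's own code, and the four-row chunk is made up. It reproduces the unpacking error on a single-class chunk and then shows how passing `labels=[0, 1]` always yields a 2x2 matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative chunk in which every label and prediction is the positive class.
y_true = np.array([1, 1, 1, 1])
y_pred = np.array([1, 1, 1, 1])

# Without `labels`, scikit-learn only uses the classes it sees: a 1x1 matrix.
cm_implicit = confusion_matrix(y_true, y_pred)
print(cm_implicit.shape)  # (1, 1)

try:
    tn, fp, fn, tp = cm_implicit.ravel()
except ValueError as exc:
    # The same underlying error that surfaced through CBPE.
    print(exc)  # not enough values to unpack (expected 4, got 1)

# With labels=[0, 1] the matrix is always 2x2, so unpacking works for any chunk.
cm_explicit = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm_explicit.ravel()
print(tn, fp, fn, tp)  # 0 0 0 4
```

Hard-coding `[0, 1]` matches the PR. Supporting string-based classes would require building the `labels` list from the data instead.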

Additionally, this PR fixes the F1 sampling error calculation when the input contains no positive cases. This previously raised a ZeroDivisionError; the sampling error is now returned as NaN instead.
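The PR text does not show NannyML's F1 sampling-error formula, so the following is only a sketch under an assumed one. F1 is written as the mean of per-observation contributions over the TP, FP and FN rows, since TN rows do not affect F1. The sampling error is then taken as the standard error of that mean. The function name `f1_sampling_error_sketch` and its `chunk_size` parameter are hypothetical. The part that mirrors the PR is the guard that returns NaN when there are no relevant rows, instead of dividing by zero.

```python
import numpy as np


def f1_sampling_error_sketch(y_true, y_pred, chunk_size: int) -> float:
    """Hypothetical F1 sampling-error estimate with a guard for the no-positives case."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    relevant = tp + fp + fn

    # No positive labels and no positive predictions: F1 is undefined,
    # so return NaN rather than raising ZeroDivisionError below.
    if relevant == 0:
        return float("nan")

    # TP rows contribute 1 and FP/FN rows contribute 0. The rescaling makes the
    # mean of the contributions equal F1 = 2TP / (2TP + FP + FN).
    correction = relevant / (tp + 0.5 * (fp + fn))
    contributions = np.concatenate([np.ones(tp), np.zeros(fp + fn)]) * correction

    # Standard error of the mean, using the expected number of relevant rows
    # in a chunk of `chunk_size` observations.
    fraction_relevant = relevant / len(y_pred)
    return float(np.std(contributions) / np.sqrt(chunk_size * fraction_relevant))


print(f1_sampling_error_sketch([0, 0, 0], [0, 0, 0], chunk_size=3))  # nan
print(f1_sampling_error_sketch([1, 0, 1, 1, 0], [1, 1, 0, 1, 0], chunk_size=5))
```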

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.67%. Comparing base (13ace29) to head (730ae35).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #384      +/-   ##
==========================================
+ Coverage   78.52%   78.67%   +0.15%
==========================================
  Files         110      110
  Lines        8562     8567       +5
  Branches     1522     1523       +1
==========================================
+ Hits         6723     6740      +17
+ Misses       1476     1468       -8
+ Partials      363      359       -4
```
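The headline percentages can be reproduced from the table above. This assumes Codecov's ratio hits / (hits + misses + partials), where partially covered lines count as not covered. In both columns the denominator equals the reported line count. The helper name `coverage_pct` is illustrative only.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, treating partially covered lines as uncovered."""
    total = hits + misses + partials  # equals the "Lines" row of the report
    return round(100 * hits / total, 2)


base = coverage_pct(hits=6723, misses=1476, partials=363)  # main (13ace29)
head = coverage_pct(hits=6740, misses=1468, partials=359)  # PR head (730ae35)
print(base, head, round(head - base, 2))  # 78.52 78.67 0.15
```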
