Feat/image data stable - Githubissues

esherman-credo commented 1 year ago

Describe your changes

This is a feature branch implementing support for tensor-based data and neural network models. For Credo-internal developers, the proposed changes are detailed here.

Summary of Changes (in progress, more to come):

BaseModel
- Group init functionality in process_model function
- Split validation functionality into 2 functions:
  1. Checks that required callable functions are present (existing functionality)
  2. Checks that various details of the framework match up with the model wrapper type; throws warnings rather than errors
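The two-stage split described above could look roughly like the following. This is a minimal sketch, not the code in this branch: the function names `check_required_functions` and `check_framework`, and the way the framework is inferred from the model's module path, are assumptions made for illustration.

```python
import warnings


def check_required_functions(model, required=("predict",)):
    """Stage 1: raise if a required callable is missing from the model."""
    missing = [name for name in required if not callable(getattr(model, name, None))]
    if missing:
        raise AttributeError(f"Model is missing required functions: {missing}")


def check_framework(model, expected_frameworks=("sklearn",)):
    """Stage 2: warn (rather than raise) if the framework looks unexpected.

    The framework is guessed from the top-level module of the model's class,
    which is an assumption of this sketch.
    """
    framework = type(model).__module__.split(".")[0]
    if framework not in expected_frameworks:
        warnings.warn(
            f"Model framework '{framework}' is not one of {expected_frameworks}; "
            "behavior may be unexpected."
        )
        return False
    return True
```

Keeping the second stage as a warning means a model that does not match the wrapper's assumptions can still be wrapped and run.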
ClassificationModel
- Add support for Keras-style classifiers (highly restrictive, w/ warnings if assumptions are not met)
  - Assume Sequential-style model w/ Dense last (or 2nd to last) layer + (softmax or sigmoid)
  - Implement __post_init__() functionality for keras predictions
    - Depends on sigmoid (label outputs; no predict_proba) vs. softmax (probability outputs; predict_proba by default -> can infer predict)
DummyClassifier
- Passing X + y artifacts to data wrapper when they're already wrapped
- If X and y are separate in the user's non-Lens workflow, passing these to Lens will be done in the same way as for TabularData
- If X and y are wrapped in one object (such as a tf.data or keras.utils.Sequence object) then passing them to the Lens wrapper requires separating them
- Change: Add ability to pass model_like to DummyClassifier
  - This does not fully address usability issue above (e.g., if user wants to run dataset assessments on X and y)
  - Allows user to avoid excessive computation by Lens, while still retaining model details for, e.g., the ModelProfiler evaluator
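For the case where X and y are wrapped in one object, the separation step might be sketched as below. It assumes the object can be iterated as a finite series of `(x_batch, y_batch)` pairs; `split_batches` is a hypothetical helper, not part of Lens, and it uses plain lists so the example runs without TensorFlow.

```python
def split_batches(batched):
    """Collect features and labels from an iterable of (x_batch, y_batch) pairs."""
    X, y = [], []
    for x_batch, y_batch in batched:
        X.extend(x_batch)  # append each sample in the batch
        y.extend(y_batch)  # append each label in the batch
    return X, y


# Toy stand-in for a batched dataset: two batches of (features, labels).
batches = [([[0.1, 0.2], [0.3, 0.4]], [0, 1]), ([[0.5, 0.6]], [1])]
X, y = split_batches(batches)
```

Note that this materializes everything in memory, which may not be acceptable for data-on-disk workflows.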
Base Data
- Converted process functions to non-abstract (committed)
  - Can support tensor data (from Keras + TF, at least; see supported inputs to predict here) with base Data class
  - Working example of data-on-disk model fitting + Lens evaluation exists (not pushed to GH)
Evaluator Validation
Security evaluator needed to be refactored (a bit) to use ART's TensorFlowV2Classifier rather than their KerasClassifier
- The latter does not support eager execution
- The former is ART's current workaround -> eager execution is important for overall TF/Keras support (allows running things on-demand rather than as a batched graph execution)
- Tests passing

Lens Validation
Validation of Model + Data --> At the Lens init stage, we now verify that predict, predict_proba, and compare (whichever are relevant to the provided model) work for the provided data. If any of them fails, Lens throws an error, which prevents instantiating/running evaluators on an incompatible Model + Data pair.
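As a rough illustration of this kind of init-time check (not the actual contents of lens_validation.py), the sketch below calls each model function that exists on the data and re-raises any failure. The function name `check_model_data_compatibility` is hypothetical, and passing only `X` to `compare` is a simplification.

```python
def check_model_data_compatibility(
    model, X, functions=("predict", "predict_proba", "compare")
):
    """Call each relevant model function on X; raise if any call fails."""
    for name in functions:
        fn = getattr(model, name, None)
        if not callable(fn):
            continue  # function is not relevant to this model type
        try:
            fn(X)
        except Exception as exc:
            raise TypeError(
                f"Model function '{name}' failed on the provided data: {exc}"
            ) from exc
```

In practice one would likely run this on a small sample of the data rather than the full set, to keep the check cheap.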

Evaluator Validation
Established a starting point for streamlining/unifying artifact checking. Converted check_artifact_for_nulls to check_data... (the name reflects what it's doing) and added options to check only some parts of the artifact (i.e., a subset of X, y, and sensitive_features) for nulls rather than all parts. Functionality doesn't fundamentally change, but the function arguments check_X, check_y, and check_sens (all boolean) make the requirements more explicit.

- If datatype is completely arbitrary or generator-like, we have no way of checking; need the user to do this before wrapping
- This revised null-checker expands capabilities to several non-Pandas types
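A minimal sketch of a null check with per-part flags is shown below. The flag names check_X, check_y, and check_sens come from the description above; everything else (the name `check_nulls`, the `_has_nulls` helper, and the restriction to lists, tuples, dicts, and floats) is an assumption for illustration and does not reflect the real implementation.

```python
import math


def _has_nulls(values):
    """Return True if a (possibly nested) list/tuple/dict contains None or NaN."""
    if values is None:
        return True
    if isinstance(values, float):
        return math.isnan(values)
    if isinstance(values, dict):
        return any(_has_nulls(v) for v in values.values())
    if isinstance(values, (list, tuple)):
        return any(_has_nulls(v) for v in values)
    return False  # other types are not inspected in this sketch


def check_nulls(data, check_X=True, check_y=True, check_sens=True):
    """Raise ValueError naming every enabled part of `data` that has nulls.

    A part that is missing or None is itself treated as null.
    """
    parts = {"X": check_X, "y": check_y, "sensitive_features": check_sens}
    bad = [
        name
        for name, enabled in parts.items()
        if enabled and _has_nulls(getattr(data, name, None))
    ]
    if bad:
        raise ValueError(f"Nulls detected in: {bad}")
```

An evaluator that never touches sensitive features would call this with `check_sens=False`, which makes that requirement visible at the call site.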

Issue ticket number and link

https://credo-ai.atlassian.net/browse/DSP-344

Known outstanding issues that are not fully accounted for

Need to fix/merge the Keras support in the ModelProfiler
Tests! Tests! Tests!
Confirm documentation is properly built

Checklist before requesting a review

[x] I have performed a self-review of my code
[ ] I have built basic tests for new functionality (particularly new evaluators)
[x] If new libraries have been added, I have checked that readthedocs API documentation is constructed correctly
[x] Will this be part of a major product update? If yes, please write one phrase about this update.
- See above. This expands Lens functionality to support Keras models (and, by default, some TF models). This remains relatively experimental -> the space of models and data that "work" is likely much larger than what we have explicitly developed for. We have tried to implement warnings where possible. Some growing pains are likely.

Extra-mile Checklist

[ ] I have thought expansively about edge cases and written tests for them

github-actions[bot] commented 1 year ago

Coverage Report

File	Stmts	Miss	Cover	Missing
credoai
__init__.py	3	0	100%
credoai/artifacts
__init__.py	7	0	100%
credoai/artifacts/data
__init__.py	0	0	100%
base_data.py	106	13	88%	53, 153, 156, 171, 178, 185, 189, 194, 197, 200, 212, 215, 222
comparison_data.py	63	13	79%	53, 60, 71, 76, 81, 90, 96, 100, 105, 114, 147, 153, 156
tabular_data.py	42	6	86%	52, 76, 80, 99, 101, 108
credoai/artifacts/model
__init__.py	0	0	100%
base_model.py	42	2	95%	57, 103
classification_model.py	48	18	62%	69–72, 88–120
comparison_model.py	11	0	100%
constants_model.py	5	0	100%
regression_model.py	11	4	64%	41–43, 46
credoai/evaluators
__init__.py	15	0	100%
data_fairness.py	160	13	92%	85–92, 100, 225, 252, 282–294, 411, 446–447
data_profiler.py	61	4	93%	49, 73–74, 93
deepchecks.py	40	3	92%	113–122
equity.py	113	6	95%	73, 153–155, 226–227
evaluator.py	72	6	92%	67, 70, 89, 115, 180, 187
fairness.py	111	2	98%	111, 224
feature_drift.py	59	1	98%	66
identity_verification.py	112	2	98%	144–145
model_profiler.py	103	32	69%	95–101, 117–130, 158–161, 174–179, 182–214, 256–257, 266–267, 305
performance.py	87	7	92%	108, 129–135
privacy.py	118	4	97%	410, 447–449
ranking_fairness.py	112	14	88%	144–145, 165, 184, 190–191, 387–409, 414–444
security.py	97	1	99%	309
shap.py	87	14	84%	117, 125–126, 136–142, 168–169, 251–252, 282–290
survival_fairness.py	67	50	25%	27–31, 34–46, 51–62, 65–76, 79–97, 100, 103, 106
credoai/evaluators/utils
__init__.py	3	0	100%
fairlearn.py	18	1	94%	93
utils.py	8	1	88%	9
validation.py	87	25	71%	23, 43–44, 46–48, 55, 65, 67, 71–76, 89, 92, 95, 98–99, 116–123, 129–135, 138
credoai/governance
__init__.py	1	0	100%
credoai/lens
__init__.py	2	0	100%
lens.py	206	13	94%	59, 201–202, 238–243, 300, 342, 366, 448, 463, 467, 479
lens_validation.py	70	32	54%	41, 45, 49–51, 63, 66, 71–75, 84, 89–92, 119, 122–140, 168–170
pipeline_creator.py	60	12	80%	20–21, 37, 79–91
utils.py	39	28	28%	20–27, 49–52, 71–82, 99, 106–109, 128–135
credoai/modules
__init__.py	3	0	100%
constants_deepchecks.py	2	0	100%
constants_metrics.py	19	0	100%
constants_threshold_metrics.py	3	0	100%
metric_utils.py	24	18	25%	15–30, 34–55
metrics.py	88	13	85%	63, 67, 70–71, 74, 84, 123, 135–140, 178, 185, 187
metrics_credoai.py	167	49	71%	68–69, 73, 93–102, 107–109, 132–160, 176–179, 206, 230–231, 294–296, 372–378, 414–415, 485–486, 534, 638
stats.py	97	50	48%	15–18, 21–26, 29–31, 34–39, 42–56, 59–64, 106, 132–159, 191, 202–217
stats_utils.py	5	3	40%	5–8
credoai/prism
__init__.py	3	0	100%
compare.py	35	2	94%	71, 87
prism.py	36	4	89%	46, 48, 59, 86
task.py	17	2	88%	30, 37
credoai/prism/comparators
__init_.py	0	0	100%
comparator.py	17	3	82%	34, 42, 47
metric_comparator.py	44	2	95%	125, 131
credoai/utils
__init__.py	5	0	100%
common.py	104	33	68%	55, 72–73, 79, 88–95, 106–107, 124–130, 135, 140–145, 156–163, 190
constants.py	2	0	100%
dataset_utils.py	61	35	43%	23, 26–31, 50, 54–55, 88–119
logging.py	55	13	76%	10–11, 14, 19–20, 23, 27, 44, 58–62
model_utils.py	73	46	37%	17–22, 32–33, 36–37, 42–47, 63–108, 114–121
version_check.py	11	1	91%	16
TOTAL	3117	601	81%

esherman-credo commented 1 year ago

Linking the per-evaluator validation requirements. I'm shooting to have this implemented such that one can trivially read an evaluator's assumptions/requirements from the validate_artifacts function. E.g., in the screenshot below from the updated Performance evaluator, we can see that the evaluator has the following requirements:

The metrics, X, and y supplied to the evaluator all need to be non-null (as objects)
X and y also need to not contain nulls (i.e., the internals)

Moreover, we can see that sensitive features don't need to be checked for nullness, which is a hint that that sub-artifact isn't used in this evaluator (not otherwise obvious without looking carefully through the code or making an assumption based on the evaluator's docstring).
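To make the idea concrete, here is a hedged sketch of a validate_artifacts method that reads as a list of requirements. It is not the code from the screenshot: the class `PerformanceLikeEvaluator` and the helpers `require_not_none` and `require_no_none_entries` are hypothetical, and only the method name validate_artifacts comes from the discussion above.

```python
def require_not_none(**objects):
    """Object-level check: each named object must itself be provided."""
    missing = [name for name, obj in objects.items() if obj is None]
    if missing:
        raise ValueError(f"Missing required objects: {missing}")


def require_no_none_entries(**sequences):
    """Content-level check: no entry of a flat sequence may be None."""
    bad = [name for name, seq in sequences.items() if any(v is None for v in seq)]
    if bad:
        raise ValueError(f"Null entries found in: {bad}")


class PerformanceLikeEvaluator:
    def __init__(self, metrics, data):
        self.metrics = metrics
        self.data = data

    def validate_artifacts(self):
        # Requirement 1: metrics, X, and y must be supplied.
        require_not_none(metrics=self.metrics, X=self.data.X, y=self.data.y)
        # Requirement 2: X and y must not contain null entries.
        require_no_none_entries(X=self.data.X, y=self.data.y)
        # sensitive_features is never checked, signaling it is unused here.
```

Reading the method top to bottom gives the evaluator's requirements, and the absence of any sensitive_features check carries the same hint described above.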

credo-ai / credoai_lens

Feat/image data stable #290
