credo-ai / credoai_lens

Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.
https://credoai-lens.readthedocs.io/en/stable/
Apache License 2.0

Feat/image data stable #290

Closed esherman-credo closed 1 year ago

esherman-credo commented 1 year ago

Describe your changes

This is a feature branch implementing support for tensor-based data and neural network models. For Credo-internal developers, the proposed changes are detailed here.

Summary of Changes (in progress, more to come):

Lens Validation: Validation of Model + Data. At the Lens init stage, we now verify that predict, predict_proba, and compare (whichever are relevant to the provided model) work on the provided data. A failure throws an error and prevents instantiating/running evaluators until Model + Data compatibility has been confirmed. (A hedged sketch of this kind of check is included below this summary.)

Evaluator Validation: Established a starting point for streamlining/unifying artifact checking. Converted check_artifact_for_nulls to check_data (the name now reflects what it does) and added options to check only some parts of the artifact (i.e., a subset of X, y, and sensitive_features) for nulls rather than checking all parts. The functionality doesn't fundamentally change, but the requirements are now explicit in the function arguments: check_X, check_y, and check_sens (all boolean). (A rough sketch of such a helper is also included below this summary.)
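Since the internal design document isn't public, here is a minimal sketch of the kind of init-time Model + Data check described in the Lens Validation item above. The function name and data access are assumptions for illustration, not the actual lens_validation.py code:

```python
# Hedged sketch only: names and signatures are assumptions, not the actual
# credoai/lens/lens_validation.py implementation.
def check_model_data_compatibility(model, data):
    """Run the model's available inference methods on a small slice of the data
    and fail early if any of them cannot handle it."""
    sample = data.X[:1]  # assumes the data artifact exposes an X attribute
    for name in ("predict", "predict_proba"):
        fn = getattr(model, name, None)
        if fn is None:
            continue  # only validate the methods this model type actually exposes
        try:
            fn(sample)
        except Exception as exc:
            raise ValueError(f"{name} failed on the supplied data: {exc}") from exc
    # A comparison model's compare would need a pairwise check instead,
    # e.g. model.compare(sample, sample).
```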
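And a rough, hedged sketch of a check_data-style helper with the boolean flags described in the Evaluator Validation item; the real helper (presumably in credoai/evaluators/utils/validation.py) may differ in names and details:

```python
import pandas as pd

# Illustrative sketch of a flag-based null check; not the actual Lens code.
def check_data(data, check_X=True, check_y=True, check_sens=True):
    """Raise if the requested parts of a data artifact are missing or contain nulls."""
    parts = []
    if check_X:
        parts.append(("X", data.X))
    if check_y:
        parts.append(("y", data.y))
    if check_sens:
        parts.append(("sensitive_features", data.sensitive_features))

    for name, part in parts:
        if part is None:
            raise ValueError(f"Data artifact is missing required component: {name}")
        if pd.DataFrame(part).isnull().values.any():
            raise ValueError(f"Data component '{name}' contains null values")
```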

Issue ticket number and link

https://credo-ai.atlassian.net/browse/DSP-344

Known outstanding issues that are not fully accounted for

Checklist before requesting a review

Extra-mile Checklist

github-actions[bot] commented 1 year ago

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| `credoai` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `credoai/artifacts` | | | | |
| `__init__.py` | 7 | 0 | 100% | |
| `credoai/artifacts/data` | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `base_data.py` | 106 | 13 | 88% | 53, 153, 156, 171, 178, 185, 189, 194, 197, 200, 212, 215, 222 |
| `comparison_data.py` | 63 | 13 | 79% | 53, 60, 71, 76, 81, 90, 96, 100, 105, 114, 147, 153, 156 |
| `tabular_data.py` | 42 | 6 | 86% | 52, 76, 80, 99, 101, 108 |
| `credoai/artifacts/model` | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `base_model.py` | 42 | 2 | 95% | 57, 103 |
| `classification_model.py` | 48 | 18 | 62% | 69–72, 88–120 |
| `comparison_model.py` | 11 | 0 | 100% | |
| `constants_model.py` | 5 | 0 | 100% | |
| `regression_model.py` | 11 | 4 | 64% | 41–43, 46 |
| `credoai/evaluators` | | | | |
| `__init__.py` | 15 | 0 | 100% | |
| `data_fairness.py` | 160 | 13 | 92% | 85–92, 100, 225, 252, 282–294, 411, 446–447 |
| `data_profiler.py` | 61 | 4 | 93% | 49, 73–74, 93 |
| `deepchecks.py` | 40 | 3 | 92% | 113–122 |
| `equity.py` | 113 | 6 | 95% | 73, 153–155, 226–227 |
| `evaluator.py` | 72 | 6 | 92% | 67, 70, 89, 115, 180, 187 |
| `fairness.py` | 111 | 2 | 98% | 111, 224 |
| `feature_drift.py` | 59 | 1 | 98% | 66 |
| `identity_verification.py` | 112 | 2 | 98% | 144–145 |
| `model_profiler.py` | 103 | 32 | 69% | 95–101, 117–130, 158–161, 174–179, 182–214, 256–257, 266–267, 305 |
| `performance.py` | 87 | 7 | 92% | 108, 129–135 |
| `privacy.py` | 118 | 4 | 97% | 410, 447–449 |
| `ranking_fairness.py` | 112 | 14 | 88% | 144–145, 165, 184, 190–191, 387–409, 414–444 |
| `security.py` | 97 | 1 | 99% | 309 |
| `shap.py` | 87 | 14 | 84% | 117, 125–126, 136–142, 168–169, 251–252, 282–290 |
| `survival_fairness.py` | 67 | 50 | 25% | 27–31, 34–46, 51–62, 65–76, 79–97, 100, 103, 106 |
| `credoai/evaluators/utils` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `fairlearn.py` | 18 | 1 | 94% | 93 |
| `utils.py` | 8 | 1 | 88% | 9 |
| `validation.py` | 87 | 25 | 71% | 23, 43–44, 46–48, 55, 65, 67, 71–76, 89, 92, 95, 98–99, 116–123, 129–135, 138 |
| `credoai/governance` | | | | |
| `__init__.py` | 1 | 0 | 100% | |
| `credoai/lens` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `lens.py` | 206 | 13 | 94% | 59, 201–202, 238–243, 300, 342, 366, 448, 463, 467, 479 |
| `lens_validation.py` | 70 | 32 | 54% | 41, 45, 49–51, 63, 66, 71–75, 84, 89–92, 119, 122–140, 168–170 |
| `pipeline_creator.py` | 60 | 12 | 80% | 20–21, 37, 79–91 |
| `utils.py` | 39 | 28 | 28% | 20–27, 49–52, 71–82, 99, 106–109, 128–135 |
| `credoai/modules` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `constants_deepchecks.py` | 2 | 0 | 100% | |
| `constants_metrics.py` | 19 | 0 | 100% | |
| `constants_threshold_metrics.py` | 3 | 0 | 100% | |
| `metric_utils.py` | 24 | 18 | 25% | 15–30, 34–55 |
| `metrics.py` | 88 | 13 | 85% | 63, 67, 70–71, 74, 84, 123, 135–140, 178, 185, 187 |
| `metrics_credoai.py` | 167 | 49 | 71% | 68–69, 73, 93–102, 107–109, 132–160, 176–179, 206, 230–231, 294–296, 372–378, 414–415, 485–486, 534, 638 |
| `stats.py` | 97 | 50 | 48% | 15–18, 21–26, 29–31, 34–39, 42–56, 59–64, 106, 132–159, 191, 202–217 |
| `stats_utils.py` | 5 | 3 | 40% | 5–8 |
| `credoai/prism` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `compare.py` | 35 | 2 | 94% | 71, 87 |
| `prism.py` | 36 | 4 | 89% | 46, 48, 59, 86 |
| `task.py` | 17 | 2 | 88% | 30, 37 |
| `credoai/prism/comparators` | | | | |
| `__init_.py` | 0 | 0 | 100% | |
| `comparator.py` | 17 | 3 | 82% | 34, 42, 47 |
| `metric_comparator.py` | 44 | 2 | 95% | 125, 131 |
| `credoai/utils` | | | | |
| `__init__.py` | 5 | 0 | 100% | |
| `common.py` | 104 | 33 | 68% | 55, 72–73, 79, 88–95, 106–107, 124–130, 135, 140–145, 156–163, 190 |
| `constants.py` | 2 | 0 | 100% | |
| `dataset_utils.py` | 61 | 35 | 43% | 23, 26–31, 50, 54–55, 88–119 |
| `logging.py` | 55 | 13 | 76% | 10–11, 14, 19–20, 23, 27, 44, 58–62 |
| `model_utils.py` | 73 | 46 | 37% | 17–22, 32–33, 36–37, 42–47, 63–108, 114–121 |
| `version_check.py` | 11 | 1 | 91% | 16 |
| **TOTAL** | 3117 | 601 | 81% | |

esherman-credo commented 1 year ago

Linking the per-evaluator validation requirements. I'm aiming to implement this such that one can trivially read an evaluator's assumptions/requirements from the validate_artifacts function. E.g., in the screenshot below from the updated Performance evaluator, we can see that it has the following requirements:

  1. The metrics, X, and y supplied to the evaluator all need to be non-null (as objects)
  2. X and y also need to not contain nulls internally (i.e., within the data themselves)

Moreover, we can see that sensitive features don't need to be checked for nullness, which is a hint that that sub-artifact isn't used in this evaluator (something that is otherwise not obvious without reading carefully through the code or making an assumption based on the evaluator's docstring).

[Screenshot: validate_artifacts in the updated Performance evaluator]
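For readers without access to the screenshot, here is a hedged sketch of how such a validate_artifacts might read, with the requirements visible at a glance. check_existence is a hypothetical helper, and check_data is a minimal stand-in for the flag-based helper sketched in the PR description; neither is the actual Performance evaluator code:

```python
import pandas as pd

def check_existence(obj, name):
    """Hypothetical helper: fail fast if a required artifact/argument is missing."""
    if obj is None:
        raise ValueError(f"{name} must be provided")

def check_data(data, check_X=True, check_y=True, check_sens=True):
    """Minimal stand-in for the flag-based check_data sketched earlier in this PR."""
    selected = [
        ("X", check_X, data.X),
        ("y", check_y, data.y),
        ("sensitive_features", check_sens, getattr(data, "sensitive_features", None)),
    ]
    for name, wanted, part in selected:
        if wanted and pd.DataFrame(part).isnull().values.any():
            raise ValueError(f"'{name}' contains null values")

class PerformanceSketch:
    """Stand-in for the Performance evaluator, not the real class."""

    def __init__(self, metrics, data):
        self.metrics = metrics
        self.data = data

    def validate_artifacts(self):
        # Requirement 1: metrics, X, and y must all be supplied (non-null objects)
        check_existence(self.metrics, "metrics")
        check_existence(self.data.X, "X")
        check_existence(self.data.y, "y")
        # Requirement 2: X and y must not contain nulls internally; sensitive
        # features are skipped, hinting that this evaluator does not use them
        check_data(self.data, check_X=True, check_y=True, check_sens=False)
```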