aiverify-foundation / aiverify

AI Verify
https://aiverify-foundation.github.io/aiverify/
Apache License 2.0

Guidance on Fairness and Robust Check in AIverify #226

Open YZx0pa opened 7 months ago

YZx0pa commented 7 months ago

Is there an existing issue for this?

Description

I tested my own dataset (a tabular dataset with an XGBoost 1.6.1 binary classification model). The SHAP Toolbox shows "Test completed", but both the robustness and fairness tests show "test error". When I click "view report", the report content only shows "Application error".

For the fairness check, I want to clarify that our model training does not involve a directly sensitive feature related to gender. However, I am concerned about the potential for unintended bias from indirect features influencing gender-related outcomes. I would appreciate guidance on how to conduct a fairness check in AIverify considering this scenario.

Regarding the robust check, I specified the "annotated ground truth path" as "/app/aiverify/uploads/data/test_with_groundtruth.sav" (the same dataset used for the SHAP Toolbox). I also set the "name of column contains image files" as "NA."

Thank you!

Current Behavior

Not able to generate the report

Expected Behavior

The robustness and fairness toolbox tests complete, and the summary report is generated.

Steps To Reproduce

NA

Environment

- Operating System and Version: Ubuntu22.04
- AI Verify Version: V0.9

Did you build using source code or from docker file?  
- Built from Docker file

Screenshots/ Code snippets

Robust toolbox log:

2023-11-29 22:56:42,019 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(66)]: The task validation is successful: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(71)]: Working on task: message_id 1701269801989-0, message_args {"mode":"upload","id":"task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3","algorithmId":"algo:aiverify.stock.robustness_toolbox:robustness_toolbox","algorithmArgs":{"annotated_ground_truth_path":"/app/aiverify/uploads/data/test_with_groundtruth.sav","file_name_label":"NA"},"testDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelFile":"/app/aiverify/uploads/model/xgb_model_check.pkl","groundTruthDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelType":"classification","groundTruth":"shortlisted"}, task_type: TaskType.NEW
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.RUNNING
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(110)]: Sending task update
2023-11-29 22:56:42,024 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:detect_pipeline(30)]: Attempting to detect pipeline model from the given path
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(55)]: Attempting to read pipeline: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(83)]: Pipeline validation successful
2023-11-29 22:56:42,025 [ERROR][log_utils.py::log_message(37)] [pipeline_manager.py:read_pipeline_path(101)]: There was an error getting pipeline files in the folder
2023-11-29 22:56:42,025 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:_load_instances(277)]: Unable to find pipeline model. Loading non-pipeline instances
2023-11-29 22:56:42,025 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,027 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: <pandasdata.Plugin object at 0x7f707c349b50> <class 'pickleserializer.Plugin'>
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_data(75)]: Data Instance: <pandasdata.Plugin object at 0x7f707c349b50>, Data Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 0}
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(51)]: Attempting to read model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(78)]: Model validation successful
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(84)]: Attempting to deserialize model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(107)]: Attempting to identify model format: <class 'xgboost.sklearn.XGBClassifier'>
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(117)]: Supported model format: <class 'xgboost.sklearn.XGBClassifier'>, ModelPluginType.XGBOOST[xgboost.sklearn.XGBClassifier]
2023-11-29 22:56:42,134 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_model(201)]: Model Instance: <xgboostmodel.Plugin object at 0x7f6fd5dcd510>, Model Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: <pandasdata.Plugin object at 0x7f6fd02a1650> <class 'pickleserializer.Plugin'>
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,139 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_ground_truth(259)]: GroundTruth Instance: <pandasdata.Plugin object at 0x7f6fd02a1650>, GroundTruth Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,159 [ERROR][log_utils.py::log_message(37)] [algorithm_manager.py:get_algorithm(126)]: There was an error getting algorithm instance (not found): algo:aiverify.stock.robustness_toolbox:robustness_toolbox
2023-11-29 22:56:42,160 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(112)]: Attempting to find algo:aiverify.stock.robustness_toolbox:robustness_toolbox in the algorithm registry
2023-11-29 22:56:42,163 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(141)]: algo:aiverify.stock.robustness_toolbox:robustness_toolbox is in the algorithm registry. Attempting to re-discover algorithm
2023-11-29 22:56:42,176 [INFO][robustness_toolbox.py::add_to_log(205)] Setup completed
2023-11-29 22:56:42,179 [INFO][log_utils.py::log_message(31)] [algorithm_manager.py:get_algorithm(116)]: Supported algorithm: algo:aiverify.stock.robustness_toolbox:robustness_toolbox, PluginType.ALGORITHM
2023-11-29 22:56:42,180 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_algorithm(356)]: Algorithm Instance: <robustness_toolbox.Plugin object at 0x7f6fd02aac10>
2023-11-29 22:56:42,205 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,207 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(147)]: The raw task results: {'results': [0]}
2023-11-29 22:56:42,208 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,210 [ERROR][app_logger.py::add_to_log(126)] [task_processing.py:process_new_task(167)]: Failed output schema validation: Task Results: {'results': [0]}
2023-11-29 22:56:42,210 [WARNING][app_logger.py::add_to_log(121)] [task_processing.py:process_new_task(189)]: The task terminated: The algorithm output schema validation failed
2023-11-29 22:56:42,210 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.ERROR
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Error', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '', 'errorMessages': '[{"category": "SYSTEM_ERROR", "code": "CSYSx00146", "description": "Task Terminated: The algorithm output schema validation failed", "severity": "warning", "component": "task_processing.py"}]', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_acknowledgement(659)]: The acknowledgement sent successfully - 1701269801989-0
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [task.py:cleanup(104)]: The system has received notification to clean up task

Additional Context

NA

Possible Solution (Optional)

No response

YZx0pa commented 7 months ago

Hi @imda-benedictlee, good day. Could you tell me why I couldn't complete the robustness check? Are my settings for the robustness check correct? Thank you!

imda-benedictlee commented 7 months ago

Hi @YZx0pa, thank you for raising this issue, and my apologies for the delayed response. Rest assured that we have been looking into this issue. We will get back to you with an answer soon.

imda-benedictlee commented 6 months ago

Hi @YZx0pa, I have spoken to the developer. They have requested the test-engine-app Docker container logs. To get the logs, do the following:

  1. Find the Docker container ID for test-engine-app by running the following in the terminal: docker ps
  2. Once you have the container ID, run the following in the terminal (replace <container ID> with the actual container ID): docker logs --follow <container ID>
  3. Re-run the test that you ran previously.
  4. Once the test has finished, copy the logs printed by the docker logs --follow <container ID> command.
  5. Paste the logs here.
kimeetok commented 6 months ago

Hi @YZx0pa my apologies for the delayed response.

Our Fairness Metrics Toolbox on AI Verify currently requires the use case to have one or more identified sensitive features in order to identify the most relevant fairness metric and generate the confusion matrix (TP, FP, TN, FN).
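
To illustrate why a sensitive feature is needed: a group fairness check essentially splits the confusion matrix by the values of that feature and compares the resulting rates. The sketch below is a minimal, hypothetical illustration of that idea in plain Python, not the Fairness Metrics Toolbox's actual implementation; it assumes binary 0/1 labels and predictions that are already aligned row by row with a sensitive-feature column, and the function names are made up for this example.

```python
from collections import defaultdict


def group_confusion_matrices(y_true, y_pred, groups):
    """Count TP, FP, TN, FN separately for each value of a sensitive feature.

    y_true, y_pred: sequences of 0/1 labels (1 = positive class).
    groups: sensitive-feature values, aligned with the labels.
    """
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "TP" if truth == 1 else "FP"
        else:
            key = "FN" if truth == 1 else "TN"
        counts[group][key] += 1
    return dict(counts)


def rates(cm):
    """True positive rate and false positive rate for one group's matrix."""
    positives = cm["TP"] + cm["FN"]
    negatives = cm["FP"] + cm["TN"]
    return {
        "TPR": cm["TP"] / positives if positives else float("nan"),
        "FPR": cm["FP"] / negatives if negatives else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical toy data: ground truth, predictions and a gender column
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
    for group, cm in group_confusion_matrices(y_true, y_pred, gender).items():
        print(group, cm, rates(cm))
```

Note that only the predictions and the sensitive-feature column are needed here, not the feature inside the model, which is why the sensitive feature must be present in the test dataset even if the model was trained without it.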

Since your model training does not directly involve a sensitive feature, you could try the following method to identify any unintended bias from indirect features by comparing the fairness results of two models.

  1. In your current AI Verify project, select the suspected indirect feature(s) as the 'Sensitive Feature Name' and run the test on your model.

  2. Train a second model by including the sensitive feature (e.g. gender) together with the suspected indirect feature(s) in the training dataset. Upload this biased model onto AI Verify.

  3. Duplicate the AI Verify project in step 1. In this new project, test the second model instead and select only the sensitive feature (e.g. gender) as the 'Sensitive Feature Name'.

Analyse the fairness results of the two models in the two generated reports. If the two models have similar fairness results, removing the sensitive attribute from the training data did not change the fairness of the model. Hence, there might be a direct correlation between the sensitive feature and the suspected indirect feature, i.e. the indirect feature may be acting as a proxy for the sensitive one.
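
Deciding whether two reports show "similar fairness results" comes down to comparing a group-level metric across the two models. A minimal sketch, assuming the gap in positive-prediction rate between groups (a demographic-parity-style measure) as the metric and an arbitrary tolerance of 0.02; both choices, and the function names, are hypothetical, and AI Verify's reports include other metrics that could be compared in the same way.

```python
def positive_rate_gap(y_pred, groups):
    """Largest difference in positive-prediction rate between any two groups.

    0 means every group receives positive predictions at the same rate.
    """
    totals, positives = {}, {}
    for pred, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    group_rates = [positives[g] / totals[g] for g in totals]
    return max(group_rates) - min(group_rates)


def similar_fairness(gap_a, gap_b, tolerance=0.02):
    """Treat two models as similarly fair if their gaps differ by at most
    `tolerance` (an illustrative threshold, not an AI Verify setting)."""
    return abs(gap_a - gap_b) <= tolerance
```

For example, `gap_1 = positive_rate_gap(preds_model_1, indirect_feature_values)` and `gap_2 = positive_rate_gap(preds_model_2, gender_values)` mirror the two projects above, and `similar_fairness(gap_1, gap_2)` gives a rough yes/no answer; the variable names are placeholders for your own data.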

To validate this outside of AI Verify, you can also run an equality inference test on the first model for each suspected indirect feature, i.e. create several test points in which all features except the suspected indirect feature are assigned the same values. If the predictions for these points vary, there is a possibility of bias leakage through that feature.
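
The equality inference test described above can be sketched in a few lines. This is a minimal illustration, assuming the model is wrapped in a `predict` callable that accepts a list of row dictionaries; the function name, the wrapper, and the feature names in the example are hypothetical.

```python
def equality_inference_test(predict, base_row, feature, candidate_values):
    """Vary only `feature` over `candidate_values`, keeping every other
    feature of `base_row` fixed, and collect the model's predictions.

    predict: callable taking a list of row dicts and returning predictions.
    Returns (predictions keyed by value, True if the predictions differ).
    """
    rows = []
    for value in candidate_values:
        row = dict(base_row)  # copy so the base row is left unchanged
        row[feature] = value
        rows.append(row)
    preds = list(predict(rows))
    by_value = dict(zip(candidate_values, preds))
    return by_value, len(set(preds)) > 1


if __name__ == "__main__":
    # Toy stand-in for a trained model: depends only on "suspected_feature"
    def toy_predict(rows):
        return [1 if r["suspected_feature"] == "b" else 0 for r in rows]

    base = {"age": 30, "suspected_feature": "a"}
    print(equality_inference_test(toy_predict, base, "suspected_feature", ["a", "b"]))
```

With an XGBoost classifier trained on a pandas DataFrame, `predict` could be something like `lambda rows: model.predict(pd.DataFrame(rows)[feature_order])`, where `feature_order` is a hypothetical list of the training columns in their original order. Repeating the test from several different base rows gives a more reliable picture than a single point.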

YZx0pa commented 6 months ago

Hi @imda-benedictlee, thanks for your advice, and sorry for the late reply! I got the error below when using docker logs --follow <container ID>, and I haven't been able to solve it yet: "2023-12-18 11:46:06,746 [DEBUG][app_logger.py::add_to_log(111)] [worker.py:setup(70)]: Environment Variables: error from daemon in stream: Error grabbing logs: invalid character '\x00' looking for beginning of value"

YZx0pa commented 6 months ago

Hi @kimeetok, thank you for your guidance, and I apologize for the delayed response. While the method you recommended seems like a valid approach to fairness verification, our client specifically requests a report generated directly by AI Verify to demonstrate the fairness of our current model with respect to gender. I'm exploring whether there is a simpler way for us to generate this fairness report. For instance, would it be possible to run the fairness check on the model's prediction results (from a model trained with or without sensitive features) together with the corresponding sensitive feature information? Additionally, are there any concerns about assessing fairness directly from the model's prediction results and the sensitive features?

imda-benedictlee commented 6 months ago

Hi @YZx0pa, I did some research on the issue that you faced. It could be caused by corrupted Docker logs. You can take a look at the issue and a possible solution at this link: https://copyprogramming.com/howto/docker-error-grabbing-logs-invalid-character-x00-looking-for-beginning-of-value. However, as I have not encountered this issue before, I cannot guarantee that the solution in the link will work for you. If you are able to delete the container and recreate it, I would suggest going that route instead: try to replicate the issue, then use the docker logs --follow command described previously to track it. Do reach out to me if you require additional help.

YZx0pa commented 6 months ago

Hi @imda-benedictlee, thanks a lot for your patient guidance! In my case, I solved the issue by running "docker compose down" and then "docker compose up"; hopefully these are the log files you need. Please see the attached file for the copied logs, and let me know if there is still a problem or if you need more information. Thanks! logsfile.txt

imda-kelvinkok commented 6 months ago

Hello @YZx0pa, thanks for providing the log file. It seems that the sensitive feature "gender_id" is not in the dataset; "gender_id" should be a column in the dataset that you are using. Would it be possible to send us a copy of the dataset that you are testing so that we can test it on our end? If you send us your dataset, remember to remove any sensitive information. Thanks!
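
A quick way to catch this before uploading is to check that the pickled dataset actually contains the sensitive-feature and ground-truth columns. A minimal sketch, assuming the dataset file is a pickled pandas DataFrame (as the log's PANDAS/PICKLE data format suggests); the helper name is hypothetical. Unpickling a DataFrame requires pandas to be installed, and only trusted files should be unpickled.

```python
import pickle


def missing_columns(dataset_path, required_columns):
    """Return the required column names that are absent from a pickled
    dataset (e.g. a pandas DataFrame saved as .sav or .pkl)."""
    with open(dataset_path, "rb") as f:
        data = pickle.load(f)
    # DataFrames expose their column labels via `.columns`
    columns = set(getattr(data, "columns", []))
    return [c for c in required_columns if c not in columns]
```

For example, `missing_columns("/app/aiverify/uploads/data/test_with_groundtruth.sav", ["gender_id", "shortlisted"])` would return `["gender_id"]` if the sensitive feature had been dropped from the test file.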

YZx0pa commented 5 months ago

Hi @kimeetok, thanks for helping with my case. Could I have your email address so that I can send you the sample data by email? Do I need to provide the model as well? Thanks!

kimeetok commented 5 months ago

Hi @YZx0pa , yes we will need the model too. Please send them to tok_kim_ee@imda.gov.sg and kelvin_kok@imda.gov.sg

Regarding a more direct way to check fairness with regard to indirect sensitive features, you can explore creating a new test that takes in the additional information needed (another model or dataset file, more test arguments, etc.) to do the required calculations. Our developer tools can help you create this new test plugin: https://github.com/IMDA-BTG/aiverify-developer-tools

imda-kelvinkok commented 5 months ago

Hello @YZx0pa, I have tried running your model and dataset and it seems like there are two issues:

YZx0pa commented 5 months ago

Hi @imda-kelvinkok , Thanks for your help on my case! Please see my reply below:

  1. Regarding gender_id: since it is not a feature used in training our model, I excluded it from Xtest_sample.sav.

  2. I've sent you the model trained with version 1.7.4 (xgb_model_check_1_7_4.sav) by email, but I am not sure whether it solves the problem, because I observed the same error when running the code below. Could you please help to verify? Could you also confirm whether the error you mentioned occurred when executing booster.set_feature_types({'feature_name': 'feature_type'})? Thanks!

```python
import pickle

# Running this snippet produces the same error as before
with open('xgb_model_check_1_7_4.sav', 'rb') as f:
    model = pickle.load(f)

booster = model.get_booster()
booster.set_feature_types({'feature_name': 'feature_type'})
```

imda-kelvinkok commented 5 months ago

Hello @YZx0pa,

Thanks for retraining the model. There seem to be a few issues when running the AI Verify report, so I ran the algorithms separately using your configuration to test fairness, explainability and robustness (with both your old and new XGBoost models). Here is what I did and observed:

Fairness Metrics Toolbox for Classification

Robustness

SHAP

We can try your old model (1.6.1), but you will have to downgrade your XGBoost Python library to that version as well. If we use 1.7.4, I am not sure whether there will be other compatibility issues, so the most straightforward approach is to downgrade the Python library version.
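
Because a pickled XGBoost model should be loaded with the same library version it was saved with, a quick check before unpickling can save a confusing error later. A minimal sketch using the standard library's `importlib.metadata`; the helper names are hypothetical and 1.6.1 is simply the version mentioned in this thread. (XGBoost's own documentation recommends `save_model`/`load_model` over pickle when a model has to move between library versions.)

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="xgboost"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def versions_match(installed, expected):
    """Compare major.minor.patch, ignoring any '+local' build suffix."""
    def core(v):
        return tuple(v.split("+")[0].split(".")[:3])
    return installed is not None and core(installed) == core(expected)
```

For example, `versions_match(installed_version(), "1.6.1")` returning False would indicate that `pip install xgboost==1.6.1` is needed before loading the 1.6.1 model.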

Can you provide me the test dataset with the sensitive feature 'gender_id' in it? I think that should solve the problem.

YZx0pa commented 5 months ago

Hi @imda-kelvinkok I've sent you the updated model and data to your email as per your request last Friday, please let me know if there is more info required. Thanks!

imda-kelvinkok commented 5 months ago

Hello @YZx0pa, I sent you an email yesterday; I'm not sure whether you have received it. Let us know if you have further enquiries. Thanks!

YZx0pa commented 5 months ago

Hi @imda-kelvinkok, good morning! Thanks for letting me know. I've received your email, downloaded the files and used them to generate the AI Verify report, but I see the same issue: only the SHAP check passes, and both the robustness and fairness checks fail.

YZx0pa commented 5 months ago

Hi @imda-kelvinkok, I appreciate your expert assistance with the AI Verify app Docker setup and the source code installation. Thanks to your guidance, we can now generate AI Verify reports using the model trained with gender_id.

YZx0pa commented 5 months ago

Hello @kimeetok, I appreciate your advice. We will continue to explore the issue as you suggested. In our specific situation, the model trained on our production server currently does not include the gender_id feature. Given the potential impact of both direct and indirect features on gender bias, and the significant concern about gender bias in our system from both our team and our customers, we aim to assess the production model and use AI Verify to generate the report. We would greatly appreciate any assistance from IMDA on this matter!

YZx0pa commented 5 months ago

Hi @imda-kelvinkok, after I uploaded a dataset, it turned out to have failed. Is there a way to remove it from my dataset records? Thanks!

imda-kelvinkok commented 5 months ago

hi @YZx0pa, yes from the main page, go to

  1. Models & Data->Datasets
  2. Select the checkbox of the data you want to delete
  3. Click on the trash bin icon at the top right when you're done
kimeetok commented 5 months ago

Hi @YZx0pa I have raised a new feature request at #254 to track the suggested fix we discussed on Monday.

YZx0pa commented 5 months ago

Hi @imda-kelvinkok, got it. Many thanks!

YZx0pa commented 5 months ago

Hi @kimeetok Well noted with thanks! 👍