Open YZx0pa opened 12 months ago
Hi @imda-benedictlee, good day! Could you let me know why I couldn't complete the robustness check properly? Are my settings for the robustness check correct? Thank you!
Hi @YZx0pa, thank you for raising this issue and my apologies for the delay in response. Rest assured that we have been looking into this issue. We will get back to you with an answer soon.
Hi @YZx0pa, I have spoken to the developer. They have requested the test-engine-app docker container logs. To get the logs, start by doing the following:
- Find the docker container ID for test-engine-app. You can do this in the terminal:
docker ps
- Once you have the docker container ID, type the following in the terminal (replace <container ID> with the container ID):
docker logs --follow <container ID>
- Now re-run the test that you did previously.
- Once you are done re-running the test, copy the logs that you followed when running the docker logs --follow <container ID> command.
- Paste the logs here.
Hi @imda-benedictlee, thanks for your advice and sorry for the late reply! I faced the error below when using docker logs --follow <container ID>, and I couldn't solve it yet.
"2023-12-18 11:46:06,746 [DEBUG][app_logger.py::add_to_log(111)] [worker.py:setup(70)]:
Environment Variables:
error from daemon in stream: Error grabbing logs: invalid character '\x00' looking for beginning of value"
Hi @YZx0pa, my apologies for the delayed response.
Our Fairness Metrics Toolbox on AI Verify currently requires the use case to have an identified sensitive feature(s) in order to identify the most relevant fairness metric and generate the confusion matrix (TP, FP, TN, FN).
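For illustration, a confusion-matrix-based group fairness metric of the kind the toolbox reports, such as the gap in true positive rate across groups, can be sketched as follows. This is a simplified sketch, not the toolbox's actual code, and the column names are placeholders:

import pandas as pd
from sklearn.metrics import confusion_matrix

def tpr_gap(df: pd.DataFrame, y_true: str, y_pred: str, sensitive: str) -> float:
    """Equal-opportunity-style gap: max difference in TPR across sensitive groups."""
    tprs = []
    for _, group in df.groupby(sensitive):
        # labels=[0, 1] fixes the matrix layout so ravel() yields tn, fp, fn, tp
        tn, fp, fn, tp = confusion_matrix(group[y_true], group[y_pred], labels=[0, 1]).ravel()
        tprs.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return max(tprs) - min(tprs)

# e.g. tpr_gap(results_df, y_true='shortlisted', y_pred='prediction', sensitive='gender_id')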
Since your model training does not directly involve a sensitive feature, you could try the following method to identify any unintended bias from indirect features by comparing fairness results of two models.
- In your current AI Verify project, select the suspected indirect feature(s) as the 'Sensitive Feature Name' and run the test on your model.
- Train a second model by including the sensitive feature (e.g. gender) together with the suspected indirect feature(s) in the training dataset. Upload this biased model onto AI Verify.
- Duplicate the AI Verify project in step 1. In this new project, test the second model instead and select only the sensitive feature (e.g. gender) as the 'Sensitive Feature Name'.
- Analyse the fairness results of the two models from the two reports generated. If the two models have similar fairness results, removing the sensitive attribute from the training data did not affect the fairness of the model; hence, there might be a direct correlation between the sensitive feature and the suspected indirect feature.
To validate this outside of AI Verify, you can further run an equality inference test for each suspected indirect feature on the first model, i.e. construct several test points where all features except the suspected indirect feature are assigned the same value. If the predictions for these points vary, there is a possibility of bias leakage.
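If you want to try the equality inference test outside AI Verify, a minimal sketch is below. It assumes a scikit-learn-style classifier (such as the XGBClassifier used in this thread) with pandas input; the function name and the example feature are hypothetical:

import numpy as np
import pandas as pd

def equality_inference_test(model, base_row: pd.DataFrame, feature: str, values) -> bool:
    """Vary only `feature` over `values` while holding every other feature fixed.
    Returns True if the predictions differ, i.e. possible bias leakage."""
    probes = pd.concat([base_row] * len(values), ignore_index=True)
    probes[feature] = list(values)
    preds = model.predict(probes)
    return len(np.unique(preds)) > 1

# e.g. equality_inference_test(model, X_test.iloc[[0]], 'postal_code', X_test['postal_code'].unique())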
Hi @kimeetok, thank you for your guidance, and I apologize for the delayed response. While the method you recommended seems like a valid approach for fairness verification, our client specifically requests a directly generated AI Verify report to demonstrate the fairness of our current model with respect to gender. I'm exploring whether there is a simpler way for us to generate this fairness report. For instance, would it be possible to let us proceed with the fairness check based on the model's prediction results (with or without sensitive features), along with the corresponding sensitive feature information? Additionally, are there any concerns about directly assessing fairness based on the model's prediction results and sensitive features?
Hi @YZx0pa, I did some research on the issue that you faced. This issue could be due to some corruption of the Docker logs. You can take a look at the issue and solution at this link: https://copyprogramming.com/howto/docker-error-grabbing-logs-invalid-character-x00-looking-for-beginning-of-value. However, as I have not encountered this issue before, I cannot guarantee that the solution in the link will work for you. Nevertheless, if you are able to delete the container and recreate it, I would suggest going that route instead: try to replicate the issue, then use the docker logs --follow command stated previously to track it. Do reach out to me if you require additional help.
Hi @imda-benedictlee, thanks a lot for your patient guidance! In my case, I solved the issue using the commands "docker compose down" and "docker compose up"; hopefully these are the log files required. Please see the attached file for the copied-out logs, and let me know if there is still a problem or more info to provide. Thanks! logsfile.txt
hello @YZx0pa, thanks for providing us the log file. It seems like the sensitive_feature "gender_id" is not in the dataset; "gender_id" should be a column in the dataset that you are using. Would it be possible to send a copy of the dataset that you are testing to us so that we can test it on our end? If you are sending us your dataset, remember to remove any sensitive information. Thanks!
Hi @kimeetok, thanks for helping with my case. Could I have your email address so I can send you the sample data by email? And do I need to provide the model as well? Thanks!
Hi @YZx0pa, yes we will need the model too. Please send them to tok_kim_ee@imda.gov.sg and kelvin_kok@imda.gov.sg
Regarding a more direct way to check fairness with regard to indirect sensitive features, you can explore creating a new test that takes in the additional information needed (another model/dataset file, more test arguments, etc.) to do the required calculations. Our developer tools can help you create this new test plugin: https://github.com/IMDA-BTG/aiverify-developer-tools
Hello @YZx0pa, I have tried running your model and dataset and it seems like there are two issues:
- The sensitive feature 'gender_id' is not in the test dataset.
- AttributeError: 'XGBModel' object has no attribute 'feature_types', which suggests that the different XGBoost versions have differing attributes. Is it possible to retrain the model in 1.7.4? You can also take reference here.
Hi @imda-kelvinkok, thanks for your help on my case! Please see my reply below:
- Regarding gender_id: as it is not one of our model's training features, I excluded it from Xtest_sample.sav.
- I've sent you the model trained with version 1.7.4 (xgb_model_check_1_7_4.sav) by email, but I am not sure whether it solves the problem, because I observed the same error when I ran the code below. Could you please help verify? And could I also confirm that the error you mentioned occurred when executing booster.set_feature_types({'feature_name': 'feature_type'})? Thanks!

import pickle

# Load the retrained 1.7.4 model and reproduce the error
model = pickle.load(open('xgb_model_check_1_7_4.sav', 'rb'))
booster = model.get_booster()
booster.set_feature_types({'feature_name': 'feature_type'})
Hello @YZx0pa,
Thanks for retraining the model. It seems there are a few issues while running the AI Verify report, so I ran the algorithms separately using your configuration to test fairness, explainability and robustness (with both your old and new XGBoost models), and here's what I've done and observed:
Fairness Metrics Toolbox for Classification
'gender_id' is not in the test dataset. The sensitive feature should be present in the dataset. Is it possible to train with the dataset with 'gender_id' in it?
Robustness
'XGBClassifier' object has no attribute 'use_label_encoder'.
SHAP
We can try your old model (1.6.1), but you will have to downgrade the version of your XGBoost Python library to that version as well. If we use 1.7.4, I am not sure if there will be other compatibility issues, so the most straightforward way is to downgrade the Python library version.
Can you provide me the test dataset with the sensitive feature 'gender_id' in it? I think that should solve the problem.
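A commonly used alternative to downgrading is to re-export the old model in XGBoost's version-portable JSON format under the library version it was trained with, then reload it under the newer version. This is a sketch only and has not been tested against this particular model:

# Step 1: run with the SAME xgboost version the model was trained under (e.g. 1.6.1)
import pickle
from xgboost import XGBClassifier

with open('xgb_model_check.pkl', 'rb') as f:  # file name as used earlier in this thread
    old_model = pickle.load(f)
old_model.save_model('xgb_model_check.json')  # JSON format is portable across versions

# Step 2: after upgrading to the newer xgboost (e.g. 1.7.4)
new_model = XGBClassifier()
new_model.load_model('xgb_model_check.json')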
Hi @imda-kelvinkok, I've sent the updated model and data to your email as per your request last Friday; please let me know if any more info is required. Thanks!
hello @YZx0pa, I sent you an email yesterday; not sure if you have received it. Let us know if you have further enquiries. Thanks!
Hi @imda-kelvinkok, good morning! Thanks for advising. I received your email, downloaded the files, and used them to generate the AI Verify report, but it seems to be the same issue for me: only the SHAP check passes, and both the robustness and fairness checks fail.
hi @imda-kelvinkok, I appreciate your expert assistance with the AI Verify app Docker setup and source-code installation. Thanks to your guidance, we can now generate AI Verify reports using the trained model with gender_id.
Hello @kimeetok, we appreciate your advice and will continue to explore the issue accordingly. In our specific situation, the trained model on our production server currently does not include the gender_id feature. Given the potential impact of both direct and indirect features on gender bias, and the significant concern about gender bias in our system from both our team and our customers, we aim to assess the production model and leverage AI Verify for report generation. We would greatly appreciate any assistance from IMDA on this matter!
Hi @imda-kelvinkok, could I check: after I upload a dataset and it fails, is there a way to remove it from my dataset records? Thanks!
hi @YZx0pa, yes, from the main page:
- Go to Models & Data -> Datasets
- Select the checkbox of the data you want to delete
- Click on the trash bin icon at the top right when you're done
Hi @imda-kelvinkok, got it, great thanks!
Hi @YZx0pa I have raised a new feature request at #254 to track the suggested fix we discussed on Monday.
Hi @kimeetok, well noted with thanks! 👍
Is there an existing issue for this?
Description
Currently, I've tested my own dataset (a tabular dataset, binary classification with xgboost version 1.6.1). The SHAP Toolbox shows "Test completed", but both the robustness and fairness toolboxes show "test error", and I only see "Application error" in the report content when clicking "view report".
For the fairness check, I want to clarify that our model training does not involve a directly sensitive feature related to gender. However, I am concerned about the potential for unintended bias from indirect features influencing gender-related outcomes. I would appreciate guidance on how to conduct a fairness check in AI Verify considering this scenario.
Regarding the robustness check, I specified the "annotated ground truth path" as "/app/aiverify/uploads/data/test_with_groundtruth.sav" (the same dataset used for the SHAP Toolbox). I also set the "name of column contains image files" as "NA".
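As a quick pre-upload sanity check (a suggestion only, not an AI Verify requirement), you can confirm locally that the expected columns are present in the .sav file. This assumes the .sav is a pickled pandas DataFrame, as the task logs below suggest, and uses the column names from this issue:

import pickle

with open('test_with_groundtruth.sav', 'rb') as f:
    df = pickle.load(f)  # assumption: the .sav is a pickled pandas DataFrame

print(df.columns.tolist())
# 'shortlisted' is the ground truth column named in the task logs below
assert 'shortlisted' in df.columns, 'ground truth column missing'
# 'gender_id' is the sensitive feature discussed in this thread
print('gender_id' in df.columns)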
Thank you!
Current Behavior
Not able to generate the report
Expected Behavior
Able to complete the tests for the robustness and fairness toolboxes and generate the summary report
Steps To Reproduce
NA
Environment
Screenshots/ Code snippets
Robustness toolbox log:
2023-11-29 22:56:42,019 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(66)]: The task validation is successful: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:run_task_processing_in_process(71)]: Working on task: message_id 1701269801989-0, message_args {"mode":"upload","id":"task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3","algorithmId":"algo:aiverify.stock.robustness_toolbox:robustness_toolbox","algorithmArgs":{"annotated_ground_truth_path":"/app/aiverify/uploads/data/test_with_groundtruth.sav","file_name_label":"NA"},"testDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelFile":"/app/aiverify/uploads/model/xgb_model_check.pkl","groundTruthDataset":"/app/aiverify/uploads/data/test_with_groundtruth.sav","modelType":"classification","groundTruth":"shortlisted"}, task_type: TaskType.NEW
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.RUNNING
2023-11-29 22:56:42,020 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(110)]: Sending task update
2023-11-29 22:56:42,024 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:detect_pipeline(30)]: Attempting to detect pipeline model from the given path
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(55)]: Attempting to read pipeline: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,024 [INFO][log_utils.py::log_message(31)] [pipeline_manager.py:read_pipeline_path(83)]: Pipeline validation successful
2023-11-29 22:56:42,025 [ERROR][log_utils.py::log_message(37)] [pipeline_manager.py:read_pipeline_path(101)]: There was an error getting pipeline files in the folder
2023-11-29 22:56:42,025 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:_load_instances(277)]: Unable to find pipeline model. Loading non-pipeline instances
2023-11-29 22:56:42,025 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,026 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,027 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: {pandasdata.Plugin object at 0x7f707c349b50} {class 'pickleserializer.Plugin'}
2023-11-29 22:56:42,029 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_data(75)]: Data Instance: {pandasdata.Plugin object at 0x7f707c349b50}, Data Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,029 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 0}
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(51)]: Attempting to read model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(78)]: Model validation successful
2023-11-29 22:56:42,030 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(84)]: Attempting to deserialize model: /app/aiverify/uploads/model/xgb_model_check.pkl
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(107)]: Attempting to identify model format: {class 'xgboost.sklearn.XGBClassifier'}
2023-11-29 22:56:42,134 [INFO][log_utils.py::log_message(31)] [model_manager.py:read_model_file(117)]: Supported model format: {class 'xgboost.sklearn.XGBClassifier'}, ModelPluginType.XGBOOST[xgboost.sklearn.XGBClassifier]
2023-11-29 22:56:42,134 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_model(201)]: Model Instance: {xgboostmodel.Plugin object at 0x7f6fd5dcd510}, Model Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(56)]: Attempting to read data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(81)]: Data validation successful
2023-11-29 22:56:42,135 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(89)]: Attempting to deserialize data: /app/aiverify/uploads/data/test_with_groundtruth.sav
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(157)]: Consolidation results: {pandasdata.Plugin object at 0x7f6fd02a1650} {class 'pickleserializer.Plugin'}
2023-11-29 22:56:42,139 [INFO][log_utils.py::log_message(31)] [data_manager.py:read_data(181)]: Supported data format: DataPluginType.PANDAS[SerializerPluginType.PICKLE]
2023-11-29 22:56:42,139 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_ground_truth(259)]: GroundTruth Instance: {pandasdata.Plugin object at 0x7f6fd02a1650}, GroundTruth Deserializer: SerializerPluginType.PICKLE
2023-11-29 22:56:42,159 [ERROR][log_utils.py::log_message(37)] [algorithm_manager.py:get_algorithm(126)]: There was an error getting algorithm instance (not found): algo:aiverify.stock.robustness_toolbox:robustness_toolbox
2023-11-29 22:56:42,160 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(112)]: Attempting to find algo:aiverify.stock.robustness_toolbox:robustness_toolbox in the algorithm registry
2023-11-29 22:56:42,163 [INFO][app_logger.py::add_to_log(116)] [plugin_controller.py:get_plugin_instance(141)]: algo:aiverify.stock.robustness_toolbox:robustness_toolbox is in the algorithm registry. Attempting to re-discover algorithm
2023-11-29 22:56:42,176 [INFO][robustness_toolbox.py::add_to_log(205)] Setup completed
2023-11-29 22:56:42,179 [INFO][log_utils.py::log_message(31)] [algorithm_manager.py:get_algorithm(116)]: Supported algorithm: algo:aiverify.stock.robustness_toolbox:robustness_toolbox, PluginType.ALGORITHM
2023-11-29 22:56:42,180 [INFO][app_logger.py::add_to_log(116)] [stream_processing.py:load_algorithm(356)]: Algorithm Instance: {robustness_toolbox.Plugin object at 0x7f6fd02aac10}
2023-11-29 22:56:42,205 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,207 [INFO][app_logger.py::add_to_log(116)] [task_processing.py:process_new_task(147)]: The raw task results: {'results': [0]}
2023-11-29 22:56:42,208 [INFO][app_logger.py::add_to_log(116)] [task.py:_send_task_update(278)]: Task has received notification to send task update
2023-11-29 22:56:42,210 [ERROR][app_logger.py::add_to_log(126)] [task_processing.py:process_new_task(167)]: Failed output schema validation: Task Results: {'results': [0]}
2023-11-29 22:56:42,210 [WARNING][app_logger.py::add_to_log(121)] [task_processing.py:process_new_task(189)]: The task terminated: The algorithm output schema validation failed
2023-11-29 22:56:42,210 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Running', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '""', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_progress(114)]: The current task completion progress is 100
2023-11-29 22:56:42,211 [INFO][app_logger.py::add_to_log(116)] [task_result.py:set_status(146)]: The current task has changed status to TaskStatus.ERROR
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_update(614)]: The update sent successfully: task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3:{'type': 'TaskResponse', 'status': 'Error', 'elapsedTime': 0, 'startTime': '2023-11-29T22:56:42.019462', 'output': '', 'errorMessages': '[{"category": "SYSTEM_ERROR", "code": "CSYSx00146", "description": "Task Terminated: The algorithm output schema validation failed", "severity": "warning", "component": "task_processing.py"}]', 'logFile': '/app/aiverify/test-engine-app/logs/task:6566ef61e101f3b948a06ce6-65675129e101f3b948a100f3.log', 'taskProgress': 100}
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [worker.py:_send_acknowledgement(659)]: The acknowledgement sent successfully - 1701269801989-0
2023-11-29 22:56:42,232 [INFO][app_logger.py::add_to_log(116)] [task.py:cleanup(104)]: The system has received notification to clean up task
Additional Context
NA
Possible Solution (Optional)
No response