Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

issue with fetch_lawschool_gpa #499

Open jr2021 opened 1 year ago

jr2021 commented 1 year ago

Hello,

I'm having a problem with downloading the lawschool GPA dataset using the fetch_lawschool_gpa function due to an error thrown in tempeh's seaphe_datasets.py file.

  File "/home/robertsj/miniconda3/envs/hpobench_fairmohpo/lib/python3.9/site-packages/tempeh/datasets/seaphe_datasets.py", line 39, in load_lawschool_data
    with zipfile.ZipFile(temp_file_name, 'r') as zip_ref:
  File "/home/robertsj/miniconda3/envs/hpobench_fairmohpo/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/robertsj/miniconda3/envs/hpobench_fairmohpo/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I believe the issue is that response.content contains the HTML of a web page returned for the URL http://www.seaphe.org/databases/LSAC/LSAC_SAS.zip rather than the contents of the zip archive. Do you know if tempeh has made any major changes or dropped support for the Lawschool dataset?
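One way to check this suspicion is to test the downloaded bytes before handing them to zipfile.ZipFile. The following is only a minimal sketch: the helper name open_zip_or_explain is hypothetical and is not part of tempeh or AIF360, and it assumes the downloaded payload (for example response.content) is already available as bytes.

```python
import io
import zipfile


def open_zip_or_explain(payload: bytes) -> zipfile.ZipFile:
    """Open downloaded bytes as a zip archive, or raise a descriptive error.

    Hypothetical helper for illustration; not part of tempeh or AIF360.
    """
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # Show the first bytes so an HTML page is easy to recognize.
        preview = payload[:60].decode("utf-8", errors="replace")
        raise ValueError(
            f"Download is not a zip archive; it starts with: {preview!r}"
        )
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

If the server returned a web page, the raised message would show the start of that page (for instance an HTML doctype) instead of the less informative BadZipFile error.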

Are you able to reproduce this error, or could you point me in another direction?

Best,

Jake Robertson, University of Freiburg

kvarsh commented 1 year ago

Also see https://github.com/Trusted-AI/AIF360/pull/492