NSAPH-Projects / space

SpaCE, the Spatial Confounding Environment, loads benchmark datasets for causal inference methods tackling spatial confounding
https://nsaph-projects.github.io/space/
MIT License

Unable to download file from API #154

Status: Open. jckitch opened this issue 1 month ago.

jckitch commented 1 month ago

Thank you all for making such a fantastic package. I've run into a couple of issues while trying to install and use space on my own machine, and I'll try to document them in the next few issues. I'm unable to download SpaceEnvs directly from the data source (Harvard Dataverse) using the space package. My code is as follows:

>>> from spacebench import SpaceEnv
>>> env = SpaceEnv('healthd_dmgrcs_mortality_disc')
Downloaded: filename healthd_dmgrcs_mortality_disc.zip, id 7424743, saved to /var/folders/59/7fldcqwj7nbcgppt7ydjytth0000gp/T/healthd_dmgrcs_mortality_disc.zip
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/anaconda3/lib/python3.10/site-packages/spacebench/env.py", line 282, in __init__
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
  File "/opt/homebrew/anaconda3/lib/python3.10/zipfile.py", line 1271, in __init__
    self._RealGetContents()
  File "/opt/homebrew/anaconda3/lib/python3.10/zipfile.py", line 1338, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I then checked the temporary folder where the file was saved (/var/folders/59/7fldcqwj7nbcgppt7ydjytth0000gp/T/healthd_dmgrcs_mortality_disc.zip) and found that the ".zip" file contained only the following error message:

{"status":"ERROR",
 "code":404,"message":"API endpoint does not exist on this server. Please check your code for typos, or consult our API guide at http://guides.dataverse.org.",
 "requestUrl":"https://dataverse.harvard.edu/api/v1/access/datafile/:persistentId/?persistentId=7424743&User-Agent=pydataverse",
 "requestMethod":"GET"}
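One way to surface this kind of failure earlier would be to check the downloaded file before handing it to `zipfile.ZipFile`, so the server's message is reported instead of a bare BadZipFile. The sketch below is not spacebench code: `open_downloaded_zip` is a hypothetical helper, and it assumes that a failed Dataverse request leaves a JSON body like the one above in place of the archive.

```python
import json
import zipfile
from pathlib import Path


def open_downloaded_zip(zip_path: str) -> zipfile.ZipFile:
    """Open a downloaded archive, reporting a server error if it is not a zip.

    Hypothetical helper for illustration; not part of spacebench.
    """
    path = Path(zip_path)
    if zipfile.is_zipfile(path):
        return zipfile.ZipFile(path, "r")

    # Not a zip: assume the server wrote a JSON error payload instead.
    try:
        payload = json.loads(path.read_bytes())
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Neither a zip nor JSON: fall back to the original error type.
        raise zipfile.BadZipFile(f"{path} is not a zip file") from None

    if isinstance(payload, dict):
        raise RuntimeError(
            f"Download failed ({payload.get('status')} {payload.get('code')}): "
            f"{payload.get('message')} [request: {payload.get('requestUrl')}]"
        )
    raise zipfile.BadZipFile(f"{path} is not a zip file")
```

With this in place, the `with zipfile.ZipFile(zip_path, "r") as zip_ref:` line from the traceback could become `with open_downloaded_zip(zip_path) as zip_ref:`, and the 404 message and request URL would appear directly in the exception.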

I tried a few of the other datasets ("county_educatn_election_cont" and "healthd_pollutn_mortality_cont") and got the same error. I am using the latest development version of space, installed with pip install "git+https://github.com/NSAPH-Projects/space@dev#egg=spacebench[all]".
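For comparison, the failing request URL passes the numeric file id `7424743` as a `persistentId`. As far as I know, the Dataverse Data Access API guide documents two URL forms: `/api/access/datafile/{id}` for a numeric database id, and `/api/access/datafile/:persistentId/?persistentId=...` for a persistent identifier such as a DOI. The sketch below builds both; `datafile_url` is a hypothetical helper (not a spacebench or pyDataverse function), and whether this id/persistentId mismatch is the actual root cause here is an assumption.

```python
from typing import Union
from urllib.parse import urlencode

# Host taken from the error message in this issue.
SERVER = "https://dataverse.harvard.edu"


def datafile_url(identifier: Union[str, int], server: str = SERVER) -> str:
    """Build a Dataverse Data Access URL for a datafile (hypothetical helper).

    All-digit identifiers are treated as database ids; anything else
    (for example a DOI such as "doi:10.xxxx/...") is sent as a
    persistentId query parameter.
    """
    ident = str(identifier)
    if ident.isdigit():
        # Numeric database id goes directly in the path.
        return f"{server}/api/access/datafile/{ident}"
    # Persistent id goes in the query string (URL-encoded).
    query = urlencode({"persistentId": ident})
    return f"{server}/api/access/datafile/:persistentId/?{query}"
```

For the file in this issue, `datafile_url(7424743)` gives `https://dataverse.harvard.edu/api/access/datafile/7424743`, which could be fetched with `urllib.request.urlretrieve` to see whether the server returns a real zip.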

jckitch commented 2 weeks ago

This issue should be fixed by pinning the dependency in requirements.txt to "pyDataverse==0.3.1" instead of "pyDataverse>=0.3.1".
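Until requirements.txt is changed, the pin can be applied by hand with `pip install "pyDataverse==0.3.1"`. To confirm which version an environment actually picked up, a check along these lines may help. It is a sketch that assumes the package is installed under the distribution name `pyDataverse`; `check_pydataverse` is a hypothetical helper, not part of spacebench.

```python
from importlib.metadata import PackageNotFoundError, version

# Version suggested in this issue as the working pin.
PINNED = "0.3.1"


def check_pydataverse(expected: str = PINNED) -> bool:
    """Return True only if the installed pyDataverse matches `expected`.

    Hypothetical helper for illustration; not part of spacebench.
    """
    try:
        installed = version("pyDataverse")
    except PackageNotFoundError:
        print("pyDataverse is not installed")
        return False
    if installed != expected:
        print(f"pyDataverse {installed} is installed; this issue suggests pinning {expected}")
        return False
    return True
```

Running `check_pydataverse()` before `SpaceEnv(...)` would flag an environment where `>=0.3.1` resolved to a newer release.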