USEPA / standardizedinventories

Standardized Release and Waste Inventories
MIT License

Unresolvable Data Management - HTTPError #144

Closed dt-woods closed 9 months ago

dt-woods commented 9 months ago

Here's my minimal working example:

>>> from stewi import getInventory
>>> getInventory('eGRID', 2016)
INFO eGRID_2016 not found in ~/Library/Application Support/stewi/flowbyfacility
INFO requested inventory does not exist in local directory, it will be generated...
INFO downloading eGRID data for 2016
ERROR 403 Client Error: Forbidden for url: https://www.epa.gov/sites/production/files/2020-01/egrid2018_historical_files_since_1996.zip
Traceback (most recent call last):
  File "~/Envs/ebm/lib/python3.11/site-packages/esupy/remote.py", line 36, in make_url_request
    response.raise_for_status()
  File "~/Envs/ebm/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.epa.gov/sites/production/files/2020-01/egrid2018_historical_files_since_1996.zip
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 getInventory('eGRID', 2016)

File ~/Envs/ebm/lib/python3.11/site-packages/stewi/__init__.py:82, in getInventory(inventory_acronym, year, stewiformat, filters, filter_for_LCI, US_States_Only, download_if_missing, keep_sec_cntx)
     66 """Return or generate an inventory in a standard output format.
     67 
     68 :param inventory_acronym: like 'TRI'
   (...)
     79 :return: dataframe with standard fields depending on output format
     80 """
     81 f = ensure_format(stewiformat)
---> 82 inventory = read_inventory(inventory_acronym, year, f,
     83                            download_if_missing)
     85 if (not keep_sec_cntx) and ('Compartment' in inventory):
     86     inventory['Compartment'] = (inventory['Compartment']
     87                                 .str.partition('/')[0])

File ~/Envs/ebm/lib/python3.11/site-packages/stewi/globals.py:331, in read_inventory(inventory_acronym, year, f, download_if_missing)

So I did some digging.

  1. The file exists at https://www.epa.gov/sites/production/files/2020-01/egrid2018_historical_files_since_1996.zip, and I can manually download it (so the forbidden error is likely esupy's problem)
  2. I manually placed the zip file in the folder mentioned above and ran the code again; it still fails to find the file.
  3. Going into stewi's getInventory method raises more questions.
    • f = ensure_format(stewiformat) sets the subdirectory to "flowbyfacility"
    • meta = set_stewi_meta(file_name, str(f)) creates a class with its ext attribute set to WRITE_FORMAT (i.e., parquet); this extension is used in esupy's find_file, which therefore fails to match the manually placed zip file
    • stewi's read_inventory is called (globals.py)
    • download_if_missing is False (by default), so stewi's generate_inventory is called
    • stewi's egrid.py main() method is called for Option = 'A'
    • This calls download_egrid, which defines the output folder to paths.local_path / 'eGRID Data Files'
    • This calls esupy's make_url_request, which fails
    • IF IT WORKED, THEN:
    • generate_metadata is run creating a JSON in 'eGRID Data Files'
    • stewi's egrid.py main() method is called for Option = 'B'
    • Calls generate_eGRID_files and generate_metadata again, which appears to create the parquet files
    • Now getInventory works!
    • So the issue appears to lie with the request made in esupy (see the error message above). There is no fix on the user end (e.g., in electricitylci) other than manually downloading the data files and running generate_metadata and generate_eGRID_files :(
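
For anyone who wants to script the manual download step described above, here is a minimal sketch. It rests on an assumption the thread does not confirm: that the 403 is triggered by the client's default User-Agent, which would explain why a browser download succeeds. The helper name download_with_headers, the header value, and the opener parameter are all hypothetical and are not part of stewi or esupy. The sketch only fetches the archive; per the steps above, generate_metadata and generate_eGRID_files still have to be run before getInventory can find the parquet files.

```python
import urllib.request
from pathlib import Path

EGRID_URL = (
    "https://www.epa.gov/sites/production/files/2020-01/"
    "egrid2018_historical_files_since_1996.zip"
)

# Assumption: the server rejects the default Python User-Agent string.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def download_with_headers(url, dest_dir, headers=None,
                          opener=urllib.request.urlopen):
    """Download `url` into `dest_dir` with explicit headers; return the path.

    `opener` is injectable so the function can be exercised without
    network access. urlopen raises urllib.error.HTTPError on a 403.
    """
    request = urllib.request.Request(url, headers=headers or BROWSER_HEADERS)
    with opener(request, timeout=60) as response:
        data = response.read()

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last segment of the URL.
    target = dest_dir / url.rsplit("/", 1)[-1]
    target.write_bytes(data)
    return target
```

A call would look like download_with_headers(EGRID_URL, some_dir), where some_dir is whichever folder download_egrid writes to on your machine (described above as paths.local_path / 'eGRID Data Files').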

https://github.com/USEPA/standardizedinventories/blob/39b96003865dd261ead51d7e807b190266b0058b/stewi/__init__.py#L63

bl-young commented 9 months ago

Thanks, just discovered this in some other packages as well (e.g., https://github.com/USEPA/LCIAformatter/issues/94). Appears to be a change on EPA's side. Should have a resolution shortly and will report back here.

bl-young commented 9 months ago

This has been resolved on esupy (develop) https://github.com/USEPA/esupy/commit/b481e35262c9387c2e14ddcc8fa114dafe1c641d

Confirmed success in running eGRID on StEWI develop branch: https://github.com/USEPA/standardizedinventories/actions/runs/6148344684/job/16681846069

This issue will be closed when esupy has been updated on main.

dt-woods commented 9 months ago

I appreciate the update. Any idea on when the next scheduled revision is to be published?

bl-young commented 9 months ago

> I appreciate the update. Any idea on when the next scheduled revision is to be published?

Do you mean the next release of esupy? v0.3.1 with the fix is now available.
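
To check whether a given environment already has the fixed release, a rough sketch along these lines may help. It assumes the installed distribution is registered under the name "esupy" and that its version string is dotted numerics such as 0.3.1; the helper names parse_version, has_fix, and installed_esupy_has_fix are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.3.1' into (0, 3, 1), keeping only each part's leading digits."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, required="0.3.1"):
    """Return True if `installed` is at least the release carrying the fix."""
    return parse_version(installed) >= parse_version(required)


def installed_esupy_has_fix():
    """Check the esupy version in the current environment, if installed."""
    try:
        # Assumption: the distribution name in package metadata is "esupy".
        return has_fix(version("esupy"))
    except PackageNotFoundError:
        return False
```

If installed_esupy_has_fix() returns False, upgrading esupy to v0.3.1 or later is the step the maintainer's reply points to.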