USEPA / standardizedinventories

Standardized Release and Waste Inventories
MIT License

GHGRP - C_CONFIG... download breaking #80

Closed WesIngwersen closed 2 years ago

WesIngwersen commented 2 years ago

It now seems to be breaking while downloading the data. It might even be breaking when getting the table URL since I'm not seeing the log statement that proceeds the generate_url function call. Since it's breaking somewhere in there, it's causing the try block to fail, which means that table_df isn't actually assigned, resulting in the following error:

➜ python -m stewi.GHGRP A -Y 2019
INFO downloading and processing GHGRP data to /Users/michaellong/Library/Application Support/stewi/GHGRP Data Files/tables/2019/
INFO Downloading C_CONFIGURATION_LEVEL_INFO (rows: 15873)
Traceback (most recent call last):
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 813, in <module>
    main()
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 665, in main
    ghgrp1 = download_and_parse_subpart_tables(year)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 304, in download_and_parse_subpart_tables
    table_df = import_or_download_table(filepath, subpart_emissions_table,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 259, in import_or_download_table
    for col in table_df:
UnboundLocalError: local variable 'table_df' referenced before assignment

Some error statements in those blocks of code would also be useful for tracking down errors for those of us running from a pip install rather than running directly from a local copy of the repo.

Originally posted by @michael-long88 in https://github.com/USEPA/standardizedinventories/issues/76#issuecomment-902939185
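The failure described above can be reproduced in isolation. The sketch below is not the stewi code: `fetch_table`, `import_table_unguarded`, and `import_table_guarded` are hypothetical names, and it assumes the original `try` block swallowed the exception without logging it. It shows why the name is unbound after a failed download and one way to surface the underlying error instead.

```python
import logging

log = logging.getLogger("ghgrp_sketch")


def fetch_table(table):
    """Hypothetical stand-in for the real download; it always fails here."""
    raise ConnectionError(f"could not download {table}")


def import_table_unguarded(table):
    # Assumed failure mode: the exception is swallowed, so table_df is
    # never bound and the loop below raises UnboundLocalError.
    try:
        table_df = fetch_table(table)
    except Exception:
        pass
    for col in table_df:
        pass
    return table_df


def import_table_guarded(table):
    # Bind the name before the try block and log the real cause,
    # including the traceback, so pip-install users can see it.
    table_df = None
    try:
        table_df = fetch_table(table)
    except Exception:
        log.exception("error in downloading table %s", table)
    if table_df is None:
        return None
    for col in table_df:
        pass
    return table_df
```

With the guarded version the caller gets `None` and a logged traceback of the actual download error, rather than an `UnboundLocalError` that hides it.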

bl-young commented 2 years ago

Either there is a bug in Envirofacts or the mechanism for generating the query has changed in the last few months. The URL returns about 2100 lines and then fails with:

Prod sql length=5286 Error in URL. Please check the URL syntax and try again. Email EnviroMail@epa.gov for assistance with the syntax.

URL: https://data.epa.gov/efservice/C_CONFIGURATION_LEVEL_INFO/REPORTING_YEAR/=/2019/ROWS/0:9999/csv

bl-young commented 2 years ago

Note that with https://github.com/USEPA/standardizedinventories/pull/81/commits/3da1af24c4adf5d799e683b57350960e62364819 and related commits the error handling is improved such that issues with Envirofacts will not cause the function to break. However, while errors are noted clearly in the console, users should be aware that the dataset won't be complete.
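As a rough illustration of that behavior (not the code in the linked commit), the sketch below keeps going when one table fails and reports which tables are missing at the end. The function name `download_all` and the `fetch` callback are hypothetical; only the error message wording is borrowed from the console output in this thread.

```python
import logging

log = logging.getLogger("ghgrp_sketch")


def download_all(tables, fetch):
    """Try every table; return (results, failed) instead of raising."""
    results, failed = {}, []
    for table in tables:
        try:
            results[table] = fetch(table)
        except Exception:
            # Note the failure and continue with the remaining tables.
            log.error("error in downloading table %s", table)
            failed.append(table)
    if failed:
        # Make it explicit that the combined dataset is incomplete.
        log.warning("dataset incomplete; %d table(s) missing: %s",
                    len(failed), ", ".join(failed))
    return results, failed
```

Returning the list of failed tables lets a caller decide whether a partial dataset is acceptable or whether to retry later.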

bl-young commented 2 years ago

I have reached out to EPA on resolving the issue in the source data. It appears that a handful of tables are impacted besides just Subpart C:

INFO downloading and processing GHGRP data to C:\Users\BYoung\AppData\Local\stewi/GHGRP Data Files/tables/2019/
INFO Downloading C_CONFIGURATION_LEVEL_INFO (rows: 15873)
ERROR error in downloading table C_CONFIGURATION_LEVEL_INFO
INFO Downloading C_FUEL_LEVEL_INFORMATION (rows: 18376)
ERROR error in downloading table C_FUEL_LEVEL_INFORMATION
INFO Downloading D_SUBPART_LEVEL_INFORMATION (rows: 4499)
INFO Downloading F_SUBPART_LEVEL_INFORMATION (rows: 42)
INFO Downloading G_SUBPART_LEVEL_INFORMATION (rows: 116)
INFO Downloading H_SUBPART_LEVEL_INFORMATION (rows: 368)
INFO Downloading MV_EF_I_EMISSIONS_BY_GHG (rows: 563)
ERROR error in downloading table MV_EF_I_EMISSIONS_BY_GHG
INFO Downloading K_SUBPART_LEVEL_INFORMATION (rows: 32)
INFO Downloading N_SUBPART_LEVEL_INFORMATION (rows: 408)
INFO Downloading P_SUBPART_LEVEL_INFO (rows: 448)
INFO Downloading Q_SUBPART_LEVEL_INFORMATION (rows: 488)
INFO Downloading R_SUBPART_LEVEL_INFORMATION (rows: 11)
INFO Downloading S_SUBPART_LEVEL_INFORMATION (rows: 284)
INFO Downloading T_SUBPART_LEVEL_INFORMATION (rows: 15)
INFO Downloading U_SUBPART_LEVEL_INFORMATION (rows: 6)
INFO Downloading V_SUBPART_LEVEL_INFORMATION (rows: 32)
INFO Downloading EF_W_EMISSIONS_SOURCE_GHG (rows: 50995)
ERROR error in downloading table EF_W_EMISSIONS_SOURCE_GHG
INFO Downloading X_SUBPART_LEVEL_INFORMATION (rows: 278)
INFO Downloading Y_SUBPART_LEVEL_INFORMATION (rows: 552)
INFO Downloading Z_SUBPART_LEVEL_INFORMATION (rows: 0)
INFO Downloading AA_FOSSIL_FUEL_INFORMATION (rows: 379)
INFO Downloading AA_SPENT_LIQUOR_INFORMATION (rows: 280)
INFO Downloading DD_SUBPART_LEVEL_INFORMATION (rows: 148)
INFO Downloading FF_SUBPART_LEVEL_INFORMATION (rows: 134)
INFO Downloading EE_SUBPART_LEVEL_INFORMATION (rows: 24)
INFO Downloading GG_SUBPART_LEVEL_INFORMATION (rows: 20)
INFO Downloading HH_SUBPART_LEVEL_INFORMATION (rows: 1124)
INFO Downloading II_SUBPART_LEVEL_INFORMATION (rows: 126)
INFO Downloading SS_SUBPART_LEVEL_INFORMATION (rows: 50)
INFO Downloading TT_SUBPART_GHG_INFO (rows: 659)
...

michael-long88 commented 2 years ago

Awesome, thanks for the help on this so far.

bl-young commented 2 years ago

EPA claims the issue has been fixed. I was still receiving the error in subpart C only, but adjusting the size of the data call (to 5000 rows) seemed to improve things.
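For reference, a smaller data call just means splitting the row range into more URLs. The sketch below follows the URL pattern quoted earlier in this thread and assumes the `ROWS/start:end` range is inclusive at both ends (as `0:9999` suggests); `chunk_urls` and `chunk_size` are hypothetical names, not the stewi API.

```python
BASE_URL = "https://data.epa.gov/efservice"


def chunk_urls(table, year, row_count, chunk_size=5000):
    """Build one Envirofacts CSV URL per chunk of at most chunk_size rows."""
    urls = []
    for start in range(0, row_count, chunk_size):
        # Assumed inclusive end index, so each chunk covers start..end.
        end = min(start + chunk_size, row_count) - 1
        urls.append(
            f"{BASE_URL}/{table}/REPORTING_YEAR/=/{year}/ROWS/{start}:{end}/csv"
        )
    return urls


urls = chunk_urls("C_CONFIGURATION_LEVEL_INFO", 2019, 15873)
```

For the 15873 rows reported for C_CONFIGURATION_LEVEL_INFO, this yields four requests instead of the two needed at 10000 rows per call.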