USEPA / standardizedinventories

Standardized Release and Waste Inventories
MIT License

[TRI] TRI -A download breaking #84

Closed michael-long88 closed 2 years ago

michael-long88 commented 2 years ago

Tried downloading some TRI data, but ran into the same issue for both 2016 and 2017

➜ python -m stewi.TRI A -Y 2016
INFO downloading TRI files from source for 2016
Traceback (most recent call last):
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/TRI.py", line 413, in <module>
    extract_TRI_data_files(link_zip_TRI, TRIFiles, TRIyear)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/TRI.py", line 76, in extract_TRI_data_files
    for line in txtfile:
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 153: invalid start byte

bl-young commented 2 years ago

OK, I can't recreate this error. Perhaps there is a dependency issue here. Can you confirm what version of pandas you have?

I will look into it

michael-long88 commented 2 years ago

pandas version 1.3.1

bl-young commented 2 years ago

@WesIngwersen @catherinebirney any chance you could try running the TRI module (with option A) to see if you can recreate this error? Perhaps we could ignore or replace the errors when reading the zip file, but I don't want to lose too much data. Line 74:

with io.TextIOWrapper(z.open(filename + '.txt',
        mode='r'), errors='replace') as txtfile:
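
A minimal sketch of the trade-off raised above, not taken from the stewi code. It builds a tiny in-memory zip (the file name and contents are made up for illustration) and reads it twice: once with `errors='replace'`, and once with an explicit `cp1252` encoding. That the TRI files are actually Windows-1252 is an assumption; byte 0x92 is merely consistent with it.

```python
import io
import zipfile

# Hypothetical sample data: one line contains byte 0x92, the byte from the traceback.
raw = b"FACILITY\tNAME\nA1\tSMITH\x92S PLANT\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("sample.txt", raw)


def read_lines(zip_bytes, member, **text_kwargs):
    """Read a text member from a zip archive, passing options to TextIOWrapper."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        with io.TextIOWrapper(z.open(member, mode="r"), **text_kwargs) as txtfile:
            return [line.rstrip("\n") for line in txtfile]


# errors='replace' never raises, but the undecodable byte becomes U+FFFD.
replaced = read_lines(buf.getvalue(), "sample.txt",
                      encoding="utf-8", errors="replace")

# An explicit single-byte encoding keeps the character (assumed cp1252 here).
decoded = read_lines(buf.getvalue(), "sample.txt", encoding="cp1252")

print(repr(replaced[1]))
print(repr(decoded[1]))
```

With `errors='replace'` the original character is lost for good, whereas naming the encoding preserves it, provided the guess about the encoding is right.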

WesIngwersen commented 2 years ago

I can't replicate the error. I'm using Python 3.7.2 and pandas 1.1.3

C:\Users\wesle\standardizedinventories>python -m stewi.TRI A -Y 2016
INFO downloading TRI files from source for 2016
INFO US_1a_2016.csv saved to C:\Users\wesle\AppData\Local\stewi/TRI Data Files/
INFO US_3a_2016.csv saved to C:\Users\wesle\AppData\Local\stewi/TRI Data Files/

catherinebirney commented 2 years ago

Also can't replicate the error. I get the same results as Wes.

Python 3.7 and pandas 1.2.3

michael-long88 commented 2 years ago

My Python version is 3.8.10. I've completely destroyed and recreated the conda environment that I'm installing into, and I'm still getting the same error.

bl-young commented 2 years ago

@michael-long88 can you export your environment? Are you running Windows?

conda env export > [env_name].yml

michael-long88 commented 2 years ago

Running on a Mac. Output of the env export:

name: epa
channels:
  - conda-forge
  - defaults
dependencies:
  - appnope=0.1.2=py38h50d1736_1
  - argon2-cffi=20.1.0=py38h96a0964_2
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - blas=1.0=mkl
  - bleach=4.0.0=pyhd8ed1ab_0
  - bottleneck=1.3.2=py38hf1fa96c_1
  - ca-certificates=2021.7.5=hecd8cb5_1
  - certifi=2021.5.30=py38hecd8cb5_0
  - cffi=1.14.6=py38h9688ba1_0
  - decorator=5.0.9=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.3=pyhd8ed1ab_1003
  - importlib-metadata=4.6.4=py38h50d1736_0
  - intel-openmp=2021.3.0=hecd8cb5_3375
  - ipykernel=5.3.4=py38h5ca1d4c_0
  - ipython=7.26.0=py38h5fd9f69_0
  - ipython_genutils=0.2.0=py_1
  - jedi=0.18.0=py38h50d1736_2
  - jinja2=3.0.1=pyhd8ed1ab_0
  - jsonschema=3.2.0=pyhd8ed1ab_3
  - jupyter_client=6.1.12=pyhd8ed1ab_0
  - jupyter_core=4.7.1=py38h50d1736_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - libcxx=10.0.0=1
  - libffi=3.3=hb1e8313_2
  - libsodium=1.0.18=hbcb3906_1
  - markupsafe=2.0.1=py38h96a0964_0
  - matplotlib-inline=0.1.2=pyhd8ed1ab_2
  - mistune=0.8.4=py38h96a0964_1004
  - mkl=2021.3.0=hecd8cb5_517
  - mkl-service=2.4.0=py38h9ed2024_0
  - mkl_fft=1.3.0=py38h4a7008c_2
  - mkl_random=1.2.2=py38hb2f4e1b_0
  - nb_conda=2.2.1=py38_1
  - nb_conda_kernels=2.3.1=py38hecd8cb5_0
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.1.0=py38h50d1736_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - ncurses=6.2=h0a44026_1
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.3=pyha770c72_0
  - numexpr=2.7.3=py38h5873af2_1
  - numpy=1.20.3=py38h4b4dc7a_0
  - numpy-base=1.20.3=py38he0bd621_0
  - openssl=1.1.1k=h9ed2024_0
  - packaging=21.0=pyhd8ed1ab_0
  - pandas=1.3.1=py38h5008ddb_0
  - pandoc=2.14.1=h0d85af4_0
  - pandocfilters=1.4.2=py_1
  - parso=0.8.2=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=21.2.2=py38hecd8cb5_0
  - prometheus_client=0.11.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.19=pyha770c72_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pycparser=2.20=pyh9f0ad1d_2
  - pygments=2.10.0=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pyrsistent=0.17.3=py38h96a0964_2
  - python=3.8.10=h88f2d9e_7
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.8=2_cp38
  - pytz=2021.1=pyhd3eb1b0_0
  - pyzmq=22.2.1=py38h23ab428_1
  - readline=8.1=h9ed2024_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=52.0.0=py38hecd8cb5_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.36.0=hce871da_0
  - terminado=0.11.1=py38h50d1736_0
  - testpath=0.5.0=pyhd8ed1ab_0
  - tk=8.6.10=hb0a8c7a_0
  - tornado=6.1=py38h96a0964_1
  - traitlets=5.0.5=py_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - wheel=0.37.0=pyhd3eb1b0_0
  - xz=5.2.5=h1de35cc_0
  - zeromq=4.3.4=h23ab428_0
  - zipp=3.5.0=pyhd8ed1ab_0
  - zlib=1.2.11=h1de35cc_3
  - pip:
    - appdirs==1.4.4
    - beautifulsoup4==4.9.3
    - charset-normalizer==2.0.4
    - esupy==0.1.7
    - idna==3.2
    - pyarrow==5.0.0
    - pyyaml==5.4.1
    - regex==2021.8.21
    - requests==2.26.0
    - requests-ftp==0.3.1
    - soupsieve==2.2.1
    - stewi==0.9.9
    - urllib3==1.26.6
prefix: /Users/michaellong/miniconda/envs/epa

bl-young commented 2 years ago

This seems to be an issue that can arise on Macs but not PCs: https://stackoverflow.com/a/33117983. Perhaps we need to confirm or be explicit about the encoding somewhere.
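
A short sketch of why the platform may matter, independent of stewi. When no `encoding` is given, `io.TextIOWrapper` falls back to the locale's preferred encoding, which is typically UTF-8 on macOS and often cp1252 on Western-locale Windows. Byte 0x92 is a valid character in cp1252 but cannot start a UTF-8 sequence. That this explains the difference between the machines in this thread is an assumption.

```python
import locale

byte = b"\x92"

# In Windows-1252, 0x92 is the right single quotation mark (U+2019).
as_cp1252 = byte.decode("cp1252")

# In UTF-8, 0x92 is a continuation byte and cannot begin a character.
try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), utf8_ok)

# The default that TextIOWrapper uses when encoding is omitted (Python 3.8 behaviour).
print(locale.getpreferredencoding(False))
```

Passing `encoding=` explicitly wherever the file is opened removes the dependence on the machine's locale.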

bl-young commented 2 years ago

@michael-long88 can you try installing from the issue_84 branch and running TRI to see if that fixes the encoding errors on your machine?

pip install git+https://github.com/USEPA/standardizedinventories@issue_84#egg=StEWI

michael-long88 commented 2 years ago

That looks like it worked. After installing again, it doesn't look like stewi/filter.yaml got copied in though. After manually creating the file and copying in the yaml file contents, the TRI command ran perfectly.
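
One way to check whether a data file shipped with an installed package, sketched with the standard library only. `packaged_file_exists` is a hypothetical helper, not part of stewi, and it assumes the package is installed as ordinary files on disk rather than inside a zip archive.

```python
import importlib.util
from pathlib import Path


def packaged_file_exists(package, filename):
    """Return True if `filename` sits next to the package's __init__.py."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        # Package not installed, or it has no file location.
        return False
    return (Path(spec.origin).parent / filename).is_file()


# Prints False if stewi is not installed or filter.yaml was not copied in.
print(packaged_file_exists("stewi", "filter.yaml"))
```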

bl-young commented 2 years ago

Great! I will take care of the install issue here (#85).