catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Minor dependencies issues with pip #405

Closed karldw closed 4 years ago

karldw commented 4 years ago

Describe the bug

A basic pip install doesn't include all necessary dependencies.

Bug Severity

Low

To Reproduce

conda create --name pudl_test_env python=3.7 pip --yes
conda activate pudl_test_env
pip install catalystcoop.pudl
python3 -m pudl
#> File ".../pudl/convert/epacems_to_parquet.py", line 33, in <module>
#>     import pyarrow as pa
#> ModuleNotFoundError: No module named 'pyarrow'

Expected behavior

I expected a pip install to install everything necessary to import the pudl module. This could either be fixed by adding pyarrow as a dependency, or by only loading pyarrow in the epacems_to_parquet.py functions when the user actually wants to run that code.
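The second option, deferring the import until the parquet code actually runs, could look roughly like the sketch below. The helper `load_optional` and the function `write_parquet` are hypothetical names for illustration, not part of PUDL. Only the pyarrow calls (`pa.Table.from_pydict`, `pyarrow.parquet.write_table`) are real APIs.

```python
import importlib


def load_optional(module_name, extra):
    """Import an optional dependency on demand (hypothetical helper).

    Raises ImportError with a hint about which extra provides the module.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is needed for this feature; "
            f"install the '{extra}' extra to enable it."
        ) from err


def write_parquet(columns, path):
    """Write a dict of column lists to a Parquet file (illustrative only).

    pyarrow is imported here rather than at module level, so importing
    the enclosing package does not require pyarrow to be installed.
    """
    pa = load_optional("pyarrow", extra="parquet")
    pq = load_optional("pyarrow.parquet", extra="parquet")
    table = pa.Table.from_pydict(columns)
    pq.write_table(table, path)
```

With this pattern, `python3 -m pudl` would only fail on a missing pyarrow if the user actually invoked the conversion, and the error would say how to fix it.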

Related questions:

Software Environment?

conda list output

```
# Name                    Version          Build         Channel
_libgcc_mutex             0.1              main
attrs                     19.1.0           pypi_0        pypi
bitarray                  1.0.1            pypi_0        pypi
ca-certificates           2019.5.15        1
catalystcoop-pudl         0.1.0a2          pypi_0        pypi
cchardet                  2.1.4            pypi_0        pypi
certifi                   2019.6.16        py37_1
chardet                   3.0.4            pypi_0        pypi
click                     7.0              pypi_0        pypi
click-default-group       1.2.2            pypi_0        pypi
coloredlogs               10.0             pypi_0        pypi
cycler                    0.10.0           pypi_0        pypi
datapackage               1.9.0            pypi_0        pypi
dbfread                   2.0.7            pypi_0        pypi
decorator                 4.4.0            pypi_0        pypi
docutils                  0.15.2           pypi_0        pypi
et-xmlfile                1.0.1            pypi_0        pypi
goodtables                2.2.1            pypi_0        pypi
humanfriendly             4.18             pypi_0        pypi
idna                      2.8              pypi_0        pypi
ijson                     2.4              pypi_0        pypi
importlib-resources       1.0.2            pypi_0        pypi
isodate                   0.6.0            pypi_0        pypi
jdcal                     1.4.1            pypi_0        pypi
joblib                    0.13.2           pypi_0        pypi
jsonlines                 1.2.0            pypi_0        pypi
jsonpointer               2.0              pypi_0        pypi
jsonschema                3.0.2            pypi_0        pypi
kiwisolver                1.1.0            pypi_0        pypi
libedit                   3.1.20181209     hc058e9b_0
libffi                    3.2.1            hd88cf55_4
libgcc-ng                 9.1.0            hdf63c60_0
libstdcxx-ng              9.1.0            hdf63c60_0
linear-tsv                1.1.0            pypi_0        pypi
matplotlib                3.1.1            pypi_0        pypi
ncurses                   6.1              he6710b0_1
networkx                  2.3              pypi_0        pypi
numpy                     1.17.2           pypi_0        pypi
openpyxl                  2.4.11           pypi_0        pypi
openssl                   1.1.1d           h7b6447c_0
pandas                    0.25.1           pypi_0        pypi
pip                       19.2.2           py37_0
psycopg2                  2.8.3            pypi_0        pypi
pybloom-live              3.0.0            pypi_0        pypi
pyparsing                 2.4.2            pypi_0        pypi
pyrsistent                0.15.4           pypi_0        pypi
python                    3.7.4            h265db76_1
python-dateutil           2.8.0            pypi_0        pypi
pytz                      2019.2           pypi_0        pypi
pyyaml                    5.1.2            pypi_0        pypi
readline                  7.0              h7b6447c_5
requests                  2.22.0           pypi_0        pypi
rfc3986                   1.3.2            pypi_0        pypi
scikit-learn              0.21.3           pypi_0        pypi
scipy                     1.3.1            pypi_0        pypi
setuptools                41.0.1           py37_0
simpleeval                0.9.8            pypi_0        pypi
six                       1.12.0           pypi_0        pypi
sqlalchemy                1.3.8            pypi_0        pypi
sqlite                    3.29.0           h7b6447c_0
statistics                1.0.3.5          pypi_0        pypi
tableschema               1.7.0            pypi_0        pypi
tableschema-sql           1.1.0            pypi_0        pypi
tabulator                 1.24.2           pypi_0        pypi
timezonefinder            4.1.0            pypi_0        pypi
tk                        8.6.8            hbc83047_0
unicodecsv                0.14.1           pypi_0        pypi
urllib3                   1.25.3           pypi_0        pypi
wheel                     0.33.4           py37_0
xlrd                      1.2.0            pypi_0        pypi
xlsxwriter                1.2.0            pypi_0        pypi
xz                        5.2.4            h14c3975_4
zlib                      1.2.11           h7b6447c_3
```

zaneselvans commented 4 years ago

Hmm. IIRC, I was trying to avoid loading pyarrow all the time because it has some compilation issues that were messing up the docs build or tox on OS X, or something like that. But this might be left over from before I figured out how to mock modules for the docs.

You probably saw this already, but you can tell pip to install the parquet requirements with the parquet "extras". I will see about integrating pyarrow and parquet into the main install_requires. Thanks for being our guinea pig!
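For readers unfamiliar with the distinction, here is a minimal sketch of what moving an extra into the core requirements amounts to in a setup.py. The package lists and the `promote_extra` helper are assumptions for illustration and do not reflect PUDL's actual setup.py. Only the `install_requires` and `extras_require` keyword names come from setuptools.

```python
# Hypothetical setup.py fragment; the package lists are illustrative only.
core_requires = ["pandas", "sqlalchemy"]
extras_require = {"parquet": ["pyarrow"]}


def promote_extra(core, extras, name):
    """Fold one optional extra into the core requirements.

    Returns a new (core, extras) pair and leaves the inputs unmodified.
    """
    new_core = sorted(set(core) | set(extras.get(name, [])))
    new_extras = {key: list(val) for key, val in extras.items() if key != name}
    return new_core, new_extras


install_requires, remaining_extras = promote_extra(
    core_requires, extras_require, "parquet"
)
# These would then be handed to setuptools, e.g.
# setup(install_requires=install_requires, extras_require=remaining_extras, ...)
```

After the change, a plain `pip install` pulls in pyarrow for everyone, at the cost of a heavier default install.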

The master branch still depends on psycopg2, but it should be (have been?) removed from @cmgosnell's data-packaging branch, which no longer relies on PostgreSQL.

zaneselvans commented 4 years ago

Also yes, I will add badges for PyPI and conda-forge as soon as the package is available via conda.