Open-EO / openeo-processes-python

A Python representation of (most) openEO processes
Apache License 2.0

Work with optional dependencies #5

Open soxofaan opened 4 years ago

soxofaan commented 4 years ago

I'm investigating whether openeo-processes-python would be useful to add to the VITO backend in some form, and I found that its set of dependencies is quite heavy.

Installation in a fresh virtual env results in these dependencies:

$ pip freeze
bokeh==2.1.1
click==7.1.2
cloudpickle==1.4.1
dask==2.19.0
distributed==2.19.0
fsspec==0.7.4
HeapDict==1.0.1
Jinja2==2.11.2
llvmlite==0.33.0
locket==0.2.0
MarkupSafe==1.1.1
msgpack==1.0.0
numba==0.50.1
numpy==1.19.0
-e git+git@github.com:Open-EO/openeo-processes-python.git@c5cc64af94ba83872d5f7ee990ce1a64a0cc83c1#egg=openeo_processes
packaging==20.4
pandas==1.0.5
partd==1.1.0
Pillow==7.1.2
psutil==5.7.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
scipy==1.5.0
six==1.15.0
sortedcontainers==2.2.2
tblib==1.6.0
toolz==0.10.0
tornado==6.0.4
typing-extensions==3.7.4.2
xarray==0.15.1
xarray-extras==0.4.2
zict==2.0.0

There is a lot in that list (e.g. bokeh, click, jinja2, msgpack, PyYAML, tornado, tblib, psutil, MarkupSafe, locket ...) that is quite far from the core functionality we're looking for: implementations of the basic openEO (math) processes.

The direct dependencies are currently just: https://github.com/Open-EO/openeo-processes-python/blob/38c6eea5d8f09a348a63e96f967ae68fb777a8d1/setup.cfg#L29-L33

I guess dask[complete] is the one that drags in all these other dependencies.

First: is there an important reason to depend on dask[complete]? These are the only dask-related lines in the whole repo:

src/openeo_processes/utils.py:import dask
src/openeo_processes/utils.py:    is_dar = isinstance(data, dask.array.core.Array)

So why not just depend on dask[array]?
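The only dask usage above is an `isinstance` check against the dask array type, which needs nothing beyond the `dask.array` subpackage that dask[array] provides. A minimal sketch of such a check, using a hypothetical helper name `is_dask_array` (not the actual code in `utils.py`). It imports `dask.array` explicitly rather than relying on a bare `import dask` having already loaded that submodule, and it simply returns False when dask is not installed:

```python
def is_dask_array(data) -> bool:
    """Return True if `data` is a dask array (hypothetical helper).

    Works even when dask is not installed: in that case nothing can be a
    dask array, so the answer is always False.
    """
    try:
        # Only the array subpackage is needed, i.e. what dask[array] installs.
        import dask.array as da
    except ImportError:
        # dask (or one of its array requirements such as numpy) is missing.
        return False
    return isinstance(data, da.Array)
```

With this pattern, code paths that never see dask arrays keep working in an environment without dask at all.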

Furthermore, would there be interest in making the dependencies on dask and xarray optional? The end user could then cherry-pick which calculation "backends" (and corresponding dependencies) to pull into their project. For example:
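A minimal sketch of how such optional groups could be declared with setuptools "extras". The group names (`dask`, `xarray`, `all`) and the assumption that numpy stays a hard requirement are illustrative only, not taken from the actual setup.cfg:

```python
# Hypothetical dependency declaration for setup.py; names and groupings
# are illustrative, not the project's actual configuration.

# Assumed core requirement for the basic (math) processes.
INSTALL_REQUIRES = ["numpy"]

# Optional "backends": users opt in per group.
EXTRAS_REQUIRE = {
    "dask": ["dask[array]"],  # only the array part of dask, not dask[complete]
    "xarray": ["xarray"],
}

# Convenience group that installs every optional backend.
EXTRAS_REQUIRE["all"] = sorted(
    {req for reqs in EXTRAS_REQUIRE.values() for req in reqs}
)
```

These would be passed as `install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE` to `setuptools.setup()` (or written as an `[options.extras_require]` section in setup.cfg), so that e.g. `pip install openeo_processes[dask]` pulls in dask with only its array extras, while a plain install stays lightweight.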

lforesta commented 4 years ago

@soxofaan Yes, absolutely. I think we only have dask[complete] because I initially had some issues installing dask with conda, but at least for now only a small part of dask is needed, and I like the approach with the optional dependencies.