alan-turing-institute / deepsensor

A Python package for tackling diverse environmental prediction tasks with NPs.
https://alan-turing-institute.github.io/deepsensor/
MIT License
72 stars 15 forks source link

Issue on page /user-guide/data_processor.html - "Must manually pip-install get-station-data with: ..." #102

Closed simonrolph closed 6 months ago

simonrolph commented 7 months ago

Documentation does not state that I needed to run this line:

pip install git+https://github.com/scott-hosking/get-station-data.git

in order to successfully run this line:

station_raw_df = get_ghcnd_station_data(station_var_IDs, extent, date_range=data_range, cache=True, cache_dir=cache_dir)

I tried running the station_raw_df line and I got an error which produced this error:

WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/fdm/fdm.py:38: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
  coefs = mat.inv()[:, deriv] * np.math.factorial(deriv)

WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/fdm/fdm.py:44: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
  / np.math.factorial(order)

WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/fdm/fdm.py:38: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
  coefs = mat.inv()[:, deriv] * np.math.factorial(deriv)

WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/fdm/fdm.py:44: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
  / np.math.factorial(order)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/deepsensor/data/sources.py](https://localhost:8080/#) in get_ghcnd_station_data(var_IDs, extent, date_range, subsample_frac, num_processes, verbose, cache, cache_dir)
     85     try:
---> 86         from get_station_data import ghcnd
     87     except ImportError:

ModuleNotFoundError: No module named 'get_station_data'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
1 frames
[/usr/local/lib/python3.10/dist-packages/deepsensor/data/sources.py](https://localhost:8080/#) in get_ghcnd_station_data(var_IDs, extent, date_range, subsample_frac, num_processes, verbose, cache, cache_dir)
     86         from get_station_data import ghcnd
     87     except ImportError:
---> 88         raise ImportError(
     89             "Must manually pip-install get-station-data with: `pip install git+[https://github.com/scott-hosking/get-station-data.git`](https://github.com/scott-hosking/get-station-data.git%60)"
     90         )

ImportError: Must manually pip-install get-station-data with: `pip install git+[https://github.com/scott-hosking/get-station-data.git`](https://github.com/scott-hosking/get-station-data.git%60)

I was running this in a google colab notebook and I had previously run these lines as specified in the installation instructions

pip install deepsensor
pip install tensorflow
pip install tensorflow_probability

It would be preferred if the example has as minimal dependencies as possible.

kallewesterling commented 7 months ago

Thank you for pointing out this problem @simonrolph. If you want to suggest where in the documentation we should add these instructions, feel free to open a PR!

...and I hear you on the dependency reliance. This is a package still in development and, as such, sometimes the trimming of dependencies has to happen down the line.

tom-andersson commented 6 months ago

Hey @simonrolph :) thanks for raising this. You're right, we should have a note about this if people are running the documentation themselves. I've added a comment about this in the DataProcessor page.

It would be preferred if the example has as minimal dependencies as possible.

We need weather station data to demonstrate DeepSensor, and get-station-data is a very convenient way to do so. So I'd argue we are already at the minimum number of dependencies.

For some background:

The reason @scotthosking's get-station-data doesn't get installed alongside deepsensor is that pip doesn't allow for GitHub repos as direct dependencies (https://github.com/pypi/warehouse/issues/7136). If get-station-data were itself on pip then we could depend on it, but unfortunately it is only a GitHub repo.

The best workaround I could think of was to prompt the user to manually install get-station-data if they run the function that depends on get-station-data, and do this via a (hopefully) informative error message from your stack trace above. This is also mentioned as a note in the API reference for get_ghcnd_station_data.

I would be against adding get-station-data to the Installation Instructions since many users will not require it and it may be confusing. However, hopefully you are happy with the addition to the DataProcessor page :)

simonrolph commented 6 months ago

Ok all makes sense, thanks for resolving this. Just holding you accountable that the user guide states "The pages of this guide are Jupyter notebooks and are fully-reproducible." 😉

tom-andersson commented 6 months ago

As you should @simonrolph!