NOAA-OWP / hydrotools

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.

nwm_client_new not available #161

Closed jmpmcmanus closed 1 year ago

jmpmcmanus commented 2 years ago

Hi - I installed hydrotools using pip: https://pypi.org/project/hydrotools/ but when I try to import anything from hydrotools.nwm_client_new I get:

from hydrotools.nwm_client_new.NWMFileClient import NWMFileClient
Traceback (most recent call last):
  File "", line 1, in
ModuleNotFoundError: No module named 'hydrotools.nwm_client_new'

Why is nwm_client_new not available in the pip version?

Thanks,
Jim

jarq6c commented 2 years ago

This nwm_client_new package is still in beta and has not been deployed to PyPI yet. If you would like to test the new package, you can install the package directly from GitHub using:

$ python3 -m pip install git+https://github.com/NOAA-OWP/hydrotools.git#subdirectory=python/nwm_client_new

Alternatively, you can install the current stable nwm_client package from PyPI:

# Base package
$ python3 -m pip install hydrotools.nwm_client

# ...or install with Google Cloud Support
$ python3 -m pip install hydrotools.nwm_client[gcp]
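
To confirm which of these subpackages an environment actually provides before importing from it, a quick check along the following lines may help. This is only a sketch using the standard library; the helper name `is_installed` is hypothetical and is not part of hydrotools.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located in the current environment.

    Hypothetical helper for illustration; not part of hydrotools.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "hydrotools") is missing
        return False
```

For example, `is_installed("hydrotools.nwm_client_new")` should return False in an environment where only the stable `hydrotools.nwm_client` package was installed.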
jmpmcmanus commented 2 years ago

Thanks. That worked. However, when I attempt to run the example:

# Import the nwm Client
from hydrotools.nwm_client_new.NWMFileClient import NWMFileClient
import pandas as pd

# Instantiate model data client
#  Defaults to Google Cloud Platform
client = NWMFileClient()

# Set reference time
yesterday = pd.Timestamp.utcnow() - pd.Timedelta("1D")
reference_time = yesterday.strftime("%Y%m%dT%-HZ")

# Retrieve forecast data
#  By default, only retrieves data at USGS gaging sites in
#  CONUS that are used for model assimilation
forecast_data = client.get(
    configuration = "short_range",
    reference_times = [reference_time]
    )

# Look at the data
print(forecast_data.info(memory_usage='deep'))
print(forecast_data[['value_time', 'value']].head())

I get the following error: RuntimeError: asyncio.run() cannot be called from a running event loop

Below is the full error output:


RuntimeError                              Traceback (most recent call last)
in
     12 forecast_data = client.get(
     13     configuration = "short_range",
---> 14     reference_times = [reference_time]
     15     )
     16

~/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/NWMFileClient.py in get(self, configuration, reference_times, compute)
    190                     configuration=configuration,
    191                     reference_time=reference_time,
--> 192                     netcdf_dir=netcdf_dir
    193                     )
    194             except QueryError:

~/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/ParquetCache.py in get(self, function, subdirectory, *args, **kwargs)
    107
    108         # Run function
--> 109         df = function(*args, **kwargs)
    110
    111         # Cache result

~/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/NWMFileClient.py in get_cycle(self, configuration, reference_time, netcdf_dir)
    128
    129         # Download files
--> 130         downloader.get(zip(urls,filenames))
    131
    132         # Get dataset

~/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/FileDownloader.py in get(self, src_dst_list)
    161
    162         # Start event loop to retrieve files
--> 163         asyncio.run(self.get_files(src_dst_list))
    164
    165     @property

~/anaconda3/envs/pangeoqgis/lib/python3.7/asyncio/runners.py in run(main, debug)
     32     if events._get_running_loop() is not None:
     33         raise RuntimeError(
---> 34             "asyncio.run() cannot be called from a running event loop")
     35
     36     if not coroutines.iscoroutine(main):

RuntimeError: asyncio.run() cannot be called from a running event loop
jarq6c commented 2 years ago

nwm_client_new and nwis_client use the asyncio library, which currently has problems in interactive environments like Jupyter Notebooks and Spyder: these environments already run their own event loop, so the client's internal call to asyncio.run() fails. If you're using one of these environments, you'll want to import and apply nest_asyncio as shown below. Note that this must be done before any other imports.

# Import necessary for interactive environments
import nest_asyncio
nest_asyncio.apply()

# Import the nwm Client
from hydrotools.nwm_client_new.NWMFileClient import NWMFileClient
import pandas as pd

# Instantiate model data client
#  Defaults to Google Cloud Platform
client = NWMFileClient()

# Set reference time
yesterday = pd.Timestamp.utcnow() - pd.Timedelta("1D")
reference_time = yesterday.strftime("%Y%m%dT%-HZ")

# Retrieve forecast data
#  By default, only retrieves data at USGS gaging sites in
#  CONUS that are used for model assimilation
forecast_data = client.get(
    configuration = "analysis_assim",
    reference_times = [reference_time]
    )

# Look at the data
print(forecast_data.info(memory_usage='deep'))
print(forecast_data[['value_time', 'value']].head())
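
To see why the error appears only in interactive environments, the following standard-library sketch reproduces it without hydrotools. It checks whether an event loop is already running and shows that `asyncio.run()` refuses to start when called from inside one. The function names `loop_is_running` and `call_run_from_inside_loop` are hypothetical and exist only for this illustration.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises when no loop is running in this thread
        return False
    return True


async def call_run_from_inside_loop() -> str:
    """Call asyncio.run() while a loop is already running and report the error."""
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as error:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(error)
    return "no error"
```

In a plain Python script `loop_is_running()` returns False, so `asyncio.run()` works. Inside a coroutine, or in a notebook cell where the kernel's loop is already running, the same call raises the RuntimeError shown in the traceback above. nest_asyncio works around this by patching asyncio so that nested calls are allowed.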
jmpmcmanus commented 2 years ago

I am using Jupyter, and that worked. However, now I am getting an error about sort_values:

AttributeError: 'DataFrame' object has no attribute 'sort_values'

I get this error when running it in Jupyter and the Python prompt. Here is the full output from the python prompt:

warnings.warn(message)
Traceback (most recent call last):
  File "", line 3, in
  File "/home/jmcmanus/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/NWMFileClient.py", line 192, in get
    netcdf_dir=netcdf_dir
  File "/home/jmcmanus/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/ParquetCache.py", line 109, in get
    df = function(*args, **kwargs)
  File "/home/jmcmanus/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/NWMFileClient.py", line 139, in get_cycle
    df = NWMFileProcessor.convert_to_dask_dataframe(ds)
  File "/home/jmcmanus/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/hydrotools/nwm_client_new/NWMFileProcessor.py", line 112, in convert_to_dask_dataframe
    df = df.sort_values(by="feature_id")
  File "/home/jmcmanus/anaconda3/envs/pangeoqgis/lib/python3.7/site-packages/dask/dataframe/core.py", line 3757, in __getattr__
    raise AttributeError("'DataFrame' object has no attribute %r" % key)
AttributeError: 'DataFrame' object has no attribute 'sort_values'

jarq6c commented 2 years ago

That's odd. Is it possible you're using an older version of dask?

jmpmcmanus commented 2 years ago

I already had dask installed as part of pangeo. It was installed using conda. Its version number is 2021.02.0. I will try it on a newer version.

jmpmcmanus commented 2 years ago

I updated my pangeo environment installing dask 2021.11.1 and everything is working! Thanks for your help!!
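
For anyone hitting the same AttributeError, a rough version check like the one below may help decide whether an upgrade is worth trying. It is only a sketch: this thread establishes that dask 2021.02.0 failed and 2021.11.1 worked, not the exact release that introduced `sort_values`, so the threshold is an assumption. The helper names are hypothetical, and the parser only handles plain dotted numeric versions.

```python
from typing import Tuple

# Versions reported in this thread; the true minimum version is not stated here.
FAILED_VERSION = "2021.02.0"
WORKED_VERSION = "2021.11.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2021.02.0' into (2021, 2, 0)."""
    return tuple(int(part) for part in version.split("."))


def at_least_known_good(installed: str, known_good: str = WORKED_VERSION) -> bool:
    """Return True if `installed` is at or above the version that worked here."""
    return parse_version(installed) >= parse_version(known_good)
```

In practice the `installed` argument would come from `dask.__version__`. A more direct alternative, assuming dask is importable, is to test for the method itself with `hasattr(dask.dataframe.DataFrame, "sort_values")`.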

jarq6c commented 2 years ago

Great news! We'll leave this ticket open until nwm_client_new is deployed to PyPI.

jarq6c commented 1 year ago

Two years late, but a version of nwm_client_new has been deployed to PyPI and is installable via pip. https://pypi.org/project/hydrotools.nwm-client-new/7.1.0/