NREL / rex

REsource eXtraction Tool (rex)
https://nrel.github.io/rex
BSD 3-Clause "New" or "Revised" License

How to efficiently extract the US Wave time series dataset for a specific spatiotemporal boundary #160

Open radityadanu opened 1 year ago

radityadanu commented 1 year ago

Bug Description
I want to extract the significant wave height (swh) time series for a specific area of the Atlantic region. What is the most efficient way to do this? I tried extracting site-by-site with rex's WaveX and got stuck partway through the loop with the traceback below. However, if I call an individual index on its own, the call succeeds and returns data.

Is there a way to achieve this? I am thinking of specifying a "box" spatial boundary together with a temporal boundary and extracting all of the time series at once.
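One possible approach (a sketch, not a documented rex box API): filter the resource `meta` DataFrame for sites inside the lat/lon box to get their gids, then slice the dataset once for all of them. The `gids_in_box` helper below is hypothetical; only `WaveX`, `.meta`, and gid slicing (`f[ds_name, :, gids]`) come from rex, and the box coordinates in the usage comment are made up.

```python
import pandas as pd

def gids_in_box(meta, lat_range, lon_range):
    """Return the site gids whose (latitude, longitude) fall inside the box.

    meta: DataFrame with 'latitude' and 'longitude' columns, indexed by gid.
    lat_range, lon_range: (min, max) tuples defining the box edges.
    """
    mask = (
        meta["latitude"].between(*lat_range)
        & meta["longitude"].between(*lon_range)
    )
    return meta.index[mask].tolist()

# Hypothetical usage against the HSDS-hosted US Wave data (untested sketch):
# from rex import WaveX
# with WaveX("/nrel/US_wave/Atlantic/Atlantic_wave_1979.h5", hsds=True) as f:
#     gids = gids_in_box(f.meta, (41.0, 42.1), (-71.3, -69.8))
#     swh = f["significant_wave_height", :, gids]  # one sliced read, not one per site
```

This turns hundreds of per-site requests into a single sliced read per file, which should put far less load on the HSDS gateway.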

Full Traceback

ERROR:root:got <class 'requests.exceptions.RetryError'> exception: HTTPSConnectionPool(host='developer.nrel.gov', port=443): Max retries exceeded with url: /api/hsds/datasets/d-3e9a0496-75729c43-eaa5-750126-3f3071?domain=%2Fnrel%2FUS_wave%2FAtlantic%2FAtlantic_wave_1979.h5&api_key=rcufYNIUpUJPlUP5n637v0O03DKEljn5IfyabLFZ (Caused by ResponseError('too many 503 error responses'))
---------------------------------------------------------------------------
MaxRetryError                             Traceback (most recent call last)
File [c:\ProgramData\miniconda3\lib\site-packages\requests\adapters.py:489](file:///C:/ProgramData/miniconda3/lib/site-packages/requests/adapters.py:489), in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    488 if not chunked:
--> 489     resp = conn.urlopen(
    490         method=request.method,
    491         url=url,
    492         body=request.body,
    493         headers=request.headers,
    494         redirect=False,
    495         assert_same_host=False,
    496         preload_content=False,
    497         decode_content=False,
    498         retries=self.max_retries,
    499         timeout=timeout,
    500     )
    502 # Send the request.
    503 else:

File [c:\ProgramData\miniconda3\lib\site-packages\urllib3\connectionpool.py:878](file:///C:/ProgramData/miniconda3/lib/site-packages/urllib3/connectionpool.py:878), in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    877     log.debug("Retry: %s", url)
--> 878     return self.urlopen(
    879         method,
    880         url,
    881         body,
    882         headers,
    883         retries=retries,
    884         redirect=redirect,
    885         assert_same_host=assert_same_host,
    886         timeout=timeout,
    887         pool_timeout=pool_timeout,
    888         release_conn=release_conn,
    889         chunked=chunked,
    890         body_pos=body_pos,
    891         **response_kw
    892     )
    894 return response

    [... skipping similar frames: HTTPConnectionPool.urlopen at line 878 (7 times)]

File [c:\ProgramData\miniconda3\lib\site-packages\urllib3\connectionpool.py:868](file:///C:/ProgramData/miniconda3/lib/site-packages/urllib3/connectionpool.py:868), in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    867 try:
--> 868     retries = retries.increment(method, url, response=response, _pool=self)
    869 except MaxRetryError:

File [c:\ProgramData\miniconda3\lib\site-packages\urllib3\util\retry.py:592](file:///C:/ProgramData/miniconda3/lib/site-packages/urllib3/util/retry.py:592), in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='developer.nrel.gov', port=443): Max retries exceeded with url: /api/hsds/datasets/d-3e9a0496-75729c43-eaa5-750126-3f3071?domain=%2Fnrel%2FUS_wave%2FAtlantic%2FAtlantic_wave_1979.h5&api_key=rcufYNIUpUJPlUP5n637v0O03DKEljn5IfyabLFZ (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

RetryError                                Traceback (most recent call last)
File [c:\ProgramData\miniconda3\lib\site-packages\h5pyd\_hl\httpconn.py:438](file:///C:/ProgramData/miniconda3/lib/site-packages/h5pyd/_hl/httpconn.py:438), in HttpConn.GET(self, req, format, params, headers, use_cache)
    437 s = self.session
--> 438 rsp = s.get(
    439     self._endpoint + req,
    440     params=params,
    441     headers=headers,
    442     stream=True,
    443     timeout=self._timeout,
    444     verify=self.verifyCert(),
    445 )
    446 self.log.info("status: {}".format(rsp.status_code))

File [c:\ProgramData\miniconda3\lib\site-packages\requests\sessions.py:600](file:///C:/ProgramData/miniconda3/lib/site-packages/requests/sessions.py:600), in Session.get(self, url, **kwargs)
    599 kwargs.setdefault("allow_redirects", True)
--> 600 return self.request("GET", url, **kwargs)

File [c:\ProgramData\miniconda3\lib\site-packages\requests\sessions.py:587](file:///C:/ProgramData/miniconda3/lib/site-packages/requests/sessions.py:587), in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
    589 return resp

File [c:\ProgramData\miniconda3\lib\site-packages\requests\sessions.py:701](file:///C:/ProgramData/miniconda3/lib/site-packages/requests/sessions.py:701), in Session.send(self, request, **kwargs)
    700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
    703 # Total elapsed time of the request (approximately)

File [c:\ProgramData\miniconda3\lib\site-packages\requests\adapters.py:556](file:///C:/ProgramData/miniconda3/lib/site-packages/requests/adapters.py:556), in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    555 if isinstance(e.reason, ResponseError):
--> 556     raise RetryError(e, request=request)
    558 if isinstance(e.reason, _ProxyError):

RetryError: HTTPSConnectionPool(host='developer.nrel.gov', port=443): Max retries exceeded with url: /api/hsds/datasets/d-3e9a0496-75729c43-eaa5-750126-3f3071?domain=%2Fnrel%2FUS_wave%2FAtlantic%2FAtlantic_wave_1979.h5&api_key=rcufYNIUpUJPlUP5n637v0O03DKEljn5IfyabLFZ (Caused by ResponseError('too many 503 error responses'))

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
Cell In[3], line 5
      3 # for i in len(coord_tuples)
      4 for i in range(len(coord_MA)):
----> 5     with WaveX(wave_file, hsds=True) as f:
      6         lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_MA[i])
      7     MA_swh[i] = lat_lon_swh 

Cell In[3], line 6
      4 for i in range(len(coord_MA)):
      5     with WaveX(wave_file, hsds=True) as f:
----> 6         lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_MA[i])
      7     MA_swh[i] = lat_lon_swh 

File [c:\ProgramData\miniconda3\lib\site-packages\rex\resource_extraction\resource_extraction.py:788](file:///C:/ProgramData/miniconda3/lib/site-packages/rex/resource_extraction/resource_extraction.py:788), in ResourceX.get_lat_lon_df(self, ds_name, lat_lon, check_lat_lon)
    765 def get_lat_lon_df(self, ds_name, lat_lon, check_lat_lon=True):
    766     """
    767     Extract timeseries of site(s) nearest to given lat_lon(s) and return
    768     as a DataFrame
   (...)
    786         Time-series DataFrame for given site(s) and dataset
    787     """
--> 788     gid = self.lat_lon_gid(lat_lon, check_lat_lon=check_lat_lon)
    789     df = self.get_gid_df(ds_name, gid)
    791     return df

File [c:\ProgramData\miniconda3\lib\site-packages\rex\resource_extraction\resource_extraction.py:608](file:///C:/ProgramData/miniconda3/lib/site-packages/rex/resource_extraction/resource_extraction.py:608), in ResourceX.lat_lon_gid(self, lat_lon, check_lat_lon)
    605 dist, gids = self.tree.query(lat_lon)
    607 if check_lat_lon:
--> 608     self._check_lat_lon(lat_lon)
    609     dist_check = dist > self.distance_threshold
    610     if np.any(dist_check):

File [c:\ProgramData\miniconda3\lib\site-packages\rex\resource_extraction\resource_extraction.py:561](file:///C:/ProgramData/miniconda3/lib/site-packages/rex/resource_extraction/resource_extraction.py:561), in ResourceX._check_lat_lon(self, lat_lon)
    552 def _check_lat_lon(self, lat_lon):
    553     """
    554     Check lat lon coordinates against domain
    555 
   (...)
    559         Either a single (lat, lon) pair or series of (lat, lon) pairs
    560     """
--> 561     lat_min, lat_max = np.sort(self.lat_lon[:, 0])[[0, -1]]
    562     lon_min, lon_max = np.sort(self.lat_lon[:, 1])[[0, -1]]
    564     lat = lat_lon[:, 0]
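The repeated 503 responses suggest the HSDS gateway is throttling the flood of per-site requests rather than a bug in the loop body. A generic retry-with-exponential-backoff wrapper (a sketch; `max_tries`, `base_delay`, and `retry_on` are made-up parameter names, not part of rex or requests) can make the loop survive transient throttling:

```python
import time

def call_with_backoff(fn, *args, max_tries=5, base_delay=2.0,
                      retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying on the given exceptions.

    Waits base_delay * 2**attempt seconds between tries and re-raises
    the last exception once max_tries is exhausted.
    """
    for attempt in range(max_tries):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_tries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...

# Hypothetical usage around the failing call (the exception type surfaced to
# user code varies by h5pyd/requests version; OSError covers the case above):
# lat_lon_swh = call_with_backoff(
#     f.get_lat_lon_df, 'significant_wave_height', coord_MA[j],
#     retry_on=(OSError,))
```

Backoff only hides the symptom, though; reducing the number of requests (one call per file rather than per site) is the more direct fix.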

Code Sample

```python
# imports
from rex import WaveX
import h5pyd
import pandas as pd
import numpy as np

# read the coordinate tuples
coord_MA = pd.read_csv('MA_grid_2km.csv')
coord_MA = [tuple(row) for row in coord_MA.to_records(index=False)]

# list of NREL hindcast files by year
years = np.arange(1979, 2011, 1, dtype=int)
wave_file = []
for year in years:
    wave_file.append("/nrel/US_wave/Atlantic/Atlantic_wave_" + str(year) + ".h5")

MA_swh = pd.DataFrame()

# one file open and one request per (year, site) pair
for i in range(len(wave_file)):
    for j in range(len(coord_MA)):
        with WaveX(wave_file[i], hsds=True) as f:
            lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_MA[j])
        MA_swh[i] = lat_lon_swh
```
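Per the `_check_lat_lon` docstring visible in the traceback, `get_lat_lon_df` accepts "a single (lat, lon) pair or series of (lat, lon) pairs", so the inner loop can likely be collapsed to one call per yearly file. A hedged sketch (untested against the live HSDS service; the concatenation step is an assumption about the returned DataFrame shape):

```python
def atlantic_wave_files(start=1979, end=2010):
    """Build the per-year US Wave HSDS file paths for the Atlantic domain."""
    return ["/nrel/US_wave/Atlantic/Atlantic_wave_%d.h5" % y
            for y in range(start, end + 1)]

# Hypothetical one-call-per-year extraction (assumes get_lat_lon_df accepts a
# list of (lat, lon) pairs, as its docstring suggests):
# from rex import WaveX
# import pandas as pd
# frames = []
# for path in atlantic_wave_files():
#     with WaveX(path, hsds=True) as f:
#         frames.append(f.get_lat_lon_df('significant_wave_height', coord_MA))
# MA_swh = pd.concat(frames)  # rows: timesteps 1979-2010, columns: sites
```

This drops the request count from 32 years x 547 sites to 32 calls, which should also sidestep the 503 throttling seen above.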

To Reproduce
Steps to reproduce the behavior:

  1. Run the code above

Expected behavior
A time series for all 547 locations from 01/01/1979 to 12/31/2010.

Screenshots
When i=3, the error with the traceback above pops up. However, calling this directly:

```python
with WaveX(wave_file[1], hsds=True) as f:
    lat_lon_swh = f.get_lat_lon_df('significant_wave_height', coord_MA[3])
```

the system returns data as expected (screenshot of the returned DataFrame omitted).


Additional context
Here's the grid file used in the script: MA_grid_2km.csv

I hope someone can help! Thank you, Danu