NREL / developer.nrel.gov

An issue tracker for NREL's APIs available at https://developer.nrel.gov

Issue Accessing Hindcast Data at a specified location #285

Closed ssolson closed 1 year ago

ssolson commented 2 years ago

Hello,

My name is Sterling, and I lead development on MHKiT-python. Our toolbox uses nrel-rex to access directional_wave_spectrum from the hindcast data hosted on NREL's HSDS server. Until October, our calls to the HSDS server worked without issue, but our CI tests are now failing because no data is returned.

I have looked through your docs, and I am fairly sure we are not hitting any of the listed rate limits. We are also not receiving a 429 response; instead, we get a zero-length dataset.
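A minimal sketch of one way to make this failure mode loud in CI, rather than letting a zero-length result pass silently: compare the shape of what comes back with the shape the selection should produce, and raise if the result is empty or short. It assumes the dataset object exposes `.shape` and supports NumPy-style slicing, as h5pyd `Dataset` objects do; `expected_shape` and `read_checked` are hypothetical helper names, and a NumPy array with made-up dimensions stands in for the remote dataset.

```python
import numpy as np


def expected_shape(shape, selection):
    """Shape that a tuple of slices should produce on an array of `shape`."""
    # slice.indices() clamps start/stop/step to the axis length, so the
    # matching range() contains exactly the indices the slice selects.
    return tuple(len(range(*s.indices(n))) for s, n in zip(selection, shape))


def read_checked(dset, selection):
    """Read `selection` (one slice per axis) and fail loudly on empty or short data."""
    if len(selection) != len(dset.shape):
        raise ValueError("give exactly one slice per dataset axis")
    want = expected_shape(dset.shape, selection)
    data = np.asarray(dset[selection])
    if data.size == 0:
        raise RuntimeError(f"empty result for {selection}; expected shape {want}")
    if data.shape != want:
        raise RuntimeError(f"got shape {data.shape}; expected {want}")
    return data


# Hypothetical stand-in for the remote spectrum dataset (dimensions made up).
spectrum = np.zeros((8, 29, 24, 10), dtype=np.float32)
sel = (slice(None), slice(27, 28), slice(None), slice(5, 6))
print(read_checked(spectrum, sel).shape)  # (8, 1, 24, 1)
```

Because the check only relies on `.shape` and slicing, the same function can wrap either a local array in a unit test or a remote dataset handle in an integration test.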

Working through this issue with the NREL-rex team, we determined the following:

  1. Neither of us seems to be able to pull the full data volume from that dataset
  2. The errors are raised by h5pyd, not rex, and appear to say "no data has been returned"
  3. Smaller datasets, or a slice of the spectrum dataset, are retrieved just fine
  4. The original code requesting the spectrum data as a DataFrame works fine when pointed at the Eagle datasets

I created a test repository that reproduces this behavior using nrel-rex here.

What I am hoping is that an expert from NREL's HSDS team can help me diagnose exactly what has changed and how we can access the data we need. Please let me know what sort of test environment I can set up to make diagnosing this issue easier.

PjEdwards commented 1 year ago

Hi Sterling. This issue just hit my desk; sorry for the delay. I'm going to take a look at this today and will get back to you as soon as I have any insight to offer. ~Paul

PjEdwards commented 1 year ago

OK, well, I've tried to find anything that might have changed in October, with no luck. Ultimately, I believe Grant's comment here is correct. The directional_wave_spectrum data attribute is 29x larger than all of the others. In the code I maintain for this endpoint, I always break the HSDS requests down to one band at a time using a selection like [0::1, 27:28, 0:, 5:6:1], where the second slice steps through 0:1, then 1:2, and so on. This has been the case for as long as we've had this data. The real question in my mind isn't how it stopped working recently, but rather how it ever worked for you in the first place! I've never been able to reliably select all dimensions of directional_wave_spectrum for an entire year at a single point via HSDS.
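The band-at-a-time approach can be sketched as a loop over the second axis that requests one `k:k+1` slice per iteration and concatenates the pieces. This is a sketch under assumptions: it takes the axis order implied by the example selection to be (time, band, direction, location), and it relies only on NumPy-style slicing, which h5pyd `Dataset` objects support. `read_by_band` is a hypothetical helper, and a random NumPy array with made-up dimensions stands in for the remote dataset so the sketch runs offline.

```python
import numpy as np


def read_by_band(dset, site, band_axis=1, site_axis=3):
    """Read every band for one site, issuing one request per band.

    Assumes a dataset ordered (time, band, direction, site), matching a
    selection like [0::1, k:k+1, 0:, site:site+1]; adjust the axes if not.
    """
    n_bands = dset.shape[band_axis]
    pieces = []
    for k in range(n_bands):
        sel = [slice(None)] * len(dset.shape)   # full extent on every axis
        sel[band_axis] = slice(k, k + 1)        # one band per request
        sel[site_axis] = slice(site, site + 1)  # one location
        pieces.append(np.asarray(dset[tuple(sel)]))
    # Stitch the single-band pieces back together along the band axis.
    return np.concatenate(pieces, axis=band_axis)


# Hypothetical stand-in for the remote dataset.
rng = np.random.default_rng(0)
fake = rng.random((8, 29, 24, 10), dtype=np.float32)
out = read_by_band(fake, site=5)
print(out.shape)  # (8, 29, 24, 1)
```

Each request then carries 1/n_bands of the per-site volume, at the cost of n_bands round trips; the concatenated result is identical to what a single full-band selection would have returned.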

I want you to understand that NREL offers these data access services free to the public on a limited budget. We don't offer any uptime or performance guarantee, nor do we have any option for adding capacity per user. If you have a business case that requires reliable access to this data via HSDS, you are strongly encouraged to deploy your own HSDS server. You can host your own HSDS instance that points at the NREL data in AWS S3, which removes the need to store copies of the full datasets. See https://github.com/HDFGroup/hsds#quick-start for the HSDS docs. Alternatively, if you have access to abundant storage resources, you can access the full raw data via Amazon's Open Data initiative here (look for the buoy data).

All the best, Paul

ssolson commented 1 year ago

Thank you for your review. I will write a wrapper and hit the API multiple times.
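A wrapper along these lines might also retry an individual request a few times before giving up, since the service carries no uptime guarantee. The sketch below is hypothetical: `with_retries`, the attempt count, and the exponential backoff schedule are assumptions, and `fetch` is any zero-argument callable that issues one request (for example, one band's slice of the dataset). Both raised exceptions and empty results count as failures.

```python
import time

import numpy as np


def with_retries(fetch, attempts=4, base_delay=1.0):
    """Call fetch() until it returns a non-empty array, backing off exponentially."""
    last_error = None
    for attempt in range(attempts):
        try:
            data = np.asarray(fetch())
            if data.size > 0:
                return data
            last_error = RuntimeError("empty result")
        except Exception as exc:  # any error raised while issuing the request
            last_error = exc
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... by default
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error


# Hypothetical fetch that returns nothing twice, then real data.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    return np.empty(0) if calls["n"] < 3 else np.ones((8, 1, 24, 1))

print(with_retries(flaky, base_delay=0).shape)  # (8, 1, 24, 1)
```

Keeping the retry logic separate from the slicing logic means the same wrapper can guard each per-band request without changing how the bands are selected.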

Spinning up an HSDS server is outside the scope and budget of MHKiT. If you use the hindcast data in your own work, you may find some of MHKiT's functionality useful. Here is our WPTO hindcast example (https://github.com/MHKiT-Software/MHKiT-Python/blob/master/examples/WPTO_hindcast_example.ipynb), and there are many more useful examples in that folder.

Wishing you the best.