HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

503 errors #70

Closed scottmn closed 4 years ago

scottmn commented 4 years ago

I have been using h5pyd to access data from the NREL wind toolkit similar to these examples: https://github.com/NREL/hsds-examples

Recently the service has become unusable due to frequent 503 error responses. I upgraded to the 0.4.0 version of h5pyd (which has retry logic), which helps but usually does not prevent the issue:

RetryError(MaxRetryError("HTTPSConnectionPool(host='developer.nrel.gov', port=443): Max retries exceeded with url: /api/hsds//datasets/abc?domain=%2Fnrel%2Fwtk-us.h5&api_key=abc (Caused by ResponseError('too many 503 error responses',))",),)

Is it just a case of demand/load not being met by the NREL infrastructure? Or is there a problem with how I'm calling the service? Is there anything that can be done to remedy this?

I see in the code history that the retry number was at one time 30 and then reduced to 3. Is there a number somewhere between those that may have greater success?
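For context on what those retry numbers control: the `too many 503 error responses` text in the traceback comes from urllib3's `Retry`, which h5pyd's retry logic sits on top of. A minimal sketch of a more patient retry/backoff policy using the same machinery directly (the function name and the specific numbers below are illustrative, not values taken from h5pyd):

```python
# Sketch: a requests session that retries 503 responses with geometric
# backoff, using the same urllib3 Retry class referenced in the traceback.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_patient_session(total_retries=10, backoff_factor=1.0):
    """Hypothetical helper: build a session that retries 503s patiently."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,  # sleep grows geometrically between tries
        status_forcelist=[503],         # retry only on Service Unavailable
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Raising `total` and `backoff_factor` trades latency for resilience: more retries spread over longer waits give an overloaded server time to recover, at the cost of slower failures when it is truly down.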

Thanks.

jreadey commented 4 years ago

It's likely the case that the number of requests coming into NREL's server is more than it can handle, hence the 503s. The load from different users can get quite high, so there will be times when not every request can be processed.

As an alternative, you can sign up for HDF Group's Kita Lab: https://www.hdfgroup.org/hdfkitalab/. This is a JupyterLab environment that runs on AWS with a colocated Kita server. One advantage is that it's likely to have more capacity than NREL's server. Also, data read from S3 to the Kita server to the Python notebook stays in Amazon's datacenter, so throughput is better. To run Python code that accesses NREL's data in Kita Lab, you just need to add a bucket="nrel-pds-hsds" parameter in the h5pyd.File constructor. See https://github.com/HDFGroup/hdflab_examples/blob/master/NREL/nrel_example.ipynb for an example of this.
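A minimal sketch of that bucket= usage (the helper name is hypothetical; the domain path and bucket name are from the comment above, and this of course only runs inside an environment like Kita Lab with h5pyd installed and an HSDS endpoint configured):

```python
def open_nrel_wtk(bucket="nrel-pds-hsds"):
    """Hypothetical helper: open the NREL wind toolkit file via Kita/HSDS.

    The only change versus reading through NREL's own endpoint is the
    bucket= argument, which points h5pyd at NREL's public S3 bucket.
    """
    import h5pyd  # deferred import: needs h5pyd plus a reachable HSDS endpoint
    return h5pyd.File("/nrel/wtk-us.h5", "r", bucket=bucket)
```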

Another alternative would be to run your own HSDS service. This would give you exclusive use of the server, and you can scale it up to whatever suits your needs (see https://github.com/HDFGroup/hsds/blob/master/docs/docker_install.rst for install instructions). Running HSDS on an AWS EC2 instance in us-west-2 will be best performance-wise, since that's where the NREL S3 bucket is located.

Hope this helps!

jreadey commented 4 years ago

Closing this issue.