HDFGroup / h5serv

Reference service implementation of the HDF5 REST API

Use Gunicorn (http://gunicorn.org/) to enable concurrent read access #101

Closed: joshua-gould closed this issue 7 years ago

jreadey commented 8 years ago

@joshua-gould - are you looking to support a large number of clients that will be doing only read requests? It would be tricky to implement if any writes were in the mix (concurrent read/write access wouldn't work at the HDF5 library level).

For your use case, is it many clients accessing one file, or many clients accessing multiple files? For the latter, one approach would be to use Docker and instantiate a new container for each file that is accessed (with some mechanism to shut down containers that haven't been accessed in a while). This would enable utilization of multiple processors on the server.
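The container-per-file idea above could be sketched roughly as follows. This is a hypothetical outline, not a tested recipe: the `hdfgroup/h5serv` image name, the internal port 5000, and the `/data` volume layout are all assumptions about how such a deployment might be wired up.

```shell
#!/bin/sh
# Sketch: launch one h5serv container per requested file, each on its own
# host port. All image/port/path details below are illustrative assumptions.
FILE=mydata.h5        # the file a client asked for
PORT=5101             # a free host port chosen for this file

docker run -d \
  --name "h5serv-${FILE%.h5}" \
  -p "${PORT}:5000" \
  -v "$(pwd)/data/${FILE}:/data/${FILE}:ro" \
  hdfgroup/h5serv

# A separate idle-reaper job (cron, say) would list and stop containers
# that haven't served a request recently:
docker ps --filter "name=h5serv-" --format '{{.Names}}'
```

A front-end process would still be needed to map each incoming request to the right container's port, which is the "some mechanism" the comment alludes to.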

Finally, I've started a new project to enable high request/high throughput use cases. It'll use the same REST API as h5serv, but is an entirely different implementation. The project is private right now, but I intend to make it public as soon as it can support a minimal level of functionality.

joshua-gould commented 8 years ago

I am looking to support a large number of clients who will be performing read requests only. I'm excited to hear about your new project.

jreadey commented 8 years ago

For the new project the first milestone will be support for HDF5 datasets without compression or variable-length types. I'm hoping to hit that by the end of the year.

For a near-term solution - how about making your HDF5 file collection read-only and running multiple h5serv instances (one per core, say)? You can use the --port= command-line argument to have each h5serv process run on a different port. You'll need some sort of load balancer to map incoming requests to the set of ports. This approach has the virtue of not requiring any changes to the h5serv code.
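That setup could look something like the sketch below, assuming four cores and nginx as the load balancer. The `--port=` option comes from the comment above; the `app.py` path, the port numbers, and the nginx configuration are illustrative assumptions, not a verified deployment.

```shell
#!/bin/sh
# Sketch: one read-only h5serv process per core, each bound to its own port.
# The app.py entry point and port range are assumptions for illustration.
for PORT in 5000 5001 5002 5003; do
  python app.py --port=${PORT} &
done

# An nginx upstream block (illustrative) to spread requests across them:
cat > /etc/nginx/conf.d/h5serv.conf <<'EOF'
upstream h5serv_pool {
    least_conn;                 # route each request to the least-busy worker
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
}
server {
    listen 80;
    location / {
        proxy_pass http://h5serv_pool;
    }
}
EOF
```

Because all instances serve the same read-only file collection, any worker can answer any request, which is what makes this simple round-robin/least-connections balancing safe without code changes.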

jreadey commented 7 years ago

Closing this issue; stay tuned for the release of the concurrent server.