HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Fancy indexing index list length is limited by GET query size #113

Closed jananzhu closed 1 year ago

jananzhu commented 2 years ago

This was found while testing out the new fancy indexing feature (#47). When indexing into a dataset like f['coordinates'][:, idx_list, :] and idx_list contains more than ~1000 indices, HSDS returns a 400 error.
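A rough illustration of why the limit bites at around a thousand indices: the serialized selection string grows linearly with the index list, so an index list of multi-digit values quickly exceeds aiohttp's default 8190-byte request-line limit. The query format below is an assumption for illustration only, not the exact wire format h5pyd uses.

```python
# Hypothetical sketch: estimate the GET query size for a long fancy-index
# selection. 1200 seven-digit indices already blow past 8190 bytes.
idx_list = list(range(1_000_000, 1_001_200))

# Assumed serialization of the selection into a "select" query parameter
select = "[:,[{}],:]".format(",".join(str(i) for i in idx_list))
query = "select=" + select

print(len(query))  # comfortably over aiohttp's 8190-byte line limit
```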

The following error is written to the SN logs:

Error handling request
Traceback (most recent call last):
  File "/opt/env/hsds/lib/python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
  File "aiohttp/_http_parser.pyx", line 628, in aiohttp._http_parser.cb_on_url
aiohttp.http_exceptions.LineTooLong: 400, message='Got more than 8190 bytes (8684) when reading Status line is too long.'

Looks like we're hitting an 8 KB limit on the size of the HTTP GET query made to the HSDS server when the index list is too long. There should be a way to make fancy indexing selections via POST so that we're not restricted by this limit.
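Until a POST path exists, one client-side workaround is to split the index list into batches small enough that each GET stays under the limit, then concatenate the partial results. This is a hedged sketch, not part of h5pyd; the helper name and batch size are made up, and a plain NumPy array stands in for the remote dataset here.

```python
import numpy as np

def fancy_index_batched(dset, idx_list, batch_size=500):
    """Split a long fancy-index list into smaller batches so each
    request's selection string stays well under the 8 KB line limit,
    then concatenate the partial results along the indexed axis."""
    parts = [dset[:, idx_list[i:i + batch_size], :]
             for i in range(0, len(idx_list), batch_size)]
    return np.concatenate(parts, axis=1)

# Demonstration with an in-memory array standing in for f['coordinates'];
# against h5pyd the same call pattern would apply to the remote dataset.
data = np.arange(2 * 2000 * 3).reshape(2, 2000, 3)
idx = list(range(0, 2000, 3))
assert np.array_equal(fancy_index_batched(data, idx), data[:, idx, :])
```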

jreadey commented 2 years ago

With the latest code in the fancyindx branches of HSDS and h5pyd, h5pyd will use POST if the select query param has more than 100 characters.

This avoids the 8 KB limit, but depending on how many columns you are indexing and how many different chunks those columns touch, you may get 503 errors from the server (the same as for a large regular hyperslab selection).
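The dispatch rule described above can be sketched as a simple length check. This is an illustration of the stated behavior, not the actual h5pyd source; the function name is invented and the 100-character threshold is taken from the comment above.

```python
def choose_http_method(select_param, get_threshold=100):
    """Sketch of the described dispatch rule: short selection strings
    ride in the GET query, long ones go in a POST body instead."""
    return "GET" if len(select_param) <= get_threshold else "POST"

short_sel = "[:,[1,2,3],:]"
long_sel = "[:,[{}],:]".format(",".join(str(i) for i in range(1000)))

print(choose_http_method(short_sel))  # GET
print(choose_http_method(long_sel))   # POST
```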

jreadey commented 2 years ago

The latest update to the fancyindx branch in HSDS fixes the issue with large URL requests between the SN and DN nodes.

jreadey commented 1 year ago

Closing this issue.