Closed TManhente closed 2 years ago
Hey,
Thanks for the interesting issue report!
In your testing, did you change the max_request_size (default 100m) in HSDS? If I run your test with the default, I get a 413 error:
Maximum request body size 4394000000 exceeded, actual body size 104857600
The idea of the max_request_size is to avoid the container having an out of memory error with these types of large requests. The h5pyd package knows to paginate large requests when it sees this error, but curl or the REST VOL requests will fail on the 413 response.
So I guess you are looking to use a client other than Python or you wouldn't have run into this... You can do the same kind of pagination logic in other languages, but I realize it's a bit tedious.
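To make the idea concrete, here's a rough sketch of what that client-side pagination could look like (my own naming — paginate_selection is not an actual h5pyd function, and h5pyd's real logic also accounts for chunk boundaries):

```python
# Hypothetical sketch of client-side pagination for GET /datasets/<id>/value:
# split a large selection along the first dimension so that each page's
# binary payload stays under the server's max_request_size.

def paginate_selection(shape, itemsize, max_request_size):
    """Yield (start, stop) bounds over the first dimension whose
    payload fits within max_request_size bytes."""
    row_bytes = itemsize
    for extent in shape[1:]:          # bytes in one "row" of dimension 0
        row_bytes *= extent
    rows_per_page = max(1, max_request_size // row_bytes)
    for start in range(0, shape[0], rows_per_page):
        yield (start, min(start + rows_per_page, shape[0]))

# Example: the 4GB (1300, 1300, 1300) uint16 dataset from this issue,
# paged to fit under a 300m limit:
pages = list(paginate_selection((1300, 1300, 1300), 2, 300_000_000))
```

Each page would then be fetched with its own value request and copied into the right slice of the result array.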
Looking for the cause of the 3x memory allocation, here's rough pseudocode of how the GET /datasets/{id}/value request works (assuming the max_request_size value is increased):

def GET_Value(request):
    arr = await getSelectionData()

def getSelectionData():
    arr = await doReadSelection()

def doReadSelection():
    arr = np.zeros(...)  # allocates 4GB of data
    chunkCrawler(arr)

def ChunkCrawler.read_chunk_hyperslab(chunkid):
    # this runs once per chunkid, 16 * node_count max tasks at any one time
    chunk_slice = await http_get(dn_req)  # allocates one chunk's worth of memory
    arr[x1:x2, y1:y2, z1:z2] = chunk_slice  # chunk_slice should be free to be gc'd now

# back at GET_Value
data = arrayToBytes(arr)  # just ndarray.tobytes() for non-vlen data
await resp.write(data)
return
So it would seem to need 2x the array size of memory for the request. The tobytes() function unfortunately copies the data. There used to be a get_buffer() method in numpy, but it got deprecated. I could even see 3x if the chunk_slice is not being freed immediately, but that is not my understanding of how GC works in Python.
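The copy can be demonstrated in isolation: ndarray.tobytes() materializes a second full-size buffer, while a memoryview exposes the array's existing buffer without duplicating it (a minimal illustration, not HSDS code — whether resp.write() accepts a bytes-like view is a separate question):

```python
import numpy as np

arr = np.zeros((4, 4), dtype="<u2")

copied = arr.tobytes()    # full copy: the payload now exists twice in memory
view = memoryview(arr)    # zero-copy view over the same buffer

assert len(copied) == arr.nbytes
assert view.nbytes == arr.nbytes
# A write through the array is visible via the view, showing no copy was made:
arr[0, 0] = 7
assert bytes(view)[:2] == (7).to_bytes(2, "little")
```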
Anyway, I'm currently working on a change that will stream requests. So rather than waiting for the entire response to be assembled on the server and then writing it to the response, the data will be sent continuously (like pagination with h5pyd but on the server side). That should reduce the memory spikes, and we should also be able to do away with the max_request_size limit altogether. Another advantage is that the client won't need to wait so long for the first bytes to come back from the server and could potentially start doing useful work while the rest of the response streams in.
I'll update this issue when I have something to try out on a branch. It would be great if you could give it a try and report back!
Hi, @jreadey, thank you for your response.
In your testing, did you change the max_request_size (default 100m) in HSDS?
Sorry, I did include the config.yml I used in the attached example (hsds-memory-usage-peaks.zip), but I forgot to mention it in the issue description. I've used mostly the default values, except for these two:
- sn_ram: increased from 1g to 24g
- max_request_size: increased from 100m to 5g

The h5pyd package knows to paginate large requests when it sees this error
I didn't know that. The client we're using is xarray with the h5netcdf engine, as I've mentioned. I don't know their code in detail, but I've looked at it briefly and it seems that they call h5pyd to do the actual data access (i.e. send the requests to the server).
I've tried to use h5pyd directly (without xarray and h5netcdf) with the server max_request_size set to the original 100m, and it failed, though:
>>> import sys
>>> print(sys.version)
3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
>>> import h5pyd
>>> print(h5pyd.__version__)
0.10.1
>>> f = h5pyd.File('http://localhost:5101/home/example.h5', username='usr01', password='usr01pwd')
>>> ds = f['ds01']
>>> print(ds)
<HDF5 dataset "ds01": shape (1300, 1300, 1300), type "<u2">
>>> value = ds[:,:,:]
Traceback (most recent call last):
File "/home/tmanhente/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 1132, in __getitem__
rsp = self.GET(req, params=params, format="binary")
File "/home/tmanhente/.local/lib/python3.6/site-packages/h5pyd/_hl/base.py", line 963, in GET
raise IOError(rsp.status_code, rsp.reason)
OSError: [Errno 413] Request Entity Too Large
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "trace-exec.py", line 7, in <module>
exec(l)
File "<string>", line 1, in <module>
File "/home/tmanhente/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 1141, in __getitem__
raise IOError("Error retrieving data: {}".format(ioe.errno))
OSError: Error retrieving data: 413
I've retried it raising the HTTPConnection.debuglevel to see the requests sent by h5pyd (full log). It seems that h5pyd tried requesting the following data ranges:
[0:1300:1,0:1300:1,0:1300:1] → 413 Request Entity Too Large (~4.1 GiB)
[0: 656:1,0:1300:1,0:1300:1] → 413 Request Entity Too Large (~2.1 GiB)
[0: 328:1,0:1300:1,0:1300:1] → 413 Request Entity Too Large (~1.1 GiB)
[0: 164:1,0:1300:1,0:1300:1] → 413 Request Entity Too Large (~529 MiB)
[0:  82:1,0:1300:1,0:1300:1] → 413 Request Entity Too Large (~264 MiB)
and then it gave up after the 5th attempt.
It looks like h5pyd has a heuristic that retries the request with a halved range, hoping that it fits, rather than actually parsing the 413 response body to get the size limit reported by the server and using it to set up a suitable data range:
$ curl --user usr01:usr01pwd --header "Accept: application/octet-stream" "http://localhost:5101/datasets/d-2bf27178-d97f130f-925f-4b3ea4-b0bce0/value?select=%5B0%3A82%3A1%2C0%3A1300%3A1%2C0%3A1300%3A1%5D&domain=/home/example.h5"
Maximum request body size 277160000 exceeded, actual body size 104857600
~~~~~~~~~
(it seems that the numbers are misplaced in the response body)
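Incidentally, the sequence above looks less like halving the raw extent (1300 → 650) and more like halving the number of chunks along the first dimension (16 → 8 → 4 → 2 → 1 chunks of 82 rows each). A small reconstruction, purely inferred from the log and not h5pyd's actual code:

```python
import math

def halving_attempts(extent, chunk0, max_attempts=5):
    """Reproduce the observed retry extents: halve the chunk count
    along the first dimension after each 413, down to one chunk."""
    nchunks = math.ceil(extent / chunk0)   # 1300 rows / 82-row chunks -> 16
    attempts = [extent]
    while len(attempts) < max_attempts and nchunks > 1:
        nchunks //= 2
        attempts.append(nchunks * chunk0)
    return attempts

# Matches the five selections in the debug log:
assert halving_attempts(1300, 82) == [1300, 656, 328, 164, 82]
```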
The tobytes() function unfortunately copies the data.
Would it be possible to replace the tobytes() with a view() of numpy.bytes to avoid the data copy? Or would that not fit the interface expected by resp.write()?
Anyway, I'm currently working on a change that will stream requests. So rather than waiting for the entire response to be assembled on the server and then writing it to the response, the data will be sent continuously
Great news! That would surely be very helpful!
If there's anything else you would like me to try, please let me know.
Actually, a mix of using h5pyd pagination and a (modest) increase in the max_request_size seems to work... If I set the max_request_size to 300m, then your program is able to read the entire 4GB dataset.
The reason is that the h5pyd pagination works by reducing the selection range in the first dimension at the chunk boundary. In this case, with a dataset shape of (1300, 1300, 1300) and a chunk shape of (82, 82, 163), the minimal pagination is (82, 1300, 1300), which works out to be ~264MB. I was able to set the max_request_size to 300m, leave the container mem size as it was, and everything worked.
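As a quick sanity check on that arithmetic (277160000 is the same figure that shows up in the 413 response body earlier in this thread):

```python
# One pagination page of shape (82, 1300, 1300), 2 bytes per <u2 element:
page_bytes = 82 * 1300 * 1300 * 2
assert page_bytes == 277_160_000        # matches the limit quoted in the 413 body
print(round(page_bytes / 2**20, 1))     # 264.3 (MiB), comfortably under 300m
```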
Give this a try and let me know if it works for you. Apologies if this seems a bit hacky, but we won't need to deal with this once the streaming logic is in.
BTW, did you run hsconfigure to save your endpoint, username, and password to the .hscfg file?
Unless endpoint, username, and password are supplied in the h5pyd.File arguments, h5pyd will just pull these from the .hscfg. So the file open would look like: f = h5pyd.File('home/example.h5'). You get the same result either way, just less verbose using the .hscfg style.
Update - the streaming support is checked into master now. This has the effect that any size request can be sent or received to/from the service, as long as it is binary and uses a fixed-size type (support for variable-length types may be added later, but I don't think it makes sense to send GBs of JSON back and forth). Memory usage on the server should be limited regardless of how large the request is.
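The effect of streaming on memory can be sketched with a toy pager: instead of materializing the whole payload with tobytes(), the server can emit one first-dimension page at a time, bounding peak memory by the page size (an illustrative sketch under my own naming, not the actual HSDS implementation):

```python
import numpy as np

def stream_pages(arr, page_rows):
    """Yield the dataset's binary payload one first-dimension page at a time."""
    for start in range(0, arr.shape[0], page_rows):
        # only one page's worth of bytes is materialized per iteration
        yield arr[start:start + page_rows].tobytes()

data = np.arange(12, dtype="<u2").reshape(6, 2)
streamed = b"".join(stream_pages(data, 2))
assert streamed == data.tobytes()    # pages concatenate to the full payload
```

On the server, each yielded page would go straight to resp.write() instead of being joined into one large buffer first.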
There's a test case for the feature at: https://github.com/HDFGroup/hsds/tree/master/tests/perf/stream.
It writes test data to a dataset, reads it back, and compares sent vs. received, e.g.:
$ python stream_test.py
testStream2D /home/john/perf2/stream.h5
dataset shape: [12000, 2200]
got dset_id: d-3424aca7-bd556ffc-2662-75c052-d3ecc1
initializing test data (211200000 bytes, 201.42 MiB)
writing...
elapsed: 4.43 s, 45.48 mb/s
reading...
elapsed: 3.96 s, 50.86 mb/s
comparing sent vs. received
passed!
Let me know if you run into any problems with this.
The HSDS docker image is on DockerHub: hdfgroup/hsds:36a7c61
Hi, @jreadey, sorry for the delay in replying to you.
Actually a mix of using h5pyd pagination and a (modest) increase in the max_request_size seems to work... If I set the max_request_size to 300m, then your program is able read the entire 4GB dataset. [...] Give this a try and let me know if it works for you.
I've tried setting the max_request_size to 300m and it worked: h5pyd managed to download the 4GB dataset by sending a series of smaller (~264MB) requests to the server. The service node uses up to 1.9GB of memory for each of these requests, which is still a lot of memory, but within a reasonable limit.
BTW, did you run hsconfigure to save your endpoint, username, and password to the .hscfg file?
Yes, I do have an .hscfg file, but it points to the real server with my real credentials. As I set up the dummy example attached in this issue to isolate the configuration and test scenario that reproduces the memory spikes, I found it easier to simply put the dummy server address and its dummy user credentials directly in the h5pyd.File() call rather than swapping my .hscfg file back and forth between the real and the dummy servers.
Update - the streaming support is checked into master now.
Nice! I've tried the streaming support by changing the HSDS Docker image version I was using from v0.7beta13 to 36a7c61, and it worked for the 4GB data: h5pyd was able to get the entire dataset by issuing only a single request. The service node used at most 1.2GB of memory. I didn't notice any high memory usage spikes.
Ok, great - glad to hear everything worked!
If you want to further reduce the memory footprint, you can make the max_request_size config even smaller. In streaming mode, the server processes data in max_request_size pages. Making it too small, though, may reduce performance or result in 413 errors.
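For example, a smaller override in the HSDS config.yml (the key name is the one discussed in this thread; the value is just illustrative):

```yaml
# config.yml override (illustrative value)
max_request_size: 100m   # streaming page size; smaller lowers peak memory,
                         # but too small may hurt throughput or cause 413s
```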
Will close this issue now.
When making a /datasets/{:dsId}/value request, the service node seems to have spikes of allocating lots of memory (many times more than the size of the data requested). This may cause the service node to exceed its memory limit and have its process terminated.

I ran into this issue while trying to set up an HSDS instance for serving files with datasets of about 4.5 GiB. As the users access HSDS through libraries like xarray and h5netcdf, which may send requests for the full dataset value to HSDS, we experienced 500 and 503 errors due to the service node process being terminated. I had to configure the service node container with 24 GiB of memory so that it could handle a single request for the 4.5 GiB dataset without being terminated.

I've created a dummy example (hsds-memory-usage-peaks.zip) to reproduce this error and recorded the memory usage of the service node process (highlighted in yellow in the terminal's top split showing the htop output) while sending a request:

https://user-images.githubusercontent.com/4401054/166116657-662034bd-000d-4ba3-b8d3-899cb383a46d.mp4

I've also noticed that the peak memory goes down slowly as the client downloads the response body, but stays high until the very end. This behaviour can be seen by throttling the client download speed:

https://user-images.githubusercontent.com/4401054/166117067-0b305dd9-2fc3-4a6f-92a8-ce5fd87221cb.mp4

I've run those tests on CentOS 7 machines (both physical and in WSL), using both the v0.6.3 and the v0.7beta13 HSDS Docker images published on Docker Hub.

Is it possible that the HSDS service node is creating unneeded copies of the data, or keeping references to data that is no longer used in a scope bigger than needed, delaying their disposal?