I have to also note (although it really should be another issue) that the endpoint
parameter acts differently if there is a slash at the end of the value.
```
>>> f = h5pyd.File('test4', 'r', endpoint='https://h5.wt0f.com/', username='xxxx', password='xxxxx')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/h5pyd/_hl/files.py", line 236, in __init__
    raise IOError(rsp.status_code, rsp.reason)
OSError: [Errno 400] Bad Request
>>> f = h5pyd.File('test4', 'r', endpoint='https://h5.wt0f.com', username='xxxx', password='xxxx')
>>>
```
That just seems wrong to me. If there is a reason that a trailing slash is not acceptable, shouldn't there be a check that strips it when it is supplied?
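Something along these lines is what I have in mind - a minimal sketch of the normalization, wrapping the same `h5pyd.File` call as above (the helper function is just an illustration, not existing h5pyd code):

```python
import h5pyd

def open_domain(domain, endpoint, **kwargs):
    # Strip any trailing slash so both endpoint forms behave the same.
    endpoint = endpoint.rstrip('/')
    return h5pyd.File(domain, 'r', endpoint=endpoint, **kwargs)

# Both of these now resolve to the same endpoint value:
# f = open_domain('test4', 'https://h5.wt0f.com/', username='xxxx', password='xxxx')
# f = open_domain('test4', 'https://h5.wt0f.com', username='xxxx', password='xxxx')
```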
OK, part of this is my limited knowledge of numpy and working with data frames. I found that if I use an index notation of `[...]` or `[:]`, I was able to get the correct data to display. I was using some of the Jupyter notebooks in the examples directory as a guide for accessing the h5serv server, so I am not sure where I was led astray by the examples.
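For reference, this is the kind of access that ended up working for me (the dataset name is made up, the connection call mirrors the earlier snippet):

```python
import h5pyd

# Open the domain read-only, same call as in the earlier snippet.
f = h5pyd.File('test4', 'r', endpoint='https://h5.wt0f.com',
               username='xxxx', password='xxxx')

dset = f['mydata']   # hypothetical dataset name; this is a Dataset object,
                     # so printing it shows a summary rather than the values
data = dset[...]     # Ellipsis indexing reads the whole dataset into a numpy array
data = dset[:]       # full-slice notation does the same thing
print(data)
f.close()
```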
Hey, @hickey - glad you sorted this out! Sorry for my lack of response, I haven't had time to help out with h5serv recently. Our (The HDF Group) main focus is HSDS (https://github.com/HDFGroup/hsds), which is more of a "new generation" HDF service.
Are there specific reasons you have for using h5serv rather than HSDS? It would be nice to have everyone move over to HSDS.
Mostly that I did not know about HSDS. Looking now.
Just a thought, you may want to update the README to start directing people over to the HSDS project.
I have to change my statement above about directing people over to the HSDS project. I finally got around to starting to bring up the HSDS docker container. While I can see where the HSDS project is going, it operates at a much bigger scale than what I need, and I suspect bigger than what others may need. So I would incorporate into the README that if one is just sharing a couple of HDF5 files, or doesn't need to scale out to support hundreds or thousands of clients, it is probably just as well to stay with this project.
Interesting - is it that HSDS seems more complicated to spin up compared to h5serv (I would have thought they were fairly equivalent)?
BTW - probably the easiest way to share some files is to just put them in a public S3 bucket. Users can either download the files or use s3fs (Python) or the ros3 VFD (for C/C++) to read them directly.
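For example, something along these lines should work with s3fs and h5py (the bucket, file, and dataset names here are placeholders):

```python
import s3fs
import h5py

# Read an HDF5 file straight out of a public bucket without downloading it first.
fs = s3fs.S3FileSystem(anon=True)              # anonymous access for a public bucket
with fs.open('my-public-bucket/test4.h5', 'rb') as s3file:
    with h5py.File(s3file, 'r') as f:
        print(list(f.keys()))
        data = f['mydata'][...]                # same [...] indexing as above
```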
Well, if I were to use HSDS I would need to set up a service node and a data node. In addition, I would need an S3 bucket to connect to the data node. A whole lot more infrastructure than I need.
Using an S3 bucket directly is not as desirable, since I would have to pull the HDF5 file down, use it, and then upload it again. That is a whole lot more operations (and failure modes) than just slurping the data in through an HDF5 client, processing it, and saving the data again. I can do much better error handling in code than trying to interpret why one of the S3 transfer utilities exited.
Right - there are more containers, but it is all managed for you by the runall.sh script. And rather than an S3 bucket, you can just have a directory on your server for data storage. See: https://github.com/HDFGroup/hsds/blob/master/docs/docker_install_posix.md.
But I do think h5serv has the edge in terms of hosting a set of existing HDF5 files. With HSDS you either need to convert them into the HSDS sharded format (using `hsload`) or extract the file metadata (with `hsload --link`).
I've been thinking it would be nice to have HSDS use an existing set of HDF5 files as is - but I will need to think about that a bit.
I threw a quick script together to validate that my installation of `h5serv` is working fine, and I am having trouble reading data back from `h5serv`. I am new to working with HDF5 files and the `h5py`/`h5pyd` modules, so it could just be something that I am not understanding.

The script that I am using is as follows:

I execute the script and write to `h5serv` as follows:

I then turn around and try to read the data back in:

I have also copied the file from the server running `h5serv` to the local directory and tried to read the contents of the file:

As you can see, the file itself seems to be just fine.

Note: I am using the `h5py_switch` module, which returns a `h5pyd.File` object when I am accessing the file on the `h5serv` server and a `h5py.File` object when I read the file locally. Not sure if this is really significant, but I figured I would call it out.

Wondering if there is anything obvious that I am doing wrong in my test script, or if there is an easy way to determine whether the problem is with `h5serv` or the `h5pyd` module.
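Roughly, the round trip I am attempting looks like this - a simplified sketch, not the exact script, with a made-up dataset name and credentials elided:

```python
import numpy as np
import h5pyd

ENDPOINT = 'https://h5.wt0f.com'   # endpoint from the thread

# Write a small dataset to the h5serv domain.
f = h5pyd.File('test4', 'w', endpoint=ENDPOINT, username='xxxx', password='xxxx')
f.create_dataset('mydata', data=np.arange(10))
f.close()

# Read it back. Note the [...] indexing discussed above: without it you get
# the Dataset object itself rather than the stored values.
f = h5pyd.File('test4', 'r', endpoint=ENDPOINT, username='xxxx', password='xxxx')
print(f['mydata'][...])
f.close()
```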