HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Code ran ok on Sep 21 commit but display IOError: [Errno 403] Forbidden with latest build #40

Closed DrKenHo closed 6 years ago

DrKenHo commented 6 years ago

Dear John,

I encountered an error after pulling the latest code.

File "/usr/local/lib/python2.7/dist-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/files.py", line 161, in __init__
    raise IOError(rsp.status_code, rsp.reason)
IOError: [Errno 403] Forbidden

When I revert to commit b03d59b3ac2c3e4024da425004fcdb7bce5336a1, there is no problem.

Not sure what exactly changed and I don't seem to find any changes on line 161 in _hl/files.py.

Would you be able to give me some clue or suggestion to resolve this?

Thanks Ken
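
The `[Errno 403] Forbidden` text follows from the `raise IOError(rsp.status_code, rsp.reason)` line in the traceback: the HTTP status is packed into the exception's `errno` field. Below is a minimal sketch of that behaviour, assuming Python 3 (where `IOError` is an alias of `OSError`). `open_domain` and `describe_failure` are hypothetical stand-ins for illustration and make no h5pyd calls.

```python
def open_domain(status_code, reason):
    """Hypothetical stand-in for the failing call: mimics the raise in the traceback."""
    if status_code != 200:
        raise IOError(status_code, reason)
    return "ok"


def describe_failure(status_code, reason):
    """Return (errno, strerror, message) for a failed open, or None on success."""
    try:
        open_domain(status_code, reason)
    except IOError as err:
        # The HTTP status lands in err.errno and the reason phrase in err.strerror.
        return err.errno, err.strerror, str(err)
    return None


print(describe_failure(403, "Forbidden"))
```

Client code that catches the exception can therefore branch on `err.errno == 403` to tell an authorization failure apart from other I/O errors.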

jreadey commented 6 years ago

Hi Ken, That's strange - all the travis tests are passing: https://travis-ci.org/HDFGroup/h5pyd.

Do you see anything from the server log that might help diagnose?

DrKenHo commented 6 years ago

Hi John,

I shall isolate the code and track what is going on and let you know later.

Regards Ken

jreadey commented 6 years ago

@DrKenHo - I'm going to close this now - please re-open if you have time to investigate further.

DrKenHo commented 6 years ago

Hi @jreadey - sorry for taking so long to get back to the issue as I was busy with other things.

OK, I think it has to do with the change that now requires authentication where previously none was required. However, I am still having problems dealing with it; I guess I may be missing the API_KEY. What should I set that to?

Any help will be much appreciated. Thanks

Ref: Errors:

https://github.com/openssbd/Py_SSBDapi/blob/debugging_h5pyd/SSBD_restful_api_v3.0-debugging%20h5pyd.ipynb

If run with Sep 21 commit, no error:

https://github.com/openssbd/Py_SSBDapi/blob/debugging_h5pyd/SSBD_restful_api_v3.0-debugging%20h5pyd-21Sep17.ipynb

jreadey commented 6 years ago

Hi @DrKenHo - What is the server output when you get this error?

In your server installation, did you configure user accounts/passwords? See: http://h5serv.readthedocs.io/en/latest/AdminTools.html.

DrKenHo commented 6 years ago

Hi @jreadey ,

I ran with the h5serv docker and didn't configure any user accounts and passwords. I don't think it is required as it is not mentioned on the README docker page.

https://github.com/HDFGroup/h5serv

docker run -p 5000:5000 -d --rm --net=host -v :/data hdfgroup/h5serv

I shall take a look at the server and see whether it has to do with users and passwords. The Sep 21 commit did not require such settings.

DrKenHo commented 6 years ago

Ok, got it.

password file is missing!

ERROR:tornado.access:500 GET /?domain=081505_L1_bd5.hdfgroup.org (172.21.20.213) 2.15ms
INFO:authFile.py:57::Auth.getUserInfo: [test_user1]
ERROR:authFile.py:73::password file is missing
ERROR:tornado.access:500 GET /?domain=081505_L1_bd5.hdfgroup.org (172.21.20.213) 2.11ms

I shall try to create the password file for it.

Meanwhile when I run it with the Sept 21 commit code, it doesn't care about the password and the server returns the data.

Somehow the latest version uses authentication for the server.

DrKenHo commented 6 years ago

Hi John,

I followed the instructions to create the user test_user1 with the password test. Now there is no problem with the password file, but I get Error 403 instead.


IOError                                   Traceback (most recent call last)
in ()
----> 1 f = h5pyd.File('081505_L1_bd5.hdfgroup.org', 'r', username=USER_NAME, password=USER_PASSWD, endpoint=h5servloc)

/usr/local/lib/python2.7/dist-packages/h5pyd/_hl/files.pyc in __init__(self, domain, mode, endpoint, username, password, api_key, use_session, use_cache, logger, **kwds)
    189     # file must exist
    190     http_conn.close()
--> 191     raise IOError(rsp.status_code, rsp.reason)
    192 if rsp.status_code == 200 and mode in ('w-', 'x'):
    193     # Fail if exists

IOError: [Errno 403] Forbidden

From the server log:

host: ssbd1.qbic.riken.jp
topdomain: hdfgroup.org
top-level domain is not valid
INFO:authFile.py:57::Auth.getUserInfo: [test_user1]
INFO:authFile.py:66::Auth-got cache value
INFO:authFile.py:177::user password validated
INFO:app.py:284::getFilePath: ssbd1.qbic.riken.jp:5000 checkExists: True
INFO:app.py:286::tocFilePath: /data/.toc.h5
WARNING:tornado.access:403 GET /?domain=081505_L1_bd5.hdfgroup.org (172.21.20.213) 3.11ms

I did a git diff:

https://github.com/HDFGroup/h5pyd/commit/5c2b656d503c6720297574176ce0595b1be254c5#diff-bb604a2de46e2dd6e56845fdc8ab1e2d

There were some changes in the file httpconn.py around header["host"] = domain, but now I guess domain is something else.

In httpconn.py, I uncommented line 114, headers['host'] = domain. Now the behaviour seems to return to that of the previous version. 👍

Question: what is your recommended usage?

Regards Ken

jreadey commented 6 years ago

Hi @DrKenHo,

I suspect that the file ACL (access control list) is not correct for this file. Try running get_acl.py as described here: http://h5serv.readthedocs.io/en/latest/AdminTools.html. What does that report?

DrKenHo commented 6 years ago

Hi @jreadey,

I shall check ACL when I get to the office.

I am a bit confused by the design of h5serv and h5pyd. h5serv does not seem to require access control, yet the h5pyd API is required to use access control to communicate with h5serv? Is it because you are introducing access control gradually in h5serv while keeping it backward compatible? Or did I misconfigure h5serv, even though I pulled it directly from Docker?

jreadey commented 6 years ago

Access control is configurable in h5serv - there's an "allow_noauth" setting here: https://github.com/HDFGroup/h5serv/blob/develop/server/config.py that defaults to true.

h5pyd just passes along whatever username and password is provided (if any).

It might be easy to diagnose by taking h5pyd out of the equation for now. On my local h5serv instance I can run this command:

$ curl http://127.0.0.1:5000/?host="tall.public.hdfgroup.org"

and get this response:

{"lastModified": "2018-03-13T17:28:58Z", "hrefs": [{"href": "http://127.0.0.1:5000/?host=tall.public.hdfgroup.org", "rel": "self"}, {"href": "http://127.0.0.1:5000/datasets?host=tall.public.hdfgroup.org", "rel": "database"}, {"href": "http://127.0.0.1:5000/groups?host=tall.public.hdfgroup.org", "rel": "groupbase"}, {"href": "http://127.0.0.1:5000/datatypes?host=tall.public.hdfgroup.org", "rel": "typebase"}, {"href": "http://127.0.0.1:5000/groups/0371e6c7-26e4-11e8-b308-3c15c2da029e?host=tall.public.hdfgroup.org", "rel": "root"}], "root": "0371e6c7-26e4-11e8-b308-3c15c2da029e", "created": "2018-03-13T17:28:58Z"}

Adjusting for your file location, does that work for you?
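
The same check can be scripted from Python without h5pyd. The following is a rough sketch using only the standard library. `domain_url` and `fetch_domain` are hypothetical helper names, and the endpoint and domain are the ones from the curl example above, so adjust them for your server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def domain_url(endpoint, domain):
    """Build the URL that the curl command requests: <endpoint>/?host=<domain>."""
    return endpoint.rstrip("/") + "/?" + urlencode({"host": domain})


def fetch_domain(endpoint, domain, timeout=10):
    """GET the domain's root resource without h5pyd and return the decoded JSON."""
    with urlopen(domain_url(endpoint, domain), timeout=timeout) as rsp:
        return json.loads(rsp.read().decode("utf-8"))


if __name__ == "__main__":
    # Endpoint and domain are the ones from the curl example; adjust for your server.
    print(domain_url("http://127.0.0.1:5000", "tall.public.hdfgroup.org"))
```

Calling `fetch_domain` against a running server should return the same JSON document that curl prints. If the server answers 403, `urlopen` raises `urllib.error.HTTPError`, which points to the server side rather than to h5pyd.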

DrKenHo commented 6 years ago

Hi John,

Thanks for the feedback.

getacl returns no ACL.

root@af1a696fb31d:/usr/local/src/h5serv/util/admin# python getacl.py -file /data/081505_L1_bd5.h5
no ACLs

However, if I curl the server, it returns the file.

$ curl http://ssbd1.qbic.riken.jp:5000/?host="081505_L1_bd5.hdfgroup.org"
{"hrefs": [{"rel": "self", "href": "http://ssbd1.qbic.riken.jp:5000/?host=081505_L1_bd5.hdfgroup.org"}, {"rel": "database", "href": "http://ssbd1.qbic.riken.jp:5000/datasets?host=081505_L1_bd5.hdfgroup.org"}, {"rel": "groupbase", "href": "http://ssbd1.qbic.riken.jp:5000/groups?host=081505_L1_bd5.hdfgroup.org"}, {"rel": "typebase", "href": "http://ssbd1.qbic.riken.jp:5000/datatypes?host=081505_L1_bd5.hdfgroup.org"}, {"rel": "root", "href": "http://ssbd1.qbic.riken.jp:5000/groups/743e32b6-cddf-11e7-a24c-6cae8b60852a?host=081505_L1_bd5.hdfgroup.org"}], "root": "743e32b6-cddf-11e7-a24c-6cae8b60852a", "created": "2017-11-20T10:42:07Z", "lastModified": "2017-11-20T10:42:07Z"}

I created the h5serv instance from Docker. Here are the details:

$ curl -X GET http://ssbd1.qbic.riken.jp:5000/info
{"h5serv_version": "0.2", "name": "h5serv", "hdf5_version": "1.8.16", "about": "h5serv is a webservice for HDF5 data", "hdf5-json-version": "1.0.0", "h5py_version": "2.6.0", "documentation": "http://h5serv.readthedocs.org", "greeting": "Welcome to h5serv!"}

And in my server/config.py, I don't have the entry 'allow_noauth': True # Allow anonymous requests (i.e. without auth header)

But it has 'new_domain_policy': 'ANON' # Ability to create domains (files) on serv: ANON - anonymous users ok, AUTH - only authenticated, NEVER - never allow

I don't think it has anything to do with file location on the server because it works if I uncomment the line 114 headers['host'] = domain in httpconn.py.

There seems to be some confusion (maybe on my part, in how I use h5serv or h5pyd?) about the domain settings with the latest code.
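
The thread shows two ways a client can tell the server which domain it wants: a query parameter (`?host=` in the curl commands, `?domain=` in the server log) or a `host` request header, as in the `headers['host'] = domain` line. The sketch below only contrasts the two request shapes. `build_request` and its `use_host_header` flag are hypothetical and are not h5pyd's actual code.

```python
from urllib.parse import urlencode


def build_request(endpoint, domain, use_host_header):
    """Return (url, headers) for a GET on the domain's root.

    Hypothetical helper for illustration only; this is not h5pyd's code.
    """
    headers = {}
    if use_host_header:
        # Domain travels in the host header; the URL carries no query string.
        headers["host"] = domain
        url = endpoint.rstrip("/") + "/"
    else:
        # Domain travels in the query string; the host header is left to the
        # HTTP library, which fills in the server's real hostname.
        url = endpoint.rstrip("/") + "/?" + urlencode({"domain": domain})
    return url, headers


print(build_request("http://ssbd1.qbic.riken.jp:5000", "081505_L1_bd5.hdfgroup.org", False))
```

The earlier server log prints `host: ssbd1.qbic.riken.jp` followed by "top-level domain is not valid", which suggests (as an interpretation, not something confirmed in this thread) that the server used the real hostname rather than the domain in that request.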

DrKenHo commented 6 years ago

Hi John,

I have rebuilt everything, now it seems to work ok.

I didn't use the Docker container from https://github.com/HDFGroup/h5serv. I also failed to get the h5serv container to run correctly using the Dockerfile in the h5serv source tree.

I managed to build my own Dockerfile from an Ubuntu base and install h5serv as directed in the installation instructions.

In addition I have used

pip install h5py==2.8.0rc1

for both the h5serv and the h5pyd Docker containers. The versions before 2.8.0rc1 gave wrong answers for some HDF5 files.

I think the issues may well have been caused by some mismatch of libraries.

Thanks for all the help. Ken
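
One way to spot a library mismatch like this is to compare the versions the server reports at /info with those installed on the client. The following is a minimal sketch using the /info payload copied from the curl output earlier in the thread. `server_versions` and `mismatches` are hypothetical helper names, and the client-side dictionary is an assumption you would fill in yourself, for example from `h5py.__version__`.

```python
import json

# /info payload copied from the curl output earlier in the thread.
INFO_PAYLOAD = (
    '{"h5serv_version": "0.2", "name": "h5serv", "hdf5_version": "1.8.16", '
    '"about": "h5serv is a webservice for HDF5 data", "hdf5-json-version": "1.0.0", '
    '"h5py_version": "2.6.0", "documentation": "http://h5serv.readthedocs.org", '
    '"greeting": "Welcome to h5serv!"}'
)


def server_versions(info_text):
    """Pick every key ending in 'version' out of an /info response body."""
    info = json.loads(info_text)
    return {key: value for key, value in info.items() if key.endswith("version")}


def mismatches(server, client):
    """Return the keys present on both sides whose version strings differ."""
    return sorted(key for key in server if key in client and server[key] != client[key])


# Client-side versions are an assumption here; fill them in from your environment.
print(mismatches(server_versions(INFO_PAYLOAD), {"h5py_version": "2.8.0rc1"}))
```

This compares version strings for equality only. It does not decide which version is newer or whether two different versions are compatible.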