HDFGroup / h5serv

Reference service implementation of the HDF5 REST API

How does the acl/password authentication work? #97

Closed ahalota closed 8 years ago

ahalota commented 8 years ago

I'm having trouble understanding what is described in the documentation for authentication. My goal is to have update/post blocked off for all users, except one special user.

I used setacl.py and can successfully update the ACL for the default (userid 0), which then blocks any anonymous requests from making updates. No problem there, and I was also able to revert it back to allowing everyone.

I can also create special settings for any other user with setacl.py, so long as the ID is a number, not a name. If I pass a name, an error occurs, since setacl.py contains a line "int(arg)" that tries to parse the command-line argument as a number.

I used update_pwd.py to create a new user with password. It works fine.

I don't understand how or whether these two items are related. My username has characters, but the acl only works with numbers. Are they referring to the same users, or are these two different authentication methods and I should pick one of the two??

I see the acltest.py file and I think I can follow along this way to set up the ACL settings for the user I made in update_pwd.py, but what is to prevent an unauthorized user from doing this same thing and just creating a user for themselves that has access?

1) Is it possible to use setacl.py exclusively to set up my users and their access abilities? It should block all unregistered or unknown users from creating/updating anything, and allow my named user to make edits.

2) How do I ensure an unknown user does not follow the steps in acltest.py to log into my database, create a new user, and make edits?

jreadey commented 8 years ago

To address your second question first, the tools in h5serv/util/admin (add_user.py, makepwd_file.py, update_pwd.py, setacl.py, import_file.py) are designed to run on the host running h5serv (i.e. they operate on files directly and don't go through the REST API). To keep user accounts and ACLs secure, it is necessary to have system security for the machine it's running on and for the data files exposed by h5serv. I.e. if it is possible for someone to ssh into the machine and acquire write access to these files, that person would be able to subvert any security measures enforced by the REST API.

So in setting up h5serv (esp if security is an important issue), you'll want to think about what user the h5serv process runs under, who would have access to the machine, and would they be able to acquire read and/or write access to the h5serv data files (and password file).

Next, for your first question, let's take the simple case of one file that I want one user to have full access to and everyone else to have read access to.

As an example, I'll ssh into my machine running h5serv, go to the util/admin directory and run update_pwd.py:

$ python update_pwd.py
>filename: passwd.h5
>username: None
>password: None
>email: None
username                 userid  state   email                                   ctime               mtime
------------------------------------------------------------------------------------------------------------------------
test_user1               1       b'A'    b'None'                                 2016-01-17 15:57:16 2016-01-17 15:57:16
test_user2               2       b'A'    b'None'                                 2016-01-17 15:57:16 2016-01-17 15:57:16

So it looks like I have two users: test_user1 and test_user2 (note: these don't have anything to do with user accounts on the system). If I didn't have any users, I could add one with the add_user.py command.

So let's use test_user1 as the "power user" who will be the sole party with read/write access to our file. The username "test_user1" is what is used to reference the user when making http requests through the REST API. The userid is a numerical value that is assigned sequentially as users are added. Internally h5serv uses the userid to denote users in the ACL, but these userids are translated to usernames when an ACL is read/updated through the REST API.
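To make the username/userid relationship concrete, here is a minimal sketch of the translation described above. It is not h5serv's actual code: the `USERS` dict is a hypothetical in-memory stand-in for the password file, and the "default" label for userid 0 is only an illustrative name.

```python
# Hypothetical stand-in for the password file listing shown above.
USERS = {"test_user1": 1, "test_user2": 2}
# Reverse lookup: userid -> username.
USERNAMES = {userid: name for name, userid in USERS.items()}


def acl_with_usernames(acl):
    """Translate an ACL keyed by numeric userid into one keyed by username."""
    translated = {}
    for userid, perms in acl.items():
        # userid 0 is the default entry; "default" is an illustrative label only.
        name = "default" if userid == 0 else USERNAMES[userid]
        translated[name] = perms
    return translated


# ACLs are stored by userid, but reported by username.
stored = {1: {"read": True, "update": True}}
reported = acl_with_usernames(stored)
```

The point is only that the two tools refer to the same users: the password file supplies the name-to-number mapping, and the ACL stores the number.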

Let's use tall.h5 as our test file. From the h5serv/util/admin directory, the relative path would be: ../../data/test/tall.h5 and the DNS path (for REST API requests) would be tall.test.hdfgroup.org.

Since we haven't modified any ACLs, running getacl.py:

$ python getacl.py -file ../../data/test/tall.h5
no ACLs

This reports no ACLs, and therefore anyone can do anything (read/modify/delete) with the file. That may be ok if only trusted parties have access to the h5serv endpoint, but not so good if you are planning to expose the service endpoint on the internet.

Next, let's give test_user1 permission to do anything to the file (it seems redundant at this point, but bear with me...):

$ python setacl.py -file ../../data/test/tall.h5 +crudep 1

This gives user 1 all six permissions: create, read, update, delete, readACL, and updateACL.

Now run getacl.py again:

$ python getacl.py -file ../../data/test/tall.h5
  userid     create      read    update    delete   readACL  updateACL
       1        Y         Y         Y         Y         Y         Y

Now we see there's one ACL entry with all permissions enabled. At this point any requests that come to the service authenticated as test_user1 will use these permissions, while any other authenticated user (and any anonymous requests) will use the default permissions (which happen to be the same for now).

Next we'll add a default entry (effective for any userid not otherwise listed in the ACL) and allow only read access. By not specifying any userid, this becomes an update to the default permission. Flags following the '+' give permission, and flags following '-' remove permission:

$ python setacl.py -file ../../data/test/tall.h5 +r-cudep
 userid     create      read    update    delete   readACL  updateACL
       0        N         Y         N         N         N         N
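As an illustration of how a flag string like +r-cudep can be read, here is a small sketch. It is not the parser from setacl.py. The letters are paired with the table columns in order, and the pairing of 'e' with readACL and 'p' with updateACL is an assumption; `apply_flags` is a hypothetical helper name.

```python
# Flag letter -> permission column. The 'e' and 'p' pairings are assumed
# from the column order in the getacl.py output, not taken from setacl.py.
FLAG_NAMES = {
    "c": "create",
    "r": "read",
    "u": "update",
    "d": "delete",
    "e": "readACL",
    "p": "updateACL",
}


def apply_flags(perms, spec):
    """Return a copy of perms updated by a spec such as '+r-cudep'."""
    updated = dict(perms)
    grant = None  # becomes True after '+', False after '-'
    for ch in spec:
        if ch == "+":
            grant = True
        elif ch == "-":
            grant = False
        elif ch in FLAG_NAMES and grant is not None:
            updated[FLAG_NAMES[ch]] = grant
        else:
            raise ValueError("unexpected character in spec: %r" % ch)
    return updated


# Start from everything allowed, then apply the default-entry spec.
everything = {name: True for name in FLAG_NAMES.values()}
default_entry = apply_flags(everything, "+r-cudep")
```

With this reading, `default_entry` has only `read` enabled, which matches the userid 0 row printed above.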

And now we have two entries:

$ python getacl.py -file ../../data/test/tall.h5
  userid     create      read    update    delete   readACL  updateACL
       0        N         Y         N         N         N         N
       1        Y         Y         Y         Y         Y         Y

Now REST API requests from authenticated userid 1 (i.e. username "test_user1") have carte blanche to do anything, while everyone else will only have read access.
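The lookup rules described so far can be summarized in a short sketch: a listed userid uses its own entry, anyone else falls back to the default entry (userid 0), and a file with no applicable entry is unrestricted. This is an illustration of those rules under assumed data structures, not the verifyAcl code from app.py; `is_allowed` and the dict layout are hypothetical.

```python
def effective_permissions(acl, userid):
    """Pick the ACL entry for userid, falling back to the default entry (0).

    acl maps userid -> {permission name: bool}; userid None means anonymous.
    Returns None when no entry applies, meaning nothing is restricted.
    """
    if userid in acl:
        return acl[userid]
    return acl.get(0)


def is_allowed(acl, userid, action):
    """Return True if the request may perform the action."""
    perms = effective_permissions(acl, userid)
    if perms is None:
        return True  # no applicable ACL entry: anyone can do anything
    return perms.get(action, False)


# The two entries reported by getacl.py above.
NAMES = ("create", "read", "update", "delete", "readACL", "updateACL")
ACL = {
    0: {name: name == "read" for name in NAMES},
    1: {name: True for name in NAMES},
}
```

For example, `is_allowed(ACL, 1, "delete")` is true, while `is_allowed(ACL, None, "delete")` and `is_allowed(ACL, 2, "update")` are both false because they fall through to the default entry.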

Let's try it out using some curl commands. I'll run these from the same host, but in principle they would work the same when invoked from anywhere on the internet (replacing 127.0.0.1 with the external IP of the host).

Unauthenticated user tries to read tall:

$ curl  --header "Host: tall.test.hdfgroup.org" http://127.0.0.1:5000
{"root": "1fbbb002-2cea-11e6-accf-3c15c2da029e", "lastModified": "2016-06-07T19:59:39Z", "hrefs": [{"href": "http://tall.test.hdfgroup.org/", "rel": "self"}, {"href": "http://tall.test.hdfgroup.org/datasets", "rel": "database"}, {"href": "http://tall.test.hdfgroup.org/groups", "rel": "groupbase"}, {"href": "http://tall.test.hdfgroup.org/datatypes", "rel": "typebase"}, {"href": "http://tall.test.hdfgroup.org/groups/1fbbb002-2cea-11e6-accf-3c15c2da029e", "rel": "root"}], "created": "2016-06-07T19:59:39Z"}

That went through ok, since the user was only attempting to read.

However if we try to get the ACLs it fails with a 401:

$ curl  --header "Host: tall.test.hdfgroup.org" http://127.0.0.1:5000/acls
Traceback (most recent call last):
  File "/Users/jreadey/anaconda/envs/py34/lib/python3.4/site-packages/tornado/web.py", line 1413, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "app.py", line 804, in get
    self.verifyAcl(current_user_acl, 'readACL')  # throws exception is unauthorized
  File "app.py", line 148, in verifyAcl
    raise HTTPError(401, "Unauthorized")
tornado.web.HTTPError: HTTP 401: Unauthorized (Unauthorized)

Or trying to delete the file fails as well:

$ curl -X DELETE --header "Host: tall.test.hdfgroup.org" http://127.0.0.1:5000
Traceback (most recent call last):
  File "/Users/jreadey/anaconda/envs/py34/lib/python3.4/site-packages/tornado/web.py", line 1413, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "app.py", line 3054, in delete
    self.verifyAcl(acl, 'delete')  # throws exception is unauthorized
  File "app.py", line 148, in verifyAcl
    raise HTTPError(401, "Unauthorized")
tornado.web.HTTPError: HTTP 401: Unauthorized (Unauthorized)
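For comparison, here is a sketch of how an authenticated request as test_user1 might be built in Python. It assumes the service accepts HTTP Basic authentication, which is not stated in this thread, and the password is a placeholder. The request is only constructed here, not sent; `build_request` is a hypothetical helper.

```python
import base64
import urllib.request


def build_request(url, domain, username=None, password=None, method="GET"):
    """Build a request with the Host header and, optionally, Basic credentials."""
    req = urllib.request.Request(url, method=method)
    # Same role as curl's --header "Host: ..." in the examples above.
    req.add_header("Host", domain)
    if username is not None:
        # Assumption: the server accepts HTTP Basic authentication.
        raw = ("%s:%s" % (username, password)).encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req


req = build_request(
    "http://127.0.0.1:5000/acls",
    "tall.test.hdfgroup.org",
    username="test_user1",
    password="placeholder-password",  # hypothetical; use the one set with update_pwd.py
)
```

Passing `req` to `urllib.request.urlopen` would send it. If the assumption about the auth scheme holds, the ACL set up above should let this user read the ACLs where the anonymous request got a 401.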

That was a lot to cover and we haven't even gotten into setting ACLs on datasets or groups within a file, or setting permissions for a set of files. But let me stop here for now and see if you have questions.

Also, this blog post covers various issues related to security: https://hdfgroup.org/wp/2015/12/serve-protect-web-security-hdf5/.

ahalota commented 8 years ago

Great, that worked out perfect.

Only thing I was wondering is how does the password.h5 file link to the one referenced in server/config.py ? My config.py has 'password_file': '../util/admin/passwd.h5' . If I have the password file elsewhere, I assume it won't find it and would act as if there was no password file?

I had to add the -f flag when using update_pwd.py because the default location it searches for is '../server/passwd.h5', which isn't an existing folder. It might make sense to have it either load from config, or at least have the default match the default that's included in config.py

My files are called getacl.py and setacl.py, I think you accidentally referred to them as get_acl.py and set_acl.py in your instructions.

ahalota commented 8 years ago

I had to add the -f flag when using update_pwd.py because the default location it searches for is '../server/passwd.h5', which isn't an existing folder. It might make sense to have it either load from config, or at least have the default match the default that's included in config.py

I checked a different install I had, which did exactly this. I must not be updating my copy correctly with your latest revision.

jreadey commented 8 years ago

Are you on the develop branch in both locations? Of course you can just do a git pull to get the latest changes.

From a maintenance point of view the util scripts are a bit problematic since they are not covered by the test scripts. Please file issues for any problems you come across.

jreadey commented 8 years ago

I'm closing this issue - please re-open if you need any clarifications on ACL usage.

ghost commented 7 years ago

Am able to set ACL on a specific domain. How can I prevent creation of new domains / allow only 1 user to create these domains?

jreadey commented 7 years ago

There's nothing preventing users from creating new domains. While not as bad as allowing data to be deleted or overwritten, it would be best to patch this up.

Have you come across the .toc.h5 file in the h5serv/data directory? It maintains a list of all domains (using HDF5 external links). The logic for the create domain check could go like this:

Get the user's ACL from the .toc.h5 file. If the user has "create" permission, allow the creation of the new domain; otherwise fail.

So any users who have had an ACL added (with 'create' perms) to the .toc file would be able to create new domains, otherwise not.
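The proposed check could be sketched roughly as follows. This is only an illustration of the idea, not an implementation: the ACL from .toc.h5 is represented as a plain dict, and `can_create_domain` is a hypothetical name.

```python
def can_create_domain(toc_acl, userid):
    """Allow domain creation only for users with an explicit 'create' entry.

    toc_acl is a hypothetical dict form of the ACL stored on .toc.h5,
    mapping userid -> {permission name: bool}.
    """
    perms = toc_acl.get(userid)
    # No entry for this user, or an entry without 'create', means refusal.
    return bool(perms and perms.get("create"))


# Example: only userid 1 has been granted 'create' on the .toc file.
toc_acl = {1: {"create": True}, 2: {"create": False, "read": True}}
```

Note that, as described, a user with no entry in the .toc ACL is refused rather than treated as unrestricted.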

How does that sound?

Could you create a new issue to track this?

ghost commented 7 years ago

Great suggestion! I just filed the issue at https://github.com/HDFGroup/h5serv/issues/105

jreadey commented 7 years ago

Thanks. I'll plan to work on it this month.