URL authorization support

GoogleCodeExporter commented 9 years ago

I would very much like mod_wsgi to support authorization in a way similar to
mod_python's PythonAuthzHandler.

The aim is to be able to filter access to URLs by using a Python script
with custom logic, implementing a kind of ACL for web-based resources. My
use case would be a WebDAV-enabled SVN server, where I could allow/deny
access to resources for my teammates according to our custom access config
file (not using SVN's authz).

Original issue reported on code.google.com by esizi...@gmail.com on 25 Dec 2007 at 4:51

GoogleCodeExporter commented 9 years ago

This already exists in 2.0 release candidates. See:

  http://code.google.com/p/modwsgi/wiki/ChangesInVersion0200

WSGIAuthUserScript is similar to mod_python's authenhandler, except that you
only need to validate the password.

WSGIAuthGroupScript is similar to mod_python's authzhandler, except that you
only need to indicate the groups the user is in.

Original comment by Graham.Dumpleton@gmail.com on 25 Dec 2007 at 10:04

GoogleCodeExporter commented 9 years ago

Here is an example Apache configuration for DAV access control.

######

<Directory "/Users/grahamd/Sites/uploads/">
Dav On

AuthType Basic
AuthName "Uploads"
AuthBasicProvider wsgi
WSGIAuthUserScript /Users/grahamd/Sites/auth-uploads.wsgi
WSGIAuthGroupScript /Users/grahamd/Sites/auth-uploads.wsgi

Require valid-user

<Limit GET HEAD OPTIONS CONNECT POST>
Require group read
</Limit>

<Limit GET HEAD OPTIONS CONNECT POST PROPFIND PUT DELETE \
 PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
Require group write
</Limit>
</Directory>

######

The auth-uploads.wsgi file is:

######

PASSWORDS = {
  'user1': 'password1',
  'user2': 'password2',
}

GROUPS = {
  'user1': ['read'],
  'user2': ['read', 'write'],
}

def check_password(environ, user, password):
    print >> environ['wsgi.errors'], 'check_password ', user
    print >> environ['wsgi.errors'], 'REQUEST_METHOD ', environ['REQUEST_METHOD']
    print >> environ['wsgi.errors'], 'REQUEST_URI ', environ['REQUEST_URI']
    if user in PASSWORDS:
        return PASSWORDS[user] == password
    return None

def groups_for_user(environ, user):
    print >> environ['wsgi.errors'], 'groups_for_user ', user
    return GROUPS.get(user, [])

######

The example shows that the request method and URI are accessible in the auth
hook functions. You can use this to work out the specific URLs requested when
DAV accesses particular resources, then add a Location directive for those URLs
in the Apache configuration to further limit access based on the Require group
directive.
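
As a variation on the above (a minimal sketch, not something shown in the original example), the group list could also be made to depend on the requested URI inside groups_for_user itself. The ACL table, its paths and the longest-prefix rule are all hypothetical; only the groups_for_user(environ, user) signature and the REQUEST_URI key come from the example above.

```python
# Hypothetical ACL table: URL prefix -> {user: [groups]}.
# The paths and users are made up for illustration.
ACL = {
    '/uploads/private/': {'user2': ['read', 'write']},
    '/uploads/': {'user1': ['read'], 'user2': ['read', 'write']},
}

def groups_for_user(environ, user):
    # Drop any query string before matching on the path.
    path = environ.get('REQUEST_URI', '').split('?', 1)[0]
    # Assumption: the longest matching prefix decides.
    for prefix in sorted(ACL, key=len, reverse=True):
        if path.startswith(prefix):
            return ACL[prefix].get(user, [])
    # No rule matched: the user is in no groups for this URL.
    return []
```

With this table, user1 gets the read group under /uploads/ but no groups under /uploads/private/, so a Require group read there would not be satisfied.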

Original comment by Graham.Dumpleton@gmail.com on 26 Dec 2007 at 2:45

Added labels: Type-Other
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Thank you, I'll try this right now.

Original comment by esizi...@gmail.com on 26 Dec 2007 at 6:10

GoogleCodeExporter commented 9 years ago

I would like to be able to return HTTP errors such as HTTP 500, HTTP
FORBIDDEN, etc. With the suggested method I can only return an empty group list
for a user who is denied access. This makes the browser prompt again for a user
name and password without reporting any error.

Original comment by esizi...@gmail.com on 26 Dec 2007 at 7:37

GoogleCodeExporter commented 9 years ago

If you want to force a 500 error response, i.e., indicate that an internal
server error has occurred even though one hasn't, you can do so by raising an
exception and not catching it. Wanting to do that seems to be the wrong way of
going about things though.

As to replacing 401 with something else, I can't see that you can do that, as it
will break how the HTTP Basic/Digest authentication mechanisms work. Browser
clients rely on being returned a 401 status with a WWW-Authenticate header to
know that authentication is required in the first place. If you return a
different status, you would never be prompted to enter your credentials in an
interactive browser.

If you still want to investigate returning a different status rather than the
401, you may be able to use the ErrorDocument directive. The directive will need
to refer to the alternate page by full URL, including host:

  ErrorDocument 401 http://myhost.com/subscription_info.html

At least this is what the Apache documentation says:

  http://httpd.apache.org/docs/2.2/mod/core.html#errordocument

When I try it, though, I can't get it to work. Although one can set
ErrorDocument for 401 to use a local server document, I can't get it to return a
302 redirecting to a full URL instead. It is quite possible that this feature
was in an older version of Apache but was taken out and the documentation was
never updated.

Searching some more, the ability to do that has indeed been removed.

    if (error_number == 401 && what == REMOTE_PATH) {
        ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, cmd->server,
                     "cannot use a full URL in a 401 ErrorDocument "
                     "directive --- ignoring!");
    }

So it produces this in the error log:

[Wed Dec 26 20:13:45 2007] [notice] cannot use a full URL in a 401 ErrorDocument directive --- ignoring!

In short, I believe that by changing the error status you will be breaking
HTTP authentication, i.e., doing something contrary to the RFC standards for
this stuff.

The best one can manage as far as returning HTTP_FORBIDDEN goes is to deny
access before authentication is even attempted, for example based on the client
host from which the request came. This can be done using a script associated
with the WSGIAccessScript directive.
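
A minimal sketch of such an access script, assuming the allow_access(environ, host) entry point described in the mod_wsgi access control documentation. The allow-list is hypothetical, and whether host is an IP address or a host name depends on how Apache is configured:

```python
# Hypothetical allow-list of client hosts; adjust for your own network.
ALLOWED_HOSTS = {'127.0.0.1', '::1', 'localhost'}

def allow_access(environ, host):
    # Returning False makes Apache refuse the request in the access phase,
    # before any authentication is attempted.
    return host in ALLOWED_HOSTS
```

The script would be referenced from the Apache configuration with a WSGIAccessScript directive pointing at the file.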

Why do you want to do this? Can you point to information on the Internet that
indicates that what you want to do is acceptable practice and doesn't break the
RFCs on how HTTP Basic/Digest authentication is meant to work?

Original comment by Graham.Dumpleton@gmail.com on 26 Dec 2007 at 9:25

GoogleCodeExporter commented 9 years ago

Actually, the Apache documentation is correct when you read it properly.

"""In addition, if you use a remote URL in an ErrorDocument 401, the client 
will not know to prompt the user for 
a password since it will not receive the 401 status code. Therefore, if you use 
an ErrorDocument 401 directive 
then it must refer to a local document."""

This substantiates what I said: it must be a 401 status, else the client will
not know that credentials are required.

Original comment by Graham.Dumpleton@gmail.com on 26 Dec 2007 at 9:52

GoogleCodeExporter commented 9 years ago

In my use case I don't care about authentication in my Python script, as it's
handled by another Apache module (mod_ldap).

I only want to implement second-pass authorization, where a user who has
already been authenticated is then checked against some ACLs for access to
resources. That's what I want to implement as a Python script. Is that clear?

Original comment by esizi...@gmail.com on 26 Dec 2007 at 9:52

GoogleCodeExporter commented 9 years ago

Even so, not returning HTTP_UNAUTHORIZED from the authorization phase when
checking ACLs goes against normal practice. If you return HTTP_FORBIDDEN at
that point then, because of how Apache treats authentication and authorisation
as being connected, you would be denying the client the ability to
reauthenticate with different credentials if they made a mistake and used
credentials without the correct access permissions for the resource concerned.
Not returning HTTP_UNAUTHORIZED will also possibly break Apache's ability to
check authorisation against multiple mechanisms where acceptance by any one is
okay. In other words, it possibly mucks up the concept behind the Apache Satisfy
directive.

If you want to do something unconventional, your only choice is probably going
to be to use mod_python instead and implement a full authorization handler
which does all its own parsing of Require directives etc., and which can then
override what sort of error status is returned.

Original comment by Graham.Dumpleton@gmail.com on 26 Dec 2007 at 10:14

GoogleCodeExporter commented 9 years ago

Yes, but authorization is not the same as authentication. It's quite common to
have several authorization plugins where, to gain access, a person must be
successfully authorized by all of the authorizers, not [only] one of them.

I think that's the reason for splitting AuthenHandler (Apache's
ap_hook_check_user_id()) and AuthzHandler (Apache's ap_hook_auth_checker()) in
both Apache HTTPD and mod_python.

The main idea of this issue is to discuss this concept for mod_wsgi. I do think
that mod_wsgi should support both concepts directly, and not only
authentication as it does for now.

P.S. There is also AccessHandler (Apache's ap_hook_access_checker()), which is
supposed to filter requests based on the client's IP or something similar.

P.P.S. For now, how am I supposed to pass configure-time parameters (from
Apache's config file) to WSGIAuthGroupScript and WSGIAuthUserScript?

It seems that mod_wsgi ignores SetEnv and mod_rewrite's RewriteRule , - [E=...]
options; at least I can't find them in the `environ' dict.

Original comment by esizi...@gmail.com on 27 Dec 2007 at 1:53

GoogleCodeExporter commented 9 years ago

I am well aware of the difference between the access, authentication and 
authorisation phases of Apache.

The WSGIAccessScript directive implements host based access checks as part of 
the Apache access phase.

The WSGIAuthUserScript directive implements checks as part of the 
authentication phase.

The WSGIAuthGroupScript directive implements checks as part of the 
authorisation phase.

Documentation is at:

  http://code.google.com/p/modwsgi/wiki/AccessControlMechanisms

For all three, the status returned is the same status used by all the other
modules shipped with Apache that are implemented in those phases. No supplied
Apache module which hooks into the authorisation phase returns HTTP_FORBIDDEN
because, as I have said before, doing so breaks how the HTTP 'Authorization'
header is meant to be used. I am therefore following what Apache itself does,
which is sufficient for practically all cases.

As I said before, you may simply be better off using mod_python. If you try it
and come back to me with a working pair of authentication/authorisation
handlers for mod_python that show that returning HTTP_FORBIDDEN actually works
in practice, and doesn't cause problems with users not being able to resupply
credentials without needing to exit their client browser and start over, then
I'll look at whether being able to supply a different status is warranted or
not. At the moment this discussion all seems to be theoretical, with no way to
know if what you want to do will even work in practice.

The reason that SetEnv parameters are not passed is that Apache does not
associate them with a request until very late, in the fixup handler phase, when
it is known for certain which response handler will actually be called for the
URL. This is well after the Apache AAA phases. Technically, various things
could happen between the AAA phases and the fixup handler which change the
actual target of the request, and as a result change which set of SetEnv
directives apply. It is therefore not safe to assume that the target will not
change and to look ahead somehow and grab the SetEnv directives, something
which can't be done anyway as they are held in inaccessible data structures
until the fixup handler populates the request structure with them.

The only reliable way of handling the issue is therefore for the script itself
to be self-contained and describe its own configuration. This could be static,
or it could be determined dynamically from the request headers provided in
environ. If you have various URL subsets that require different static
configurations, then factor out your core code into a proper Python module
somewhere. Then create a separate script for each context which imports the
common code and uses a wrapper around it to set the configuration.

  import commonstuff

  config = { .... }

  def check_password(environ, user, password):
    environ.update(config)
    return commonstuff.check_password(environ, user, password)
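
A self-contained sketch of the same wrapper pattern follows, with the shared logic inlined in place of the commonstuff module so it can be run on its own. The 'myapp.passwords' key, the make_check_password helper and the sample credentials are hypothetical names chosen for illustration:

```python
def common_check_password(environ, user, password):
    # Stand-in for commonstuff.check_password: it reads its settings
    # from environ, where the per-context wrapper has placed them.
    passwords = environ.get('myapp.passwords', {})
    if user in passwords:
        return passwords[user] == password
    return None  # unknown user

def make_check_password(config):
    # Build the check_password hook for one context (one .wsgi script).
    def check_password(environ, user, password):
        environ.update(config)
        return common_check_password(environ, user, password)
    return check_password

# Each context's script would define its own config and expose check_password.
check_password = make_check_password(
    {'myapp.passwords': {'user1': 'password1'}})
```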

I have no desire to mirror mod_python's parallel mechanism for defining 
configuration as it overly complicates 
the internal code and is not necessary.

For your own interest about how Apache works internally, you may want to read 
the following if you haven't 
already.

  http://www.fmc-modeling.org/category/projects/apache/amp/Apache_Modeling_Project.html

Original comment by Graham.Dumpleton@gmail.com on 27 Dec 2007 at 9:41

GoogleCodeExporter commented 9 years ago

I'm used to having AD-based authentication for my web resources using Apache's
mod_ldap module.

This module could be a good example of an AAA module which does return HTTP
results other than HTTP_OK and HTTP_FORBIDDEN.

So, I would like you to comment on the mod_ldap source file
httpd-2.2.6/modules/aaa/mod_authnz_ldap.c, starting from line 410:

    /* handle bind failure */
    if (result != LDAP_SUCCESS) {
        ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
                      "[%" APR_PID_T_FMT "] auth_ldap authenticate: "
                      "user %s authentication failed; URI %s [%s][%s]",
                      getpid(), user, r->uri, ldc->reason, ldap_err2string(result));

        return (LDAP_NO_SUCH_OBJECT == result) ? AUTH_USER_NOT_FOUND
#ifdef LDAP_SECURITY_ERROR
                 : (LDAP_SECURITY_ERROR(result)) ? AUTH_DENIED
#else
                 : (LDAP_INAPPROPRIATE_AUTH == result) ? AUTH_DENIED
                 : (LDAP_INVALID_CREDENTIALS == result) ? AUTH_DENIED
#ifdef LDAP_INSUFFICIENT_ACCESS
                 : (LDAP_INSUFFICIENT_ACCESS == result) ? AUTH_DENIED
#endif
#ifdef LDAP_INSUFFICIENT_RIGHTS
                 : (LDAP_INSUFFICIENT_RIGHTS == result) ? AUTH_DENIED
#endif
#endif
                 : AUTH_GENERAL_ERROR;
    }

Original comment by esizi...@gmail.com on 28 Dec 2007 at 11:55

GoogleCodeExporter commented 9 years ago

This is an example run of svn update without credentials supplied:
$ svn up
Authentication realm: <http://localhost:80> Authorized Area
Password for 'esizikov': 
Authentication realm: <http://localhost:80> Authorized Area
Username: 
Password for '': 
svn: PROPFIND request failed on '/InventoryAgent'
svn: PROPFIND of '/InventoryAgent': 500 Internal Server Error (http://localhost)

This is an extract from the Apache's error_log:
[Fri Dec 28 17:43:48 2007] [warn] [client 127.0.0.1] [28580] auth_ldap authenticate: user esizikov authentication failed; URI /InventoryAgent [ldap_simple_bind_s() to check user credentials failed][Invalid credentials]
[Fri Dec 28 17:43:49 2007] [warn] [client 127.0.0.1] [28581] auth_ldap authenticate: user  authentication failed; URI /InventoryAgent [User not found][No such object]

Original comment by esizi...@gmail.com on 28 Dec 2007 at 12:40

GoogleCodeExporter commented 9 years ago

The mod_ldap source you quote is not a standard Apache handler. It is an auth
provider, and auth providers do not return HTTP status codes. They return one of:

typedef enum {
    AUTH_DENIED,
    AUTH_GRANTED,
    AUTH_USER_FOUND,
    AUTH_USER_NOT_FOUND,
    AUTH_GENERAL_ERROR
} authn_status;

The wrapper around the mod_wsgi WSGIAuthUserScript code uses the same result 
codes.

                if (result) {
                    if (result == Py_None) {
                        status = AUTH_USER_NOT_FOUND;
                    }
                    else if (result == Py_True) {
                        status = AUTH_GRANTED;
                    }
                    else if (result == Py_False) {
                        status = AUTH_DENIED;
                    }
                    else {
                        PyErr_SetString(PyExc_TypeError, "Basic auth "
                                        "provider must return True, False "
                                        "or None");
                    }

                    Py_DECREF(result);
                }
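
The mapping in the C fragment above can be restated in Python as a rough sketch. The authn_status helper is hypothetical and returns the status names as strings purely for illustration; like the C code, it compares by identity, so a value such as 1 is rejected rather than treated as True:

```python
def authn_status(result):
    # Mirror of the C logic: only the exact objects None, True and False
    # are accepted as return values from check_password().
    if result is None:
        return 'AUTH_USER_NOT_FOUND'
    if result is True:
        return 'AUTH_GRANTED'
    if result is False:
        return 'AUTH_DENIED'
    raise TypeError('Basic auth provider must return True, False or None')
```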

There is no way for an auth provider to affect the HTTP status code returned
by the authentication handler phase, i.e., check_user_id(). You really need to
look at mod_auth_basic.c and mod_auth_digest.c to see how the auth provider
framework is called and what HTTP status codes are returned.

You also probably want to look at mod_authz_dbm.c, as that is closest to what
mod_wsgi group authorisation is doing, except that the group list comes from
Python code rather than a dbm file. You will note that mod_authz_dbm does not
return HTTP_FORBIDDEN or provide a means of doing so.

Since that is exactly what you want to do, you might also ask on some Apache
users forum why mod_authz_dbm doesn't allow HTTP_FORBIDDEN to be returned. If
they come back and say it could but simply doesn't, then I might also listen,
but if they say that it doesn't make sense, like I am saying .... :-)

Original comment by Graham.Dumpleton@gmail.com on 28 Dec 2007 at 10:28

GoogleCodeExporter commented 9 years ago

Closing this issue. How it works is how it was intended and is in line with how
other Apache modules do things.

Original comment by Graham.Dumpleton@gmail.com on 15 Jan 2008 at 4:23

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

Sorry to bring up this (almost) 3-year-old issue, but I think it is still
relevant and should be implemented. I've read and understood all of the above,
but am in favor of re-opening it for reasons similar to those of
`esizi...@gmail.com`. I've interleaved my comments with partial quotes of your
replies.

> Example shows that request method and URI are accessible in auth hook
> functions. You can use this to work out specific URLs used when dav tries to
> access certain resources and use a Location directive for those URLs in Apache
> configuration to further limit access based on require group directive.

If access is granted on a per-user basis, how should I proceed? Even if I
created a user for each group, how would I go about specifying such a directive
in the Apache config file?

> As to replacing 401 with something else, I can't see that you can do that as 
it will break how HTTP Basic/Digest authentication mechanisms work.

Yes, this is precisely what I want. I'm not using HTTP basic/digest 
authentication. I'm authenticating my users through OpenID. Returning a 401 
status and have the browser display a box for authentication would be confusing 
to say the least, since they don't have a password in the first place. 
Moreover, I couldn't make use of the username and password even if the user 
provided one.

> If you want to do something unconventional, your only choice is probably 
going to be to use mod_python instead and implement a full authorization 
handler which does all its own parsing of requires directives etc, and which 
can then override what sort of error status is returned.

I'm already using mod_python for this task. However, the project is dead.
You've addressed this issue yourself
(http://blog.dscpl.com.au/2010/06/modpython-project-is-now-officially.html).
The handler works perfectly fine as it is (even addressing the next point) and
I'd like to preserve that behavior with mod_wsgi. All of my setup uses
mod_wsgi, except for this specific handler, and maintaining mod_python code
just because you don't support custom authorization handlers is inconvenient.

> As I said before, you may simply be better off using mod_python. If you go
> try it and come back to me with a working pair of authentication/authorisation
> handlers for mod_python that show that returning HTTP_FORBIDDEN actually works
> in practice and doesn't cause problems with users not being able to resupply
> credentials without needing to exit their client browser and start over, then
> I'll look at whether being able to supply a different status is warranted or
> not.

I don't have a "pair of authentication/authorisation handlers", I only have an
authorization handler and it works great. I'm using Django with OpenID
authentication, so I can validate the user's permission with only access to the
request cookies. Returning HTTP_FORBIDDEN works great since I don't use HTTP
basic/digest authentication. The user may sign on at any time, even after
HTTP_FORBIDDEN was returned. At the very least, I should be able to redirect to
an authentication page. In fact, in this case, I could even get away with
returning HTTP_NOT_FOUND.

In any case, the resources I am trying to protect are not directly accessed by
the user, but by some other embedded object in the page (I'm protecting access
to videos played through a Flash player, similar to YouTube's). The page in
question already requires login to be displayed, and the video itself should
never be accessible without seeing that page. I'd like (for copyright reasons)
to be able to prevent the referenced content from being accessed directly (the
user should have purchased the content to view it). Serving video content
directly through Django using the static views is not recommended.

> You really need to look at mod_auth_basic.c and mod_auth_digest.c to see how 
auth provider framework called and what HTTP status codes are returned.

Suppose I export access to my content through the OAuth protocol, it is likely 
that third-party service providers will access the content using only an access 
token. No user is at the other end to supply credentials and the service 
provider does not have access to those credentials, as is part of the OAuth 
design. Following what authentication modules do in an attempt to implement 
authorization seems inadequate.

If I understand RFC2617 and Apache's documentation correctly, returning 401
only makes sense when access to the requested resource is protected by HTTP
basic/digest authentication. Any authentication/authorization beyond this point
does not seem to be covered by RFC2617. I really need to be able to implement
*only* an authorization handler, and return HTTP_FORBIDDEN or HTTP_NOT_FOUND at
my discretion. I understand that Apache does not seem to be equipped with
facilities for authorization without using HTTP authentication (i.e. you need
to specify "AuthType Basic" to be able to perform authorization). However,
RFC2617 is now 11 years old and HTTP authentication is showing its age. Better
authentication methods have been developed since, and OpenID is one of them. I
believe implementing this feature allows proper use of modern protocols and
would significantly benefit mod_wsgi.

Original comment by andre.l....@gmail.com on 19 Oct 2010 at 2:04

GoogleCodeExporter commented 9 years ago

Please take any further discussion about this to the mod_wsgi mailing list as 
an issue tracker is not really the appropriate forum for it.

Original comment by Graham.Dumpleton@gmail.com on 19 Oct 2010 at 2:10

GoogleCodeExporter commented 9 years ago

Latest discussion about this can be found at:

http://groups.google.com/group/modwsgi/browse_frm/thread/d5959f6be23b481e

Original comment by Graham.Dumpleton@gmail.com on 19 Oct 2010 at 11:30

Copterfly / modwsgi

URL authorization support #48