GrahamDumpleton / mod_wsgi

Source code for Apache/mod_wsgi.
Apache License 2.0
1.02k stars 268 forks source link

SIGSEGV raising when passing invalid utf-8 sequence to Authentication header with Python 3 #898

Closed jun66j5 closed 1 month ago

jun66j5 commented 3 months ago

When passing invalid utf-8 sequence to Authentication header with AuthBasicProvider wsgi and Python 3, SIGSEGV occurs in the apache child process.

[Mon Jul 22 18:35:36.799108 2024] [core:notice] [pid 3091685:tid 140098776566848] AH00052: child pid 3091689 exit signal Segmentation fault (11)

In wsgi_check_password(), the user and password are retrieved from remote client, however the returned value from Py_BuildValue("(Oss)", ...) is not checked.

diff --git a/src/server/mod_wsgi.c b/src/server/mod_wsgi.c
index 9bc07c672..50805e118 100644
--- a/src/server/mod_wsgi.c
+++ b/src/server/mod_wsgi.c
@@ -14889,8 +14889,13 @@ static authn_status wsgi_check_password(request_rec *r, const char *user,

                 Py_INCREF(object);
                 args = Py_BuildValue("(Oss)", vars, user, password);
-                result = PyObject_CallObject(object, args);
-                Py_DECREF(args);
+                if (args == NULL) {
+                    result = NULL;
+                }
+                else {
+                    result = PyObject_CallObject(object, args);
+                    Py_DECREF(args);
+                }
                 Py_DECREF(object);
                 Py_DECREF(vars);

After applying the above patch temporarily, the following errors are logged instead of SIGSEGV.

[Mon Jul 22 19:05:25.829812 2024] [wsgi:error] [pid 3093853:tid 140035076626176] [client 192.168.11.100:53020] mod_wsgi (pid=3093853): Exception occurred processing WSGI script '/dev/shm/modwsgi-ZGpDDX/auth.wsgi'.
[Mon Jul 22 19:05:25.833381 2024] [wsgi:error] [pid 3093853:tid 140035076626176] [client 192.168.11.100:53020] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

The same issue might exist at the following locations:

$ git grep 'Py_BuildValue.*"[^"]*s[^"]*"'
src/server/mod_wsgi.c:            args = Py_BuildValue("(s)", resource);
src/server/mod_wsgi.c:                args = Py_BuildValue("(Oss)", vars, user, password);
src/server/mod_wsgi.c:                args = Py_BuildValue("(Oss)", vars, user, realm);
src/server/mod_wsgi.c:                args = Py_BuildValue("(Os)", vars, r->user);
src/server/mod_wsgi.c:                args = Py_BuildValue("(Oss)", vars, r->user, password);
src/server/wsgi_logger.c:    args = Py_BuildValue("(OssOOO)", buffer, "utf-8", "replace",
src/server/wsgi_metrics.c:        args = Py_BuildValue("(s)", name);
jun66j5 commented 2 months ago

It is easy to reproduce the apache crashing by the fault in mod-wsgi module.

$ curl -H 'Authorization: Basic XXXX' http://hostname/path/to/auth
GrahamDumpleton commented 2 months ago

I am having to do a release of some mod_wsgi stuff today for other reasons so am going to try and get this added.

GrahamDumpleton commented 2 months ago

Your suggested change is not necessarily a solution and things are a bit more complicated.

This wasn't an issue in Python 2.X since the user/password would be converted as byte strings, thus effectively ISO-8859-1.

In Python 3.X where strings are now unicode strings, it is being decoded using the default codec, usually utf-8.

Now technically, they will be ISO-8859-1, but, they may require a different encoding if the application said it was allowed., eg.,

WWW-Authenticate: Basic realm="myChosenRealm", charset="UTF-8"

It is too hard to know what encoding should be used though.

The best that can be done is to decode the string as ISO-8859-1 and pass that through as a unicode string. If the application knew that it would be utf-8 or some other encoding, the check_password() function will need to do a byte code dance of:

user.encode("ISO-8859-1").decode("UTF-8")

to convert it before use.

As to other uses beyond user/password, I suspect realm can only be ISO-8859-1. Either way it should be handled the same.

As to r->user I am not sure as forget for now where that one comes from and under what circumstances so I need to check.

As Py_BuildValue() does not have a way to handle no utf-8 at directly, will need to convert as iso-8895-1 first and then supply it to it.

jun66j5 commented 2 months ago

+1 for using ISO-8859-1.

I consider that mod-wsgi should use same encoding in both user parameter of check_password and environ['REMOTE_USER'] in order to be the same value (as WSGI "bytes-as-unicode" string, ISO-8859-1).

GrahamDumpleton commented 2 months ago

So in that case, needs to be:

                PyObject *user_string = NULL;
                PyObject *password_string = NULL;

#if PY_MAJOR_VERSION >= 3
                user_string = PyUnicode_DecodeLatin1(user, strlen(user), NULL);
                password_string = PyUnicode_DecodeLatin1(password, strlen(password), NULL);
#else
                user_string = PyString_FromString(user);
                password_string = PyString_FromString(password);
#endif

                vars = Auth_environ(adapter, group);

                Py_INCREF(object);
                args = Py_BuildValue("(OOO)", vars, user_string, password_string);
                result = PyObject_CallObject(object, args);
                Py_DECREF(args);
                Py_DECREF(object);
                Py_DECREF(vars);
                Py_DECREF(user_string);
                Py_DECREF(password_string);

This means get error:

% curl -H 'Authorization: Basic XXXX' http://localhost:8000
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Unauthorized</title>
</head><body>
<h1>Unauthorized</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>

where simple check_password() implementation of:

def check_password(environ, user, password):
    print('USER', user, environ['REQUEST_URI'])
    if user == 'spy':
        if password == 'secret':
            return True
        return False
    elif user == 'witness':
        if password == 'secret':
            return 'protected'
        return False
    return None

results in logger error of:

[Thu Aug 08 19:23:45.672404 2024] [wsgi:error] [pid 64155:tid 123145535889408] USER ]u\xc3\x97 /
[Thu Aug 08 19:23:45.672576 2024] [auth_basic:error] [pid 64155:tid 123145535889408] [client ::1:58867] AH01618: user ]u\xd7 not found: /

So is garbage data, but doesn't crash at least. :-)

GrahamDumpleton commented 2 months ago

If you want to try with:

pip install https://github.com/GrahamDumpleton/mod_wsgi/archive/refs/heads/develop.zip

you can.

Note quite there for a release yet as need to evaluate a few other changes from backlog first including setting up build for Python 3.13, which I believe will require some code changes because of removal of deprecated APIs.

Today turned into a mess as I lost internet connection for a few hours so lucky to even get done what I did. :-)

jun66j5 commented 2 months ago

I just tried develop branch with Python 3.10. Confirmed the issue goes away. Thanks.

GrahamDumpleton commented 1 month ago

Closing this issue at this point. Should get out a new released version of mod_wsgi in coming week after see if anything else need to roll into a new release.