fgallaire / wsgiserver

NEW HOME ON GITLAB
https://gitlab.com/fgallaire/wsgiserver
27 stars 5 forks source link

WSGI applications are required to return bytes. #2

Closed GrahamDumpleton closed 8 years ago

GrahamDumpleton commented 8 years ago

This code looks suspect:

A WSGI application iterable is supposed to return byte strings for response body. If it returns Unicode strings it is an error.

A WSGI server is not meant to convert Unicode strings to byte strings automatically.

Thus concerned about what the code:

            for chunk in response:
                # "The start_response callable must not actually transmit
                # the response headers. Instead, it must store them for the
                # server or gateway to transmit only after the first
                # iteration of the application return value that yields
                # a NON-EMPTY string, or upon the application's first
                # invocation of the write() callable." (PEP 333)
                if chunk:
                    if isinstance(chunk, text_type):
                        chunk = chunk.encode('ISO-8859-1')
                    self.write(chunk)

is doing.

fgallaire commented 8 years ago

Hello Graham,

Thanks for your investigations. Howerer, WSGIserver is a fork of CherryPy's WSGI Server which is widely used in production.

GrahamDumpleton commented 8 years ago

That doesn't excuse the fact that it is not compliant with the WSGI specification if it is converting Unicode strings in response byte to bytes. Someone could write code that works on this WSGI server and will then blow up when used on other WSGI compliant WSGI servers. The point of the WSGI specification is to ensure portability for peoples WSGI applications.

fgallaire commented 8 years ago

The result of using a unicode object where a string object is required, is undefined.

It's undefined, so this could be either an error or something else. It seems compliant.

GrahamDumpleton commented 8 years ago

Earlier in PEP 3333 it says:

When called by the server, the application object must return an iterable yielding zero or more bytestrings. This can be accomplished in a variety of ways, such as by returning a list of bytestrings, or by the application being a generator function that yields bytestrings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more byte strings.

So it MUST yield byte strings.

Then in Unicode issues section it says:

For values referred to in this specification as "bytestrings" (i.e., values read from wsgi.input , passed to write() or yielded by the application), the value must be of type bytes under Python 3, and str in earlier versions of Python.

So in neither Python 2 or 3 is it allowed to be Unicode string, which would be unicode type under Python 2 and str under Python 3.

fgallaire commented 8 years ago

Yes the application MUST, and is faulty not to do so, but it's unrelated to the server behaviour.

In this case, the server behaviour is undefined, so it can do what it wants and be compliant.

GrahamDumpleton commented 8 years ago

It is a poor server implementation that doesn't hold WSGI applications to the WSGI specification and tell them when they are violating it. It is a disservice to users to not tell them there is a problem and silently allow it as it will only come back and bite them later. For example, you are converting using Latin-1. So it will work if only ASCII characters were used in the Unicode string, so hello world will work, but would then blow up if required UTF-8. A user could go along blissfully unaware until their production application starts failing in ways they don't understand.

fgallaire commented 8 years ago

@GrahamDumpleton if you see something else you're welcome :-)

GrahamDumpleton commented 8 years ago

If you are going to add a check, it should be if not isinstance(chunk, bytes) then raise an exception.

fgallaire commented 8 years ago

it's already here 73525017ff2e85b15743e7c2a0a492b658a6556d !

GrahamDumpleton commented 8 years ago

Tell me what is going to happen if someone incorrectly does:

return [1,2,3,4]

You are checking whether the items are text type (presumably that maps to unicode on Python 2 and str on Python 3) and only raise an exception in that very specific case.

There are still lots of other types besides bytes which someone could accidentally return and that check will not catch them and will pass it straight through.

That is why the exception should be for anything which IS NOT bytes.

fgallaire commented 8 years ago

Correct. Fixed. Thanks a lot @GrahamDumpleton