caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

Add WSGI directive for serving Python apps #176

Closed jpoehls closed 4 years ago

jpoehls commented 9 years ago

I don't have time (or expertise) to contribute this, but I'd definitely be interested in using it, so I figured I'd create the feature request and try to gather interest.

I ❤️ Caddy and want to use it for all the things. That includes serving up my Django app via WSGI.

I imagine this working similarly to the proxy directive, where you tell it to serve some directory via WSGI. (Forgive my perhaps incorrect use of terms, I'm a Python noob.)

I found this package for serving WSGI apps via Go, but I'm not sure of its production readiness.

mholt commented 9 years ago

I've looked into this a little bit. There's also this library, but I don't know if it does what we need. There was this discussion on Reddit that was... not super helpful. But I would be interested in adding WSGI support to Caddy.

I'm also a "Python noob" since I haven't written web apps in Python, so somebody with more experience might be more qualified to take on this task. See also the fastcgi directive, which does the same thing but for PHP (or any FastCGI) apps.

mholt commented 9 years ago

@jpoehls Another thing: we need to ensure static binaries, so cgo isn't an option (unless there is a reasonably convenient way to compile static, cross-platform binaries with cgo).

nudzo commented 9 years ago

uWSGI is quite sufficient for Python web apps. Default nginx includes support for it as a bundled plugin. The Python web app then runs under a separate daemon: https://github.com/unbit/uwsgi. Static content is best served by the web server, not via uWSGI.

blackrosezy commented 9 years ago

+1, I'd love to have WSGI support for Python.

hoover67 commented 8 years ago

I got interested in Caddy while listening to the recent FLOSS Weekly episode.

+1 from me also for Python WSGI support.

Uwe

mholt commented 8 years ago

Anyone is invited to tackle this, so long as the requirements I mentioned above are met (no cgo, cross-platform compatible, etc.). This may not be trivial... but I think WSGI support would make a good add-on, and there's obviously some demand for it.

korylprince commented 8 years ago

TL;DR: Just use a WSGI server (uWSGI, gunicorn, wsgiref, etc.) and proxy to it.

WSGI is not at the same layer as CGI or FastCGI. In other words, you can't communicate using WSGI directly over a TCP/UDP/Unix socket. There are only two ways (that I can think of at least) that would allow Caddy to have a WSGI directive:

  1. Spawn a Python subprocess that runs the application under a Python WSGI server (perhaps wsgiref, which has been built in since Python 2.5), and have Caddy proxy to it.

    This seems way too magical to me; I'd much rather explicitly configure the Python WSGI server and point Caddy to it. We'd need several options (how many workers, what port/socket to start the server on, environment variables, etc.) passed through to the Python process.

  2. Use cgo to start and communicate with a Python interpreter (via Python.h), like this package.

    This obviously requires cgo and depends on Python, which is a whole other can of worms.

So I don't think "WSGI support" in Caddy is a reasonable hope without using cgo.
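To make the distinction concrete: a WSGI server does not speak WSGI over a socket; it imports the application and calls it as an ordinary Python function. The sketch below uses only the standard library and plays the server's role in-process. `call_in_process` is a hypothetical helper name, not part of any library; it builds the `environ` dict with `wsgiref.util.setup_testing_defaults` and records what the app passes to `start_response`.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI app: a plain Python callable, not a network endpoint."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]


def call_in_process(app, path="/"):
    """Invoke a WSGI app the way a WSGI server does: as a function call."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills REQUEST_METHOD, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # the legacy write() callable; unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)  # iterating may trigger start_response (generators)
    finally:
        if hasattr(result, "close"):
            result.close()  # PEP 3333: call close() on the result if it has one
    return captured["status"], captured["headers"], body


status, headers, body = call_in_process(application)
print(status, body)  # 200 OK b'Hello World'
```

Anything that wants to run this app behind Caddy therefore needs a Python process to make that call, which is why the options above come down to a subprocess, embedding, or proxying.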

Separate from WSGI is uWSGI, a WSGI server that implements the uwsgi protocol. It has grown quite a bit and supports many languages and interfaces: "WSGI, PSGI, Rack, Lua WSAPI, CGI, PHP, Go..."

For Python, connections look like this:

[ HTTP client ] <- HTTP -> [ Caddy ] <- uwsgi protocol -> [ uWSGI ] <- WSGI via embedded Python interpreter -> [ Python WSGI Application ]

We could implement the uwsgi protocol in Caddy, I think, without a lot of trouble (at least the HTTP parts). This is what nginx, lighttpd, Cherokee, and Apache (which also has mod_wsgi, similar to option 2 above) use to talk to uWSGI.
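For reference, the uwsgi packet used for plain HTTP requests (modifier1 = 0) is small: a 4-byte header (`modifier1` as a uint8, the body size as a little-endian uint16, `modifier2` as a uint8) followed by the request variables, each written as a little-endian uint16 length plus the key bytes, then the same for the value. A minimal encoder sketch, assuming string variables encoded as Latin-1; the name `encode_uwsgi_vars` is hypothetical:

```python
import struct
from typing import Mapping


def encode_uwsgi_vars(env: Mapping[str, str]) -> bytes:
    """Serialize CGI-style vars into a uwsgi packet (modifier1=0, modifier2=0).

    Header: modifier1 (u8) | datasize (u16, little-endian) | modifier2 (u8)
    Body:   for each var, key_size (u16 LE) | key | val_size (u16 LE) | val
    """
    body = bytearray()
    for key, value in env.items():
        k = key.encode("latin-1")
        v = value.encode("latin-1")
        # struct.error is raised if a single key or value exceeds 65535 bytes
        body += struct.pack("<H", len(k)) + k
        body += struct.pack("<H", len(v)) + v
    if len(body) > 0xFFFF:
        raise ValueError("uwsgi vars block larger than 64 KiB")
    return struct.pack("<BHB", 0, len(body), 0) + bytes(body)


packet = encode_uwsgi_vars({"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(len(packet), packet[:4])  # 39 b'\x00#\x00\x00'
```

A proxy would write this packet to the backend socket and then send the request body, if any, right after it.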

However, uWSGI and other WSGI servers like gunicorn can also speak HTTP, so we could simply proxy HTTP to the WSGI server and not worry about supporting the uwsgi protocol.

The difference in speed is "very minor". (The commenter is the maintainer of mod_wsgi, so I think his opinion probably carries some weight.)

Also, uWSGI has changed quite a bit over the last few years and, IMO, has way too many options:

$ uwsgi --help|grep "\-\-"|wc -l
934

So someone would have to maintain the protocol implementation in Caddy if any breaking changes happened (which has definitely happened before).

So, to the developers of Caddy: if you want to support something more than just HTTP proxying but don't want to use cgo, I think the uwsgi protocol is the only way to go at this point. To users of Caddy: see the TL;DR.

Hope this helps.

mholt commented 8 years ago

Wow, that's great info. Thanks for the writeup @korylprince - I understand the problem a lot better now. 👍

Going forward, we'll look into uWSGI then. It may be a maintenance burden, though, so perhaps a third-party would be willing to author a plugin?

korylprince commented 8 years ago

Upon closer examination, the protocol looks to be backwards compatible (i.e., new features are tacked on rather than modifying existing packet layouts).

That being said, I think it'd be a good idea to keep it as a plugin, at least at first.

I'd like to try my hand at writing the plugin, and have already started work on a basic Go implementation of the protocol.

I have requested a Slack invitation, as I'd like to discuss some design decisions before going too far.

mholt commented 8 years ago

@korylprince We've since moved to gitter for dev chat. Is this still something you're working on?

korylprince commented 8 years ago

@mholt No, I'm sorry. My life is just too busy. It was a much bigger undertaking than I was expecting.

mholt commented 8 years ago

That's alright, thanks for looking into it. I'll close this now; proxy seems to do fine for most people.

unbit commented 8 years ago

Hi, uWSGI lead dev here, just to confirm that the uwsgi protocol has not changed since its first draft in 2008/2009. The protocol, by the way, is simply a dictionary/hashmap serialization format, so it is not coupled with Python in any way.
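Because the packet is just a serialized map of strings, the receiving side is equally small and has nothing Python-specific about it. A sketch of a decoder (the names `decode_uwsgi_packet` and `_take` are hypothetical, and Latin-1 decoding is an assumption) that checks every length so a truncated packet raises instead of producing garbage:

```python
from typing import Dict, Tuple


def _take(buf: bytes, pos: int, n: int) -> Tuple[bytes, int]:
    """Return buf[pos:pos+n] and the new offset, or fail if buf is too short."""
    if pos + n > len(buf):
        raise ValueError("truncated uwsgi packet")
    return buf[pos:pos + n], pos + n


def decode_uwsgi_packet(data: bytes) -> Tuple[int, int, Dict[str, str]]:
    """Parse the 4-byte header and the vars block; returns (modifier1, modifier2, vars)."""
    header, pos = _take(data, 0, 4)
    modifier1, modifier2 = header[0], header[3]
    datasize = int.from_bytes(header[1:3], "little")
    body, _ = _take(data, pos, datasize)

    env: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        raw, pos = _take(body, pos, 2)
        key, pos = _take(body, pos, int.from_bytes(raw, "little"))
        raw, pos = _take(body, pos, 2)
        value, pos = _take(body, pos, int.from_bytes(raw, "little"))
        env[key.decode("latin-1")] = value.decode("latin-1")
    return modifier1, modifier2, env


packet = bytes([0, 14, 0, 0]) + b"\x09\x00PATH_INFO" + b"\x01\x00/"
print(decode_uwsgi_packet(packet))  # (0, 0, {'PATH_INFO': '/'})
```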

cblomart commented 8 years ago

So to run a WSGI app you'll need three pieces of code: Caddy, gunicorn/uWSGI, and Python plus your app. (If you want static libraries, I have already tested musl-gcc successfully.) Really, I would rather scrap one layer of this setup (one less thing that can crash and that needs monitoring). For WSGI, Python and your app are mandatory, so gunicorn/uWSGI is the layer that could be avoided. Implementing WSGI in Caddy would then be good. 👍

Another solution: develop your site in Go (sorry, Python lovers).

unbit commented 8 years ago

@cblomart your app, python and gunicorn/uWSGI all live in the same process, and you need a WSGI server for python+your_app otherwise you will not have a way to communicate with the world (this is why WSGI exists). There are two layers: caddy (or whatever webserver/proxy) and your application (which generally requires dozens of libraries/modules to work, the WSGI server/layer being one of them). Implementing WSGI in caddy is simply not possible unless you want to embed the python VM in go (and this is basically insane in this specific context, as you would end up mixing blocking and non-blocking code, and the implications of running goroutines in the python VM are still an obscure topic [I am one of the people working on it]).

'uwsgi', instead, is a communication protocol (yes, an unfortunate naming choice :) like FastCGI and SCGI. This issue/ticket is about supporting that; i do not think people want to go back to '90s when the proxy and app used to live in the same address/process space :). The advantage in implementing uwsgi is basically performance-related, but uWSGI (and gunicorn) can already work (gunicorn via HTTP, uWSGI via HTTP or FastCGI) behind Caddy without changing a single line of code.

mholt commented 8 years ago

@unbit I really appreciate your feedback here, thanks! Also thanks for your work on uWSGI, it's obviously used by a lot of people.

Forgive my ignorance in this area... but I thought WSGI was the name of the protocol and uWSGI was the name of the server that implemented it. So I might be a little confused now.

WSGI is definitely described as the "interface" or specification by which servers communicate.

The uWSGI homepage says only:

The uWSGI project aims at developing a full stack for building hosting services.

Which doesn't give me a lot of concrete info about what the uwsgi command actually does.

I think one reason there's a disconnect in how we want to approach this problem is that the Go philosophy is more or less "bundle it all into a single process" which I know is not the uWSGI way.

you need a WSGI server for python+your_app otherwise you will not have a way to communicate with the world (this is why WSGI exists)

This is hard for Go programmers to fathom; an app that cannot communicate over the network without another process entirely? Can you help me understand why that is? (Remember, I don't have Python web app experience.)

i do not think people want to go back to '90s when the proxy and app used to live in the same address/process space :). The advantage in implementing uwsgi is basically performance-related

How is it a performance advantage to call out to another process?

unbit commented 8 years ago

WSGI is a very high-level interface for abstracting the network and concurrency layer from Python web apps. A WSGI application looks like this:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']  # PEP 3333: the body is an iterable of bytes

so the app knows nothing about anything below this. As you can already understand, this is completely the opposite of the Go approach. WSGI is Python-related, but the same kind of standard exists for Ruby (Rack), Perl (PSGI), Lua (WSAPI) and many others. Implementing them requires embedding the language VM in the web server (like uWSGI and Apache do), and this is a very invasive thing, so over the years splitting apps from proxies has become the de facto standard approach.

As for uWSGI, it is a very generic app server: it embeds the language VMs (Python, Ruby, Perl...) and implements their web-related interfaces, giving them a network and concurrency infrastructure. In a Go app there is no such distinction; the developer knows exactly when and how to manage concurrency, but in other languages it depends on the underlying server infrastructure (gunicorn, for example, is multiprocess, waitress is multithreaded, unicorn for Ruby is multiprocess, and uWSGI supports threading, multiprocessing, and coroutines). All of those apps can theoretically be moved between servers and concurrency paradigms.
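The point that the server, not the app, picks the concurrency model can be seen with the standard library alone: `wsgiref.simple_server` combined with `socketserver.ThreadingMixIn` serves the same unchanged callable either one request at a time or one thread per request. This is only an illustration (wsgiref is not a production server, and `build_server` and `ThreadingWSGIServer` are names chosen here); gunicorn, waitress, and uWSGI make the equivalent choice through their own settings.

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server


def application(environ, start_response):
    """The same WSGI app, unaware of how its requests will be scheduled."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """wsgiref's server, but handling each request in its own thread."""
    daemon_threads = True


def build_server(threaded, host="127.0.0.1", port=8000):
    """Pick the concurrency model at deploy time; the app code does not change."""
    server_class = ThreadingWSGIServer if threaded else WSGIServer
    return make_server(host, port, application, server_class=server_class)
```

For example, `build_server(threaded=True).serve_forever()` serves the app on 127.0.0.1:8000 until interrupted, and passing `threaded=False` switches to sequential handling without touching `application`.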

So, uWSGI is the application server, WSGI is the high-level interface for Python apps (implemented by uWSGI), and 'uwsgi' (lowercase) is a communication protocol between proxies/web servers and app servers (at the same level as FastCGI). I said uwsgi (the protocol) is a matter of performance compared to HTTP (as it is easier/faster to parse). Obviously any form of IPC has a cost. I do not want to convince you that splitting proxies from apps is the best way (I currently work mainly in Go, so I understand the implications), but this is the 'blessed way' in those other languages, so if you want to support them in some way, you should take this into account (and honestly, as a Go developer who has worked with its internals, I would stop using Caddy if it started embedding libpython or libruby in its core ;)

I would obviously be very happy if Caddy supported the 'uwsgi' (lowercase, transport) protocol in addition to FastCGI, but it is absolutely not required, as Python developers can already use their favorite application servers behind it (via HTTP or FastCGI proxying).

mholt commented 8 years ago

Ah! I understand now 🙌 Thanks for the explanation. Didn't realize uWSGI, WSGI, and uwsgi were different things.

Makes a lot more sense. I will give a chance for the other readers of this issue to see what you have written and provide some comments. Maybe it is not too difficult to implement uwsgi (protocol) as you said. But then again, perhaps there is no benefit since FastCGI can be used.

cblomart commented 8 years ago

@unbit thanks for the clarifications, and @mholt for summarizing them clearly. Names can be treacherous. I like to think that Go programmers don't think like it's the nineties. And I couldn't agree more that embedding Python (or any other language) would not be a good solution. So proxying to the application server is required in this case; the app (WSGI) and server (uWSGI) are still the same process, so nothing is saved there. The only remaining question is the protocol between the proxy server and the app: HTTP, FastCGI, or uwsgi. As uWSGI supports FastCGI, I will look there.

heri16 commented 7 years ago

+1 to reopen

francislavoie commented 7 years ago

https://github.com/mholt/caddy/issues/1638 is the new relevant issue. uWSGI should be coming in the not too distant future :)

slightfoot commented 7 years ago

@heri16 You can see the PoC here. It's not production-ready by any means. https://github.com/slightfoot/caddy-uwsgi

abiosoft commented 7 years ago

@slightfoot If you don't mind me trespassing on your code: uwsgiConfig and uwsgiRoundTripper feel redundant since the package name is already uwsgi. They could just be Config and RoundTripper.

slightfoot commented 7 years ago

@abiosoft yup, they'll change. It's just an initial test.

gregnordin commented 7 years ago

+1 that this is moving forward! We are developing a new Flask-based app and I have wanted to use Caddy for quite some time and have been hoping there would be a way to use Caddy with the Flask app. I just now started looking into it and found this issue. Thanks to everyone who is working on making it happen!

francislavoie commented 7 years ago

@gregnordin in the meantime, you can run a Python server and use Caddy as a reverse proxy to it. The benefit is that you still get HTTP/2 and automatic TLS (TLS is terminated at Caddy).

gregnordin commented 7 years ago

Thanks, @francislavoie. One of the things I love about Caddy and why I've wanted to use it is the automatic TLS.

slightfoot commented 7 years ago

@gregnordin I'll be working on this over the next few days, so expect some updates. If you want to try it out and let me know what you think, that would be great. Will keep you posted.

gabormarinov commented 7 years ago

@slightfoot is there any news on uwsgi protocol support?

Droppers commented 7 years ago

Any update or progress at all?

slightfoot commented 7 years ago

@Droppers I was speaking to @mholt about this recently. I'm a little busy atm. But over the next week or so I'll aim at getting this sorted.

winny- commented 6 years ago

Is there any way I can contribute to this? I'd love to use Caddy for WSGI applications, and I know the basics of Go, though I'm not familiar with Caddy's code base (as of yet).

francislavoie commented 6 years ago

@winny- good place to start is here: https://github.com/mholt/caddy/wiki/Extending-Caddy

The Wiki has some info that might help you get started.

Also, you can look here: https://github.com/mholt/caddy/tree/master/caddyhttp/fastcgi

That is the existing fastcgi implementation; I figure wsgi will have a lot of similarities with it.

winny- commented 6 years ago

Are there any existing WIP branches I should check out?

winny- commented 6 years ago

After some discussion with folks in the Python IRC channel, reading PEPs, and reading the uWSGI documentation, it occurs to me that there is no point in implementing uwsgi in Caddy, as one can use HTTP anyway. So I won't be implementing this feature. Reread this: https://github.com/mholt/caddy/issues/176#issuecomment-161910764

theodesp commented 6 years ago

After a small trial, I managed to bridge Python and Go together via FastCGI. For example:

Caddyfile

localhost:9000

fastcgi / unix:hello-world.sock
log access.log
errors error.log

and in the same folder:

main.py

import logging
import sys
import traceback

from flup.server.fcgi import WSGIServer


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    # WSGI response bodies must be bytes on Python 3
    yield b"hello world"


def main():
    logging.basicConfig(level=logging.INFO)
    try:
        # Listen on the Unix socket the Caddyfile's fastcgi directive points at
        WSGIServer(app, bindAddress='./hello-world.sock', umask=0000).run()
    except (KeyboardInterrupt, SystemExit, SystemError):
        logging.info("Shutdown requested...exiting")
    except Exception:
        traceback.print_exc(file=sys.stdout)


if __name__ == '__main__':
    main()

Now run main.py, start Caddy, and visit http://localhost:9000/; you will see the message.

@mholt Should I add this example to the list at https://github.com/caddyserver/examples?

Note that it will work with Flask also: http://flask.pocoo.org/docs/1.0/deploying/fastcgi/

mholt commented 6 years ago

@theodesp That's cool! Yes, please do submit a PR to that repo.

theodesp commented 6 years ago

PR https://github.com/caddyserver/examples/pull/135

mholt commented 5 years ago

Caddy 2's reverse proxy (currently in beta on the v2 branch) can support various protocols now between the proxy and the backend. In other words, the transport which fulfills a RoundTrip between the incoming HTTP request at Caddy and the backend can use whatever protocol, including wsgi. (This is how we've implemented FastCGI.)

So, anyone is invited to contribute a WSGI transport for the v2 proxy. I'll be happy to give pointers if you need help with the Caddy integration. I'm just not familiar with WSGI myself.
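A real transport for Caddy 2 would be Go code plugged into the reverse proxy, so treat the following only as an outline of what one uwsgi round trip involves, written in Python to match the other examples here: map the HTTP request onto CGI-style variables, send them as a uwsgi packet followed by the request body, then read the backend's raw HTTP response. It assumes the backend closes the connection after responding (one request per connection). The function names are hypothetical, and a production transport would also need `SERVER_NAME`, `SERVER_PORT`, `REMOTE_ADDR`, and similar variables that many WSGI apps expect.

```python
import socket
import struct
from typing import Dict, Mapping, Optional


def http_to_uwsgi_vars(method: str, path: str, query: str = "",
                       headers: Optional[Mapping[str, str]] = None,
                       body_len: int = 0) -> Dict[str, str]:
    """Map an HTTP request onto CGI-style vars, as web servers usually do for uwsgi."""
    env = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "QUERY_STRING": query,
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(body_len),
    }
    for name, value in (headers or {}).items():
        key = name.upper().replace("-", "_")
        if key == "CONTENT_TYPE":
            env[key] = value  # CGI passes Content-Type without the HTTP_ prefix
        elif key != "CONTENT_LENGTH":  # already derived from the actual body
            env["HTTP_" + key] = value
    return env


def encode_packet(env: Mapping[str, str]) -> bytes:
    """Header (modifier1=0, little-endian size, modifier2=0) plus length-prefixed vars."""
    body = bytearray()
    for key, value in env.items():
        for item in (key.encode("latin-1"), value.encode("latin-1")):
            body += struct.pack("<H", len(item)) + item
    return struct.pack("<BHB", 0, len(body), 0) + bytes(body)


def round_trip(sock: socket.socket, method: str, path: str, query: str = "",
               headers: Optional[Mapping[str, str]] = None, body: bytes = b"") -> bytes:
    """Send one request on a connected socket; return the raw HTTP response bytes."""
    env = http_to_uwsgi_vars(method, path, query, headers, len(body))
    sock.sendall(encode_packet(env) + body)  # the request body follows the packet
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:  # backend closed the connection: response complete
            break
        chunks.append(data)
    return b"".join(chunks)
```

In Caddy 2 terms, the response bytes would still have to be parsed into an HTTP response object; that part is left out here.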

Immortalin commented 5 years ago

@mholt does Caddy handle slow client buffering?

Gunicorn requires it: https://docs.gunicorn.org/en/stable/deploy.html

francislavoie commented 5 years ago

@Immortalin could you elaborate on what "slow client buffering" is? Gunicorn's documentation doesn't make it clear what they're actually referring to. "Slow" is quite vague in this context.

Immortalin commented 5 years ago

https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/

mholt commented 5 years ago

@francislavoie Thanks for asking the clarifying question 😁

@Immortalin Sure, it can do that:

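// if enabled, buffer the whole client request body before proxying upstream
if h.BufferRequests {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf) // returned to the pool only after this handler finishes
	if _, err := io.Copy(buf, r.Body); err != nil {
		return err // client aborted or timed out mid-body; don't proxy a partial request
	}
	r.Body = io.NopCloser(buf) // io.NopCloser replaces the deprecated ioutil.NopCloser
}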

I haven't tested this yet, but it's the basic idea.

You should be sure to set read timeouts on your server and limit the request body size -- Caddy 2 already has facilities for this. But note that this still doesn't actually solve any problems -- it just moves them from the backend to the proxy -- and potentially leaves servers vulnerable to slowloris attacks and memory exhaustion. It's too bad that gunicorn requires it.
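The "limit the request body size" advice can be illustrated with a bounded buffer: read the whole body into memory before anything is forwarded, but stop as soon as the client exceeds a cap. This Python sketch only shows that logic (the names `buffer_body` and `BodyTooLarge` are made up here). In the setup discussed, it belongs in the proxy in front of gunicorn, since running it inside a sync gunicorn worker would still tie that worker up while a slow client trickles data in, and read timeouts still have to be enforced separately.

```python
import io


class BodyTooLarge(Exception):
    """Raised when the client sends more than the configured limit."""


def buffer_body(stream, limit, chunk_size=64 * 1024):
    """Read an entire request body into memory, refusing more than `limit` bytes.

    Reads at most limit + 1 bytes in total, so an oversized body is detected
    without buffering all of it.
    """
    buf = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(min(chunk_size, limit + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise BodyTooLarge("request body exceeds %d bytes" % limit)
        buf.write(chunk)
    buf.seek(0)  # hand the backend a rewound, fully received body
    return buf
```

Only after `buffer_body` returns would the request be forwarded, so the backend never waits on the client's upload speed; the trade-off is the memory held per in-flight request, which is what the cap bounds.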

DavidOliver commented 5 years ago

@mholt, how does one go about enabling buffering (not sending the request to the backend until it's fully received), in both Caddy 1 and 2? Or is the slow client buffering code you showed something you've just written?

I've found http.timeouts and http.fastcgi send_timeout for Caddy 1, but if I understand correctly they won't stop the request from occupying one of the backend app's processes for whatever the timeouts are set to.

mholt commented 5 years ago

@DavidOliver I haven't tested or committed that code to Caddy 2 yet (AFAIK Caddy 1 isn't able to do it at all). I can update here when I push that change, but in the meantime, let's try to keep this thread about WSGI transport to upstream.

Would anyone like to implement it for Caddy 2?

mholt commented 5 years ago

@DavidOliver @Immortalin I've implemented client request buffering in 1228dd7d93b59409fd844400b9c114be267df3a3. I tested it in basic proxy scenarios and it seems to work fine, but I'm not guaranteeing that it's perfect or even a good idea -- just use it if absolutely required and if you have proper protections in place as well.

Would one of you like to contribute a WSGI transport?

DavidOliver commented 5 years ago

@mholt, many thanks. When I try it out in front of Gunicorn I'll be sure to set timeouts as well. I'm thinking that if slow requests are incoming, it's likely better to have them handled (and dropped on timeout) in Go/Caddy than in Python.

Relatedly, Gunicorn offers other, non-default worker models that don't come with the slow-request buffering requirement; async workers are necessary if an app makes outgoing API calls, for example. I haven't tried any yet.

I've hardly done any Go and so would unfortunately not be a good candidate to take on a WSGI transport. Also, I'm happy proxying to Gunicorn via TCP or Unix socket for my couple of Python sites, as others have mentioned in this discussion.

Caddy <-HTTP:4000-> Gunicorn <-WSGI-> Python app with WSGI config

Example Python app configuration - a Wagtail (built on Django) site:

/wsgi.py:

"""
WSGI config for my_app project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see
https://docs.djangoproject.com/en/2.1/howto/deployment/wsgi/
"""

import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "my_app.settings.production")

application = get_wsgi_application()

/my_app/settings/(base.py|production.py):

WSGI_APPLICATION = 'my_app.wsgi.application'
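To round out the pipeline above (Caddy proxying HTTP on port 4000 to Gunicorn), a Gunicorn config file could look like the following. Gunicorn reads its config file as Python; the specific values are assumptions to tune for your own host, not recommendations from this thread.

```python
# gunicorn.conf.py: a minimal sketch for "Caddy <-HTTP:4000-> Gunicorn"
import multiprocessing

bind = "127.0.0.1:4000"  # only reachable by the local reverse proxy
workers = multiprocessing.cpu_count() * 2 + 1  # Gunicorn docs' starting-point heuristic
worker_class = "sync"  # the default model; expects a buffering proxy in front
timeout = 30  # seconds before a silent worker is killed and restarted
accesslog = "-"  # write access logs to stdout
```

Started with `gunicorn -c gunicorn.conf.py my_app.wsgi:application`, the app is then served on 127.0.0.1:4000 for Caddy to reverse proxy to.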

If you think it would be useful I'd be happy to write up a fuller guide, including the reverse proxying in Caddy, once I've got it set up myself! (I'm using another webserver for my Python sites at the moment but would like to switch to Caddy.)

Thanks again.

mholt commented 4 years ago

@DavidOliver Thank you for that information! I will refer to that if/when I get back around to this.

FWIW, here is a simple, pure-Go uwsgi transport library I was pointed to on Twitter the other day (I can't remember whether I'd seen it before): https://github.com/mattn/go-uwsgi/

Immortalin commented 4 years ago

@mattn

mattn commented 4 years ago

What can I do?