ZoneMinder / mlapi

An easy to use/extend object recognition API you can locally install. Python+Flask. Also works with ZMES!

Use bjoern rather than the non-production flask provided WSGI server #15

Closed ibrewster closed 3 years ago

ibrewster commented 4 years ago

Currently MLAPI uses the default Flask-provided WSGI server (Werkzeug), which is not recommended for production use. This pull request switches to the lightweight, C-based bjoern WSGI server (https://github.com/jonashaag/bjoern) instead. The downside to this change is that it adds a dependency - bjoern - which in turn depends on libev, which the user would have to install manually.
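For reference, a minimal sketch of what the swap looks like (the app name, host, and port here are illustrative, not the actual mlapi wiring; the libev package name is an assumption for Debian/Ubuntu):

# Install notes (assumption: Debian/Ubuntu package naming):
#   sudo apt-get install libev-dev   # bjoern's C dependency
#   pip install bjoern

import bjoern
from flask import Flask

app = Flask(__name__)

# Before: the Werkzeug development server bundled with Flask
# app.run(host='0.0.0.0', port=5000)

# After: bjoern serves the exact same WSGI app object
bjoern.run(app, '0.0.0.0', 5000)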

pliablepixels commented 4 years ago

I also use other modules, like Flask-JWT etc. Are these affected if we use bjoern?

ibrewster commented 4 years ago

I also use other modules, like Flask-JWT etc. Are these affected if we use bjoern?

They aren't. The actual application is still created and run by Flask and its associated packages. Only the server portion - the part that listens on the port and receives the raw request before passing it off to Flask (and extensions such as Flask-JWT) for processing - is replaced by bjoern.
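To illustrate the split (a toy example, not code from mlapi): the Flask app - routes, extensions, and all - is an ordinary WSGI callable, and the server is simply whatever invokes it per request.

from flask import Flask

app = Flask(__name__)   # extensions like Flask-JWT attach to this object

@app.route('/ping')
def ping():
    return 'pong'

# Every WSGI server drives the app through the same interface,
# app(environ, start_response), so swapping Werkzeug for bjoern
# (or gunicorn, or uWSGI) leaves routes and extensions untouched.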

You may have noticed the message you get when launching mlapi about not being intended for production - that's coming from the server, Werkzeug, not from Flask itself. See https://flask.palletsprojects.com/en/1.1.x/tutorial/deploy/#run-with-a-production-server for the Flask documentation on the issue.

Note that bjoern is hardly the only choice. Your README mentions someone who did a containerized version using gunicorn, I believe, and uWSGI is another popular option. As such, there is nothing wrong with ignoring this pull request, continuing to distribute as you have been, and simply letting the end user choose which production WSGI server - if any - to use. I simply figured I might as well offer this as an option if you want it.
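For completeness, the "let the user choose" route could look like this (a hypothetical wsgi.py entry point, not part of this PR; it assumes mlapi.py exposes the Flask object as app):

# wsgi.py - hypothetical entry point, not part of this PR
from mlapi import app  # assumes mlapi.py exposes the Flask object as "app"

# The user then picks a server from the shell, for example:
#   gunicorn --workers 4 --bind 0.0.0.0:5000 wsgi:app
#   uwsgi --http 0.0.0.0:5000 --wsgi-file wsgi.py --callable app

if __name__ == '__main__':
    app.run()  # falls back to the Werkzeug dev server when run directly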

I chose bjoern here for a number of reasons:

1) Some benchmarks, at least, show it to be the fastest option (https://www.appdynamics.com/blog/engineering/a-performance-analysis-of-python-wsgi-servers-part-2/).

2) I noticed in your config that you have a note about setting the number of processes to 1 if using a GPU. bjoern is single-process and single-threaded (yet, still, somehow quite fast), so I figured it would be the least likely to create the issues that could arise from running one of the other tools in multi-process mode.

3) It's easy - the code and process to use it are virtually identical to those for the built-in Werkzeug server (you pass "app" as an argument to run rather than calling a method on "app"). With some of the others you have to actually change how you launch things; bjoern was pretty much a drop-in replacement for this case.

pliablepixels commented 4 years ago

You may have noticed the message you get when launching mlapi about not being intended for production - that's coming from the server, Werkzeug, not from Flask itself. See https://flask.palletsprojects.com/en/1.1.x/tutorial/deploy/#run-with-a-production-server for the Flask documentation on the issue.

Good to know. I had just assumed flask was the culprit.

So I like bjoern as I read it. I don't think libev is an issue.

I do have one more pending question: bjoern seems to be single-threaded, so how do we launch multiple parallel detect requests? I noticed your PR removes the processes= argument. What happens if we get, say, 5-6 parallel detect requests and each one takes around 40 seconds to resolve? (Update: I missed your point 2, so I assume you've answered one part - bjoern will execute the job in its context - but what happens to the others? How long will they wait before timing out?)

ibrewster commented 4 years ago

bjoern seems to be single-threaded, so how do we launch multiple parallel detect requests? I noticed your PR removes the processes= argument. What happens if we get, say, 5-6 parallel detect requests and each one takes around 40 seconds to resolve?

That is a very good question. I may have to throw together a simple server that just stalls for a while to test that, because now you've got me wondering. That benchmark page shows bjoern handling far more requests per second than any other option, but those are presumably requests that return almost instantly. It may not be as good an option for the longer-running request case if you want to be able to parallelize operations. Hmmm.....

ibrewster commented 4 years ago

Ok, I just did a test with a setup that basically just stalled for 40 seconds when called. I then opened three browser windows and called the script in quick succession in each of them. As feared, it only executed one request at a time, meaning the third window didn't even start executing until about 80 seconds in. There were no errors, but execution was completely sequential. So, yeah, in retrospect, perhaps gunicorn (or one of the other multi-threaded WSGI servers) would be a better option.
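For reference, a rough reconstruction of that stall test (the exact code wasn't posted; this is one way to reproduce the behavior):

import time

import bjoern
from flask import Flask

app = Flask(__name__)

@app.route('/stall')
def stall():
    time.sleep(40)  # stand-in for a ~40-second detect request
    return 'done'

# bjoern is single-process and single-threaded, so three concurrent
# requests to /stall are served strictly one after another: the third
# caller doesn't start executing until roughly 80 seconds in.
bjoern.run(app, '0.0.0.0', 5000)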

That said, the flask run configuration you currently have in your mlapi.py file (if I read things correctly) is:

app.run(host='0.0.0.0', port=g.config['port'], threaded=False, processes=g.config['processes'])

Which, in the default processes=1 configuration, results in the same behavior. Of course, there is the option of increasing the process count there, whereas bjoern does not provide one. You can also turn on threaded, which also appears to allow parallel execution (at least of my test code; actual Python code may behave differently thanks to the GIL).
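For comparison, those Werkzeug options look like this (a sketch with illustrative values; mlapi itself pulls them from g.config):

from flask import Flask

app = Flask(__name__)

# One thread per request - parallel, though CPU-bound work still
# contends on the GIL:
app.run(host='0.0.0.0', port=5000, threaded=True)

# Or fork a process per request instead (Werkzeug rejects combining
# threaded=True with processes > 1):
# app.run(host='0.0.0.0', port=5000, processes=4)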

So depending on how well your code can handle being multi-threaded or multi-processed, perhaps bjoern isn't the best option after all. Sorry for the noise!

pliablepixels commented 3 years ago

I've manually integrated this option into my dev branch - thanks for the PR!