Streaming probe data with Tornado

arvoelke commented 10 years ago

Just a simple proof of concept. The setup will have to get a bit more complex to support pausing, restarting, model updates, etc.

celiasmith commented 10 years ago

Does this require people to install more packages? I thought we decided to minimise those kinds of reqs... What advantages are we getting for this extra complexity?

On 8 March 2014 17:02:15 GMT-05:00, Aaron Voelker notifications@github.com wrote:

You can merge this Pull Request by running:

git pull https://github.com/ctn-waterloo/nengo_gui tornado_streaming

Or you can view, comment on it, or merge it online at:

https://github.com/ctn-waterloo/nengo_gui/pull/1

-- Commit Summary --

Migrated from SWI to Tornado

Very basic streaming of probe data to frontend

Unbreak automatic line highlighting

-- File Changes --

M .gitignore (3)
M .gitmodules (4)
D ace (1)
M main.py (201)
M nengo_helper.py (7)
A static/ace (1)
R static/favicon.ico (0)
R static/js/d3.min.js (0)
A static/js/jquery.1.5.2.min.js (16)
A static/js/jquery.stream-1.2.min.js (25)
D swi.py (345)
R templates/index.html (237)

-- Patch Links --

https://github.com/ctn-waterloo/nengo_gui/pull/1.patch
https://github.com/ctn-waterloo/nengo_gui/pull/1.diff


arvoelke commented 10 years ago

The advantage is less complexity. It only took about 50 lines of Tornado code to implement asynchronous streaming of the simulation's probe data to the frontend, without even using threads. Without Tornado, we would have to roll our own solution. Threading should generally be avoided at all costs in Python (and Nengo is currently far from thread-safe). The solution must also minimize blocking during simulation, since the server should still respond to requests. Tornado was designed from the start to solve this deceptively difficult problem efficiently and robustly. It's not even clear to me how we would reinvent what they did with our current setup.
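The single-threaded approach described above can be sketched without Tornado itself. This is an illustration under stated assumptions, not the code in this PR: it uses Python's standard asyncio event loop in the role Tornado's IOLoop plays, a toy recurrence in place of a Nengo simulator, and hypothetical names (`run_simulation`, `stream_to_client`, `handle_request`). The simulation yields control after every step, so an unrelated request is still answered while probe samples stream out.

```python
import asyncio


async def run_simulation(steps, sink, events):
    """Step a toy 'model' and push each probe sample into `sink`."""
    value = 0.0
    for t in range(steps):
        value = 0.9 * value + 1.0   # stand-in for one simulator step
        await sink.put((t, value))  # hand the sample to the streaming side
        events.append("sim")
        await asyncio.sleep(0)      # yield so other handlers get a turn
    await sink.put(None)            # sentinel: simulation finished


async def stream_to_client(sink, received):
    """Drain probe samples as they arrive, as a streaming handler would."""
    while True:
        sample = await sink.get()
        if sample is None:
            break
        received.append(sample)


async def handle_request(events):
    """An unrelated request that must still be answered mid-simulation."""
    await asyncio.sleep(0)
    events.append("request")


async def main(steps=5):
    sink = asyncio.Queue()
    received, events = [], []
    await asyncio.gather(
        run_simulation(steps, sink, events),
        stream_to_client(sink, received),
        handle_request(events),
    )
    return received, events


if __name__ == "__main__":
    received, events = asyncio.run(main())
    print(received)
    print(events)
```

Because everything runs on one thread, nothing needs locking; the trade-off is that each simulation step must be short, since a long step blocks every other handler until it returns.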

tcstewar commented 10 years ago

For now, @arvoelke is trying out this Tornado approach to see what advantages we can get. The direct streaming of data from a running model to something like an interactive mode visualizer is the main thing that seems like it'll be an advantage this way.

That said, I still think it's a firm requirement that people can easily install this on any operating system without requiring internet access (i.e. from a USB drive). There are definitely ways of doing that for something like Tornado, but they also impose a complexity cost. Of course, if it just means including Tornado's code along with ours, then it's pretty easy.

The biggest worry I have is this note from the Tornado docs: "Tornado will also run on Windows, although this configuration is not officially supported and is recommended only for development use." There's also the note that "even though Mac OS X is derived from BSD and supports kqueue, its networking performance is generally poor so it is recommended only for development use."

So we're exploring this as an option, but before we commit to something like Tornado we need to figure out exactly how complex it will be to get people to install it in a conference situation (i.e. without internet access and on a variety of machines).

arvoelke commented 10 years ago

The Windows package can be installed via an executable on a USB stick from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#tornado I've used this on 3 different Windows machines for development and play, without any problems.

For Linux/Mac we can probably just include the egg on a USB drive and run setup.py.

The MacOS performance note is probably about scaling to 10,000 active connections. I don't expect we'll be going over 2-3 active connections, unless we're hosting a public server, in which case we shouldn't be using MacOS anyway.

tcstewar commented 10 years ago


Good to know. Is it pure Python? Or do we need to have different versions on the USB key, and have people figure out whether they have 64-bit python installed or 32-bit python installed, and which version, and so on?

In fact, I'm pretty sure that we'll end up having to package it all up into our own giant windows installer anyway, since we can't even assume that people have Python installed. Does the Tornado license allow that?

hunse commented 10 years ago

Tornado is under the Apache License v2, so I think we should be good there. Personally, I can't see it being any harder to install than NumPy, which is a beast, and we definitely want to avoid reinventing the wheel. So I'm for it! @tcstewar is right, though, that we need to take some time soon to figure out how we're going to handle installing without internet (whether we need Tornado or not).

tcstewar commented 10 years ago

Just to add a note here so we remember it in the future: Tornado also depends on backports.ssl_match_hostname.

tcstewar commented 10 years ago

And there seems to be something weird with the ace directory in this branch. When I run git checkout tornado_streaming it mostly works, but I get "warning: unable to rmdir ace: Directory not empty", and I have to manually copy the ace directory to static/ace (which is where it really should be, in any case). Then I have to delete it before git will let me check out master again. This probably has something to do with it being marked as an external repository. Is anyone else getting this error, or is it something unique to me?

tbekolay commented 10 years ago

Submodules are a huge pain; I would highly recommend not using them if at all possible. The error you're getting is part of why.

You can try git submodule update or git submodule sync; I'm not clear exactly what they do, to be honest, but in dealing with submodules in the past I've had to use them...

tcstewar commented 10 years ago


Ah, I had no idea. And here I thought I was being all smart and git-knowledgeable by doing the submodule thing instead of a mass theft of files from the ace repository... oh well.

tbekolay commented 10 years ago

The alternative isn't necessarily to vendorize it ;) You can do things like have a script that downloads it to the appropriate place, and put that location in .gitignore so it doesn't get committed. Or just stick with the submodule; it's just been my experience that it causes more problems than it solves.
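A minimal sketch of the download-script alternative, assuming the library is published as a zip archive. The script name `fetch_deps.py`, the helper names and the destination directory are hypothetical, and the URL is deliberately left to the caller. The destination (e.g. static/ace/) would then be listed in .gitignore.

```python
"""fetch_deps.py (hypothetical): download a zipped JS library such as ace at
setup time instead of committing it or tracking it as a git submodule."""
import io
import pathlib
import sys
import urllib.request
import zipfile


def extract_archive(zip_bytes, dest):
    """Unpack zip data into `dest` and return the files now under `dest`."""
    dest = pathlib.Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
    # Relative, forward-slash paths of every file under dest, sorted.
    return sorted(p.relative_to(dest).as_posix()
                  for p in dest.rglob("*") if p.is_file())


def fetch(url, dest):
    """Download a zip archive from `url` and unpack it into `dest`."""
    with urllib.request.urlopen(url) as response:
        return extract_archive(response.read(), dest)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python fetch_deps.py <zip-url> <dest-dir>")
    else:
        for path in fetch(sys.argv[1], sys.argv[2]):
            print(path)
```

`extractall` trusts the member paths stored in the archive, so this should only be pointed at archives from a source you trust.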

kousu commented 10 years ago

The dat people are debating how to do dynamic streaming updates now. Thinking more broadly, you could have a single data model that your simulations log to, which can be analysed later and, with this streaming feature, visualized in real time. Does nengo-gui have a way to make the line graphs that the Java version does yet? Can you link me your API plan for that?
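One way to read the "single data model" idea is a log that stores every probe sample for later analysis and, at the same moment, pushes each sample to live subscribers. The class below is a hypothetical sketch of that idea; `ProbeLog`, `subscribe`, `record` and `history` are illustrative names, not part of Nengo or dat.

```python
from collections import defaultdict


class ProbeLog:
    """Hypothetical store for probe samples that also notifies live viewers."""

    def __init__(self):
        self._data = defaultdict(list)         # probe name -> [(t, value), ...]
        self._subscribers = defaultdict(list)  # probe name -> [callback, ...]

    def subscribe(self, probe, callback):
        """Register `callback(t, value)` to be called for each new sample."""
        self._subscribers[probe].append(callback)

    def record(self, probe, t, value):
        """Store a sample for later analysis, then push it to subscribers."""
        self._data[probe].append((t, value))
        for callback in self._subscribers[probe]:
            callback(t, value)

    def history(self, probe):
        """Return a copy of everything recorded so far for `probe`."""
        return list(self._data[probe])


if __name__ == "__main__":
    log = ProbeLog()
    log.subscribe("ens_a", lambda t, v: print("live:", t, v))
    log.record("ens_a", 0, 0.5)
    print(log.history("ens_a"))
```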

(@tbekolay, we had the same experience with submodules over at modex. I thought we were just newbies)

Seanny123 commented 10 years ago

I need to visualize something in Nengo 2.0 in an asynchronous manner; can I assume that we're sticking with Tornado or is the matter still up for discussion? If it's still being discussed, what specifically is left up for debate and are there any alternatives under consideration?

@tbekolay does this seem like a good thing to review at the next Nengo dev meeting?

tcstewar commented 10 years ago

I'm still not convinced that tornado's the way to go, but that's mostly because I'm unfamiliar with it and I'm worried about the dependencies. If the point is to just use websockets to stream data, then I'd prefer just using a websockets library.

Seanny123 commented 10 years ago

When you say "websockets library", do you mean one that's included by default in Python distributions (if so, a link would be appreciated), or are you saying that you're fine with any websockets library as long as it doesn't have awkward dependencies and is cross-platform?

(Replying by email to tcstewar's comment of Sun, Sep 14, 2014 at 5:16 PM.)

tcstewar commented 10 years ago

I was thinking either something like ws4py (which has no dependencies outside of Python, and which we should even be able to include as part of the nengo_gui install) or just implementing the protocol ourselves, as it looks like maybe 20 lines of Python: http://popdevelop.com/2010/03/a-minimal-python-websocket-server/
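For a sense of what "just implementing the protocol" involves, this is the server side of the opening handshake as defined in RFC 6455, the current WebSocket standard. It is a sketch: it assumes the client's `Sec-WebSocket-Key` header has already been parsed out of the HTTP request, and it skips validating the rest of that request.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 (section 1.3) for the handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key):
    """Build the HTTP 101 response that completes the opening handshake."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Example key from RFC 6455, section 1.3.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```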

kousu commented 10 years ago

For reference, the python websocket libraries I know of are tornado, Autobahn (on twisted for py2/asyncio for py3) and the one on gevent and they are all frameworks forcing you to rewrite your app.

If all you want is a websocket, I've been getting good mileage out of websockify. My prototype is here so you can see how I've been using it. It's not perfect. Websockify has some quirks which I suspect are bugs in its implementation of the spec, and it eats at least one port and is ignorant of HTTP paths, so you need to stick an HTTP proxy in front if you need more than one websocket; for modex I haven't decided if I want to force a dependency on lighttpd or nginx, or adapt SimpleHTTPServer to do proxying, or just retrofit the Twisted server modex originally used to do proxying.

tcstewar commented 10 years ago

Looks like the link I sent was to an old version of the websockets protocol. Here's a minimal Python implementation of the current one:

http://sidekick.windforwings.com/2013/03/minimal-websocket-broadcast-server-in.html
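After the handshake, data travels in frames. The sketch below encodes one server-to-client text frame per RFC 6455. It assumes the whole message fits in a single unfragmented frame, and it leaves out decoding client frames, which are always masked and need their own handling. The three payload-length encodings are one of the corner cases a hand-rolled server has to get right.

```python
import struct


def encode_text_frame(message):
    """Encode an unfragmented, unmasked server-to-client text frame."""
    payload = message.encode("utf-8")
    header = bytearray([0x80 | 0x1])  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        header.append(length)                # lengths 0-125 fit in 7 bits
    elif length < 2 ** 16:
        header.append(126)
        header += struct.pack("!H", length)  # 16-bit extended length
    else:
        header.append(127)
        header += struct.pack("!Q", length)  # 64-bit extended length
    return bytes(header) + payload


if __name__ == "__main__":
    print(encode_text_frame("hi"))
```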

Seanny123 commented 10 years ago

I checked out the ws4py licence and ran a simple example. It seems usable.

However, I think we may be too hasty in dismissing Tornado (especially if we're just going to use another package anyway), and I don't want to rewrite code that Aaron has already written.

Judging from the updated docs, it should run fine on all platforms. Mac OS X is just known to be slow networking-wise and Windows isn't "officially" supported, but judging from Aaron's experience and the fact that it's bundled by default in Anaconda, I can't imagine this being a problem.

In terms of requirements, Tornado is standalone, but has optional packages for some features that we won't be using. Bundling it for offline installation shouldn't be a problem, considering that its size is 2 MB when decompressed and, as Aaron mentioned previously, it's been bundled before. Additionally, according to a comment on this StackOverflow question:

Tornado runs very well on Windows. It's just not as performant and scalable because it uses select for the I/O multiplexing. But you should be able to get a decent performance out of it with tornado-pyuv.

Consequently, I think our concerns about Windows are unfounded.

Finally, I think it would be better to use a pre-packaged library than to write our own, since I would much rather have someone else find all the corner cases of websockets than have to find and fix them myself. In support of this, I would note that on the websockets page Terry linked, a commenter found a limitation, and the author notes in multiple places that any serious application should just use a library.

tcstewar commented 10 years ago

I agree Tornado should be explored as an option. But I will repeat what was said above: we need to know how complex it will be to install in a situation where we don't have internet access. The ideal is a situation where people have to do as few things as possible to install it (that was a huge win for the Java version, that we could get people to just download, unzip, and run it). Right now, I believe the install for nengo 2.0 would be something like: install python (if it isn't installed), install numpy (if it isn't installed), and install some combined package of nengo and nengo_gui. That's not too bad. But if we have to add to that manually installing certifi (a dependency of tornado), backports.ssl_match_hostname (another dependency of tornado), and tornado, then I think that's a bit of a problem. So then we'd want to start looking into big packaging tools that install everything all at once (which is something we may want to do anyway).
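A quick way to see whether a machine already has the packages listed here is to try importing them. The `REQUIRED` list is taken from this thread and is an assumption; newer Tornado releases may need fewer packages, and `missing_modules` is a hypothetical helper name.

```python
import importlib

# Packages named in this discussion (an assumption; may vary by version).
REQUIRED = ["numpy", "tornado", "certifi", "backports.ssl_match_hostname"]


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing: " + ", ".join(missing) if missing else "all dependencies found")
```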

hunse commented 10 years ago

We could always put Tornado and all associated packages in a zip with Nengo and have a shell script that adds them to the Python path and starts the GUI. That would work, right, as long as they're all pure Python?

(Replying by email to tcstewar's comment of Sep 15, 2014, 6:32 PM.)
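A minimal sketch of such a launcher, written in Python rather than as a shell script so it behaves the same on every OS. It assumes the pure-Python dependencies are unpacked into a `vendor/` directory beside it; the directory name, `run_gui.py` and `add_vendor_dir` are hypothetical. Starting the GUI is left as a comment because its entry point isn't specified here.

```python
"""run_gui.py (hypothetical launcher): make bundled pure-Python dependencies
importable without installing them."""
import pathlib
import sys


def add_vendor_dir(base, vendor_name="vendor"):
    """Prepend <base>/<vendor_name> to sys.path (only once) and return it."""
    vendor = str(pathlib.Path(base).resolve() / vendor_name)
    if vendor not in sys.path:
        sys.path.insert(0, vendor)
    return vendor


if __name__ == "__main__":
    # Directory that contains this launcher script.
    here = pathlib.Path(sys.argv[0]).parent
    print("using bundled packages from", add_vendor_dir(here))
    # From here on, imports such as `import tornado` are looked up in the
    # vendor directory first. Starting the GUI would follow; its entry point
    # is not specified in this discussion.
```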

Seanny123 commented 10 years ago

Sorry I missed those requirements and misunderstood you, Terry. I'll try to slow my reading speed for next time. (:

Based on this question on StackOverflow, I was able to download the dependencies in a way that makes them easy to bundle with our install and to install offline. Basically, you put all the requirements (certifi, backports.ssl_match_hostname, tornado, rpyc) in a folder and then run a slightly verbose python setup.py command. Given that the dependencies are all pure Python and can be installed offline easily, is everyone okay with moving forward with Tornado now, or are there other concerns that need to be addressed?
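The approach above uses setup.py; a swapped-in alternative with the same effect is pip's offline mode. Assuming the folder was filled beforehand on a connected machine (for example with `pip download -d <folder> tornado`), `--no-index` stops pip from contacting PyPI and `--find-links` points it at the folder instead. The folder path and the helper name below are hypothetical.

```python
import sys


def offline_install_command(wheel_dir, packages):
    """Build a pip command that installs `packages` only from `wheel_dir`."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index",                  # never contact PyPI
            "--find-links", str(wheel_dir),  # look for archives here instead
            *packages]


if __name__ == "__main__":
    cmd = offline_install_command(
        "/media/usb/deps",  # hypothetical USB path
        ["tornado", "certifi", "backports.ssl_match_hostname"],
    )
    # Print rather than run; pass `cmd` to subprocess.check_call to install.
    print(" ".join(cmd))
```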

ctn-archive / nengo_gui_2014
