deontologician / spaceship-build

Sci-fi spaceship engineering simulation
GNU Affero General Public License v3.0

Make server.py handle multiple connections #44

Closed xXxH3LIOSxXx closed 10 years ago

deontologician commented 10 years ago

One useful library for this will be asyncio: https://docs.python.org/3/library/asyncio.html

xXxH3LIOSxXx commented 10 years ago

Basic multi-session handling through threads is set up. We may be able to consider this one closed, since any additional functionality would, strictly speaking, be there to allow other features like chat or hooks for the command-line functions. Let me know how you want to handle or close it if necessary.

xXxH3LIOSxXx commented 10 years ago

Lots of progress on the server today and in theory.

Thanks for suggesting the asyncio module. However, I think I want to handle this with threads rather than run the sockets asynchronously, because there's no telling what we'll eventually build into the server side, and running each socket in its own thread gives us a lot of flexibility. On top of that, the "Twisted" libraries also work asynchronously, and I'm still investigating what they could bring to the table for us. So far I like the simplicity of writing our own solution, but if it becomes burdensome we'll go further in that direction. Feel free to check it out.

Initially I was going to use the 'thread' module, but in this latest commit I've switched to the more robust 'threading' module, set up a basic perpetual socket, and built in a simple control to close sockets from the client side. I also cleaned up some formatting and removed some junk from the code. I'll close this issue out and open additional ones for more specific goals.
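For readers following along, here is a minimal sketch of the thread-per-connection approach described above. It is not the code from the commit: the names `handle_client`, `serve`, and `start_server` are made up for illustration, and the `quit` command is only an assumed stand-in for the "simple control to close sockets" mentioned in the comment.

```python
import socket
import threading


def handle_client(conn, addr):
    """Echo data back until the client sends 'quit' or disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)
            # An empty read means the peer closed; 'quit' is an assumed command.
            if not data or data.strip() == b"quit":
                break
            conn.sendall(data)


def serve(listener):
    """Accept connections forever, giving each one its own daemon thread."""
    while True:
        try:
            conn, addr = listener.accept()
        except OSError:
            break  # the listening socket was closed
        threading.Thread(
            target=handle_client, args=(conn, addr), daemon=True
        ).start()


def start_server(host="127.0.0.1", port=0):
    """Bind, listen, and run the accept loop in a background thread.

    Port 0 asks the OS for any free port; read it back with getsockname().
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

Calling `start_server()` returns the listening socket, and each client that connects to `listener.getsockname()` is served by its own thread until it sends `quit` or hangs up.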

deontologician commented 10 years ago

Threads in Python have real issues due to the global interpreter lock. Basically, no real work can be done in parallel. If you have many sockets open for I/O that aren't reading much most of the time, async is better (you can have more simultaneous connections with async than one thread per connection can manage). The other benefit of async is that you mostly don't need to do synchronization (locks), since the handlers run one at a time on a single thread.
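As a rough illustration of the async alternative, the sketch below runs an echo server and two clients on a single thread using asyncio streams. It uses the modern `async`/`await` syntax rather than what was available in 2014, and `handle_client`, `client`, and `demo` are hypothetical names, not anything from the repository.

```python
import asyncio


async def handle_client(reader, writer):
    """Echo each line back to the client until it disconnects."""
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means the client closed the connection
            break
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, message):
    """Connect, send one line, and return the echoed reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    """Serve two concurrent clients from one thread, with no locks."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port 0 means "any free port"
    async with server:
        return await asyncio.gather(
            client(port, b"ping\n"), client(port, b"pong\n")
        )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both connections are open at the same time, yet no shared state needs a lock because only one coroutine runs at any instant.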

xXxH3LIOSxXx commented 10 years ago

OK, good to know, so then threading may not scale well. I'll work with both asyncio and Twisted, see how we can leverage them, and come up with a few more examples of how we can get this going sans threads. It was cool to see how to handle threading in Python regardless of which method we end up going with.

When you say "basically no real work can be done in parallel", can you be more specific? Is there a fall-off point where threading becomes non-performant, and which scenarios would you use threads in (maybe the global interpreter lock explains this)? Finally, in terms of connection handling, what do we lose by going with threads? Does any functionality deteriorate when using them, or is it simply better practice to use other methods? I ask because many network programming examples use threading to handle multiple connections, though they don't use this particular module to achieve it.

deontologician commented 10 years ago

Basically, some internals of the Python interpreter are not thread-safe, so there is a global lock that a thread must hold whenever it is executing Python bytecode. In practice, this means that only one thread can actually be executing Python code at a time. All of the others are waiting on the global interpreter lock (google "python GIL"). Threads that are blocked on I/O release the lock while they wait, so threading can still be used for lots of waiting connections like in a server, assuming most of them only want to do a little bit of work whenever a message comes in and then quickly go dormant again (which is the case with most servers).
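A small experiment can make the distinction concrete. The sketch below assumes a standard CPython build with the GIL enabled; `time.sleep` stands in for a blocking socket read because it releases the GIL the same way, and the helper names are hypothetical. Exact timings depend on the machine, so none are quoted here.

```python
import threading
import time


def wait_for_io(seconds):
    """Stand-in for a blocking socket read; time.sleep releases the GIL."""
    time.sleep(seconds)


def burn_cpu(n):
    """Pure-Python arithmetic: the thread holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(target, args, count):
    """Run `target` in `count` threads and return the wall-clock time taken."""
    threads = [threading.Thread(target=target, args=args) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Waiting threads overlap, so this should take roughly one sleep, not four.
    print("4 sleeping threads:", round(run_in_threads(wait_for_io, (0.2,), 4), 2), "s")
    # CPU-bound threads take turns on the GIL, so expect little or no speedup
    # over doing the same work serially.
    print("4 CPU-bound threads:", round(run_in_threads(burn_cpu, (1_000_000,), 4), 2), "s")
    start = time.perf_counter()
    for _ in range(4):
        burn_cpu(1_000_000)
    print("same CPU work, serial:", round(time.perf_counter() - start, 2), "s")
```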

The issue with having many threads is that threads aren't free. They take up memory, and they don't provide much benefit given that you can't do actual simultaneous CPU work with them. So if you have 32 server cores, only one is actually running Python code at a time! The multiprocessing library gets around this by basically replicating the threading library interface, but it spawns new processes instead of threads. Processes are much more expensive, however, so you'll probably only want one per core (each one is an entirely new Python interpreter).

Classic servers like Apache use the one-thread-per-connection strategy (or one process per connection, depending on how they are configured). This is fine for a while because they're coded in C and can actually do simultaneous work. But even these servers run into trouble with many, many connections, because the memory requirements of those threads (which are doing nothing 99% of the time) add up. So in recent years async has taken off as a way to solve the C10K problem (how to have 10K simultaneous connections). Async has always been around in the form of the select system call, but that scales poorly because every call has to scan the whole set of sockets. Modern Linux, though, has a family of system calls called epoll that allows really fast polling for activity on many sockets. This means most applications want to push the actual detection of which socket has activity down to the OS level, since it's much faster. Libraries like Twisted and Tornado make use of epoll, and the programmer just provides a callback to run when activity happens on the socket.
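The callback style described above can be sketched with the standard library's `selectors` module, whose `DefaultSelector` picks epoll on Linux (kqueue on BSD and macOS, falling back to select elsewhere). This is only an illustration of the pattern, not how Twisted or Tornado are implemented; `make_listener`, `on_accept`, `on_readable`, and `run_once` are hypothetical names.

```python
import selectors
import socket

# Chooses the most efficient mechanism the platform offers (epoll on Linux).
sel = selectors.DefaultSelector()


def on_accept(listener):
    """Callback for the listening socket: register the new connection."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def on_readable(conn):
    """Callback for a client socket: echo data, or clean up on EOF."""
    data = conn.recv(1024)
    if data:
        # Fine for small payloads; a real server would buffer partial writes.
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket and register its callback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, on_accept)
    return listener


def run_once(timeout=1.0):
    """Ask the OS which sockets are ready and dispatch their callbacks."""
    for key, _events in sel.select(timeout):
        callback = key.data  # the function stored at register() time
        callback(key.fileobj)
```

A real server would call `run_once()` in a loop. The point is that a single thread services every socket, and the OS reports which ones need attention.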

That being said, I think socket level is too low level for us. Basically, anything that has "read N bytes and check if the response is finished" is something that's been written a million times before, should be in a C library for speed, and we should deal with a higher level interface.
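To show the kind of "read N bytes and check if the message is finished" code being referred to, here is a minimal sketch of length-prefixed framing over a raw socket. The 4-byte big-endian length header is an assumed wire format chosen for illustration, and `recv_exact`, `send_message`, and `recv_message` are hypothetical helpers. Libraries like zeromq do this sort of framing for you.

```python
import socket
import struct

# Assumed wire format: a 4-byte big-endian length, then that many payload bytes.
HEADER = struct.Struct("!I")


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send one framed message: length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """Read one framed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Even this small amount of code has to handle partial reads and dropped connections, which is part of the argument for using a higher-level interface.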

Check out the josh/clientserver branch. There are a couple of files:

These are based on the zeromq library which is a higher level socket interface (with some really nice benefits). I'm using their encryption there, plus a library called msgpack to send dictionaries, and a library called blosc to do compression.

Another reason not to write our own socket-level code is that you basically never want to implement encryption yourself; you always want to use someone else's battle-tested implementation.

deontologician commented 10 years ago

Here is a description of the encryption protocol zeromq uses, as an indication of how involved this stuff is even if you're using a library to do the encryption algorithms themselves:

http://curvezmq.org/page:read-the-docs