MaddieM4 / EJTP-lib-python

Encrypted JSON Transport Protocol library for Python 2.6-3.3.
GNU Lesser General Public License v3.0
23 stars 10 forks source link

Improve handling of jacks ($200) #66

Open MoritzS opened 11 years ago

MoritzS commented 11 years ago

I think jacks should get a dynamic registering structure like I explained for Frames in #65.

Furthermore it would make sense to be able to make Jacks read or write only. For example a tls-jack will only need a ca-certificate if it's read only but will also need a proper certificate and key if it's acting as a server.

MaddieM4 commented 11 years ago

Indeed. While I'd like to focus on Frames first, I think both would benefit from this kind of overhaul.

MaddieM4 commented 11 years ago

Dealing with read-write disparity

We do want very cleanly separated classes for server and client where it makes sense, but for most persistent connection transports, it's wasteful not to use the same pipe where already available. The question is how to make a system that works in a satisfactory way despite the wildly varying needs and models of different transports.

Well, it took awhile, but I think I can finally answer that question.

Connection class

Generic among the Jack subsystem, i.e., not subclassed by individual jacks. Initialized with two callback properties, read and write. write is called by the router to send outgoing messages. read is called by whatever jack receives an incoming message for that connection.

>>> Connection(router, read, write, address_local, address_remote)

Reader class

Class that handles incoming connections. Creates a new Connection object when connected to.

Each module exposes this as reader_class.

>>> reader = ejtp.jacks.tcp.reader_class(router, address)

Writer class

Class that handles outgoing connections. Creates a new Connection object when a frame is being sent somewhere we don't already have a connection for.

Each module exposes this as writer_class. May be equivalent to reader_class to signal that a single class performs both duties. The writer_ignores_interface module variable tells the router to do 2 things if True: initialize the class without a second parameter, and use (class, None) as its key in the router's jack table. Needless to say, since reader_class requires an interface to bind to, it's a really bad idea to set both reader_class == writer_class && writer_ignores_interface.

>>> ejtp.jacks.tcp.writer_ignores_address # Interface is ignored by some writer classes, but not this one
False
>>> writer = ejtp.jacks.tcp.writer_class(router, interface)
>>> writer.connect(address)
<ejtp.jacks.connection.Connection object at ...>

Router responsibilities

Manage jacks and connections. Connections register as they are created. Always use an existing connection where available, otherwise create a new one with the appropriate jack module's writer class.

Jack reuse is an important topic - what makes a jack distinct? Class + address truncation (2 elements long). So we can use (class, addr) tuples as keys in the router's jack cache. This seamlessly handles the logic in situations where there are not separate Reader and Writer classes, but a single dual-purpose class.

make_jack

"Create on need" is pretty straightforward with the Writer class. Clients can probably get away with just caring about setting up a Reader so they can take incoming data from the network. So we'll go with a hybrid solution along that line - create Reader on Client init, if make_jack = True, and create Writers on the fly as needed.

Registration

Let's model it based on how we do it in frames - a global registration module in ejtp.jacks, with a registerJack decorator and a createJack(address, is_reader) function.

While we're at it, make one of the attributes of the BaseJack the "router key", AKA (class, address) tuple. Maybe BaseJack.router_key, nice and simple.

State of this proposal

Edited to be less shaky and incorporate stuff decided in discussion. Can hopefully be worked on in parallel with Router changes (which may not be totally independent of these changes, but we can do some temporary router bandaging in the very short term while waiting for a proper router reimplementation).

MoritzS commented 11 years ago

These are some really well thought proposals! But I have some questions and suggestions:

MaddieM4 commented 11 years ago

Heh, thanks, it was really late in my timezone when I wrote them, and I wasn't totally sure which side of the straddled hazy/genius line this would ultimately fall on :)

Sorry if there's chaos in your email notifications, I had a bit of minor catastrophe trying to fix the formatting after making this reply via email.

MaddieM4 commented 11 years ago

Bounty is $100.

http://www.freedomsponsors.org/core/issue/197/improve-handling-of-jacks


(Copied from acceptance criteria)

Work with me and possibly other developers to create a registered jack system, according to designs determined in conversation in the issue tracker.

Feature may or may not ship in stable at the end of this month, depending on its readiness. If it doesn't land in 0.9.3, we'll just push it to 0.9.4.

MoritzS commented 11 years ago

I don't totally grasp the (old) router system, yet. Why exactly is it necessary to distinct between Clients and Jacks? Or is this just a thing that will become obsolete if we have the new Connection class with readers and writers?

MaddieM4 commented 11 years ago

Historically, it's because client addresses are longer, with that third "local identifier" element. So for outbound messages, you have to check a trimmed version of the frame address against available jack addresses.

I think we'll still want to keep the two distinct, even if we can make address comparison somewhat agnostic to the above problem, because the Connection class would be pointless overhead for local message-passing.

MoritzS commented 11 years ago

I have some more questions:

MaddieM4 commented 11 years ago
class MyWriterJack(...):

   def connect(self, address):
       read = lambda conn, frame: conn.router.recv(frame)
       write = lambda conn, frame: self.send_with_arbitrary_function(frame)
       return Connection(self.router, read, write, self.address, address)

Connections are practically glorified tuples, and are created on-the-fly by Jacks. Writer Jacks create them when you want to send a frame somewhere, and a Connection for it doesn't exist yet. Reader Jacks create them when you receive a message from the outside world, and don't have a Connection for that remote host yet.

MoritzS commented 11 years ago

But now I don't get what the reader and writer classes are needed for. What do they do if the Jacks create the Connections on the fly without using them?

MaddieM4 commented 11 years ago

Jacks manufacture Connections for the router to use. If the router doesn't have a connection, it has the writer jack make one. Incoming messages through the reader jack also create connections and register them with the router, so replies can use the same existing socket connection.

MoritzS commented 11 years ago

When you are talking about writer and reader jacks do you mean the same as writer and reader classes in jack modules?

MaddieM4 commented 11 years ago

Yes, exactly.

I think I may need to draw and scan a diagram to get this across unambiguously, but it sounds like we're approaching the same page :)

MoritzS commented 11 years ago

Yeah, I think a sketch will help a lot to understand this!

MaddieM4 commented 11 years ago

Okay, I ended up doing something a bit clearer. It's not done, but thanks to the nature of Google Docs, you can watch me work!

https://docs.google.com/presentation/d/1lcLTDPlcjuWvgxmYTOgX9W5xqLuPBYYA_c8vE13BGOg/edit?usp=sharing

EDIT: Done. Now I kinda want to clean up the Connection spec, since I figured out in the process of graphing everything out that the read function is really roundabout and pointless. Not sure how much conversation will sound like nonsense if I do that, though.

MoritzS commented 11 years ago

Well first I have to understand it as it is know ;-) So basically we split the old jack classes in a class that handles writing connections and another that does reading? And connections are client and address dependent, just like TCP connection sockets?

MaddieM4 commented 11 years ago

Right. I mean in reality it's less a read/write paradigm, and really a server/client paradigm (respectively) for 90% of all possible jacks, so I'm okay with making that more semantically straightforward and changing the names in this design to match that reality (you'll notice in the presentation, the jack class names were TCPServerJack and TCPClientJack).

And again, you get it. Connections are address-dependent, and each address is 2-element (jack style) instead of 3-element (client style). The router can just find the first connection with the right remote_address and call its write(frame) function to ship a frame there.

MoritzS commented 11 years ago

I had a first go at MoritzS@f162725d703ce243b49826fa79b6dcad491ba348 Basically I did not implement the reader/writer thing but RegisterJack checks if the class is subclass of ReaderJack or WriterJack. In the end it's the same but I think it's easier to understand like this. Also the writer_ignores_interface is now WriterJack.nobind and WriterJack.router_key acts accordingly. My idea of a basic implementation of a Jack is something like this:

# module MyJack
@RegisterJack('myproto')
class MyReaderJack(ReaderJack):
    def run(self):
        # [...]
        self.recv(data)
        # [...]

@RegisterJack('myproto')
class MyWriterJack(WriterJack):
    nobind = False

    def send(self, data):
        # [...]
        some_send_method(data)
        # [...]

The thing with a single class for reader and writer also works:

@RegisterJack('myproto')
class MyReaderJack(ReaderJack, WriterJack):
    # all needed methods here

Please tell me if that's what you thought of or if I'm completely on the wrong track!

MaddieM4 commented 11 years ago

Brilliant! Same concepts, much clearer implementation. You're very much on the right track, @Moritzs!

MaddieM4 commented 11 years ago

@MoritzS just checking in to see how things are going. I'm totally willing to pitch in if you're burned out, or the real world has been too demanding of your time, or whatever. Just want a rough idea where we're at, and when we can expect to see a pull request, since most other paid issues block on this.

MoritzS commented 11 years ago

I just don't have enough time at the moment, I'll try to do this either this or next week.

MaddieM4 commented 11 years ago

That's cool. And I'll try too keep adding paid issues that are orthogonal to this one, like #118 and #102, so that there's still stuff people can contribute on.

MaddieM4 commented 11 years ago

In order to keep this project moving, I'm opening it back up to everybody. Whoever has time and inclination to do it, can do it based on the discussion here. I'd love to give @MoritzS the time he needs to work in development time around the rest of his life, but not at the expense of the issues blocking on this one.

MoritzS commented 11 years ago

I never wanted other people to stop working on this, it's totally fine and even encouraging to have other people do this!

MaddieM4 commented 11 years ago

@MoritzS Agreed :) Unfortunately, we haven't seen much interest in getting this done, other than yours, so I've cranked up the bounty to $200. Hopefully I won't have to go higher than that and break the bank/trigger some godawful Paypal fraud detection thing.

MoritzS commented 11 years ago

Maybe I can start working on this again next week, we'll see.

MaddieM4 commented 11 years ago

Sounds great to me! Let me know when/if you have something to look over. On Apr 10, 2013 1:27 AM, "Moritz Sichert" notifications@github.com wrote:

Maybe I can start working on this again next week, we'll see.

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-16161443 .

MoritzS commented 11 years ago

I'm working on this right now. Give me a little time to go through all our thoughts! ;-)

MaddieM4 commented 11 years ago

Sweet! I've threatened to do this myself, but I've been pretty caught up in love-webplayer and getting a proof of concept for DJDNS (which turns out to be much harder than I thought, thanks to some things I didn't realize about pymds, forcing me to make a significant fork before even being able to use it in DJDNS). You'll have plenty of time, believe me, though one way or another this will be done by the end of the month.

On Tue, May 7, 2013 at 11:38 AM, Moritz Sichert notifications@github.comwrote:

I'm working on this right now. Give me a little time to go through all our thoughts! ;-)

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-17561450 .

MoritzS commented 11 years ago

I made two new commits, that I'll explain here.

MoritzS@a390e6a7d641e16470a3bca06f4b92e7ad48f8cf:

MoritzS@38776b6c6a061ecd47e059a47c43d547279c4b9f: In this commit I basically just implemented the listening for data and threading stuff for the ReaderJack.

Please tell me if that goes along with your ideas!

MaddieM4 commented 11 years ago

This looks really good so far. Clean and well-documented, and goes along with our design (at least as far as I've been able to refresh my memory, it's a long conversation :) ). Excellent work!

MoritzS commented 11 years ago

I'm currently trying to implement the new style tcp jack and got some more questions. I suppose a WriterJack should normally have set bind=False, right? For example a single tcp (or tcp6) socket can connect to multiple addresses. But should these connections be (openend and) closed right (before and) after sending a frame over it or should the stay somewhat longer to make transmission faster in some cases? Right now I don't have any idea of how to implement the persistent connection thing, so if that is what you want, I'll appreciate some input!

MaddieM4 commented 11 years ago

Jacks should hold onto the underlying sockets until they break. How they are stored and tracked is an implementation detail that's up to jack authors - in this case, you. But for the sake of example, you could just have a dict keyed on address tuple, with holds the socket objects as values.

class SomeWriterJack(...):

    def __init__(self, ...):
        # ... call parent init and such
        self.sockets = {}

    def register_socket(self, s):
        self.sockets[s.getpeername()] = s

That's actually probably a common enough storage pattern to justify implementing it in some base class, rather than redoing it over and over in the subclasses.

MoritzS commented 11 years ago

But now it seems a bit redundant to not have the WriterJack to bind to an address on the one hand but still managing different addresses within the jack on the other hand. In my opinion it would be more consistent to set bind=True in the TcpWriterJack. The router will create a new one for each address he needs, anyway. Setting bind=False makes more sense in protocols like UDP.

MaddieM4 commented 11 years ago

My understanding, which I admit may be rusty and wrong, is that bind should be true when there is a consistent local address to the connections, and false when it's not. So for TCP Writer, the local end is just going to be have some random port that was free (thus bind=False), whereas the TCP Reader binds to a specific addr/port as a server (therefor bind=True).

This is totally independent of whether a Jack keeps track of the sockets it owns. For TCP Reader, new sockets will come from remote processes connecting in. For TCP writer, new sockets will come from the router wanting to be connected to remote processes. Either way, you keep a dict of sockets, and provide the Router with Connection objects that use those sockets.

MoritzS commented 11 years ago

This might be true for ReaderJack (it also has set bind=True as default), but in my opinion it doesn't really make sense to talk about local addresses when sending data. Besides we (will) also have jacks that work totally different in comparison to IP protocols, so we should find an unified definition of what "bind" means. For ReaderJack, I assume it's kind of obvious, since the analogy to TCP server sockets makes sense even with other types of jacks. (e.g. a Sqlite jack will have to bind to a database to read incoming frames) For WriterJack it's a good idea to look at what the attribute bind was initially supposed to cause. As far as I remember its main use was to distinguish whether to return the full jack address in BaseJack.router_key or set the last field of the address to None. This will help the router to decide whether to create a new WriterJack on the fly with a new address or use a existing one. Therefore there already is a place where different addresses of the same jack type are handled, so I don't see why we should add this logic to the jacks themselves.

MaddieM4 commented 11 years ago

I think this might be a good motivation to rename the "bind" variable to something more unambiguous and descriptive, like "has_local_address". That'll help us work out the rest of this mess with less cognitive tangle.

MoritzS commented 11 years ago

Yeah that's a good idea. Apart from that, do you agree with my suggestions?

MaddieM4 commented 11 years ago

It seems like it's probably mostly right, if I understand what you're saying, but there really only needs to be one WriterJack instance, maximum, per protocol. Both kinds of jacks basically exist to create Connection objects to feed the router. While you will need a ReaderJack per address you serve on, you only ever need one WriterJack to initiate connections to the outside world. The Connection objects abstract away the transmission details for the router, so it can treat each of them as a bidirectional pipe. Does that make sense?

MaddieM4 commented 11 years ago

Also, sorry for the high latency in getting back to you on this stuff. I really want to dig into the code with you in a collaborative environment, so I can show you what I'm talking about, instead of trying to mold it into words like sun-baked playdoh, or procrastinating about replying in the hope that it'll be easier to turn the ideas into words later (which it never is).

I think the closest we could get is to both work on the same branch in parallel (moritz/new-jacks and campadrenalin/new-jacks, for example), so that when one of us is stuck or confused, the other can express the solution as mergeable code. Let me know if that sounds good to you too.

MoritzS commented 11 years ago

Yeah that's a good idea! It's better to talk about written code rather than just ideas!

MaddieM4 commented 11 years ago

Alright, cool! Happy to help.

I'll be in Philadelphia until Friday afternoon, though, and as a west-coaster, a lot of that will be on a plane (or at least will feel that way) so I'll get on this when I get back, but I'll be working on pymads during my small amounts of free time. Due to external, still-undiagnosed regressions in the build or test system, EJTP no longer builds at all on my laptop, so it's just as well that my head is in other projects while I'm living laptop-only! On May 22, 2013 2:48 AM, "Moritz Sichert" notifications@github.com wrote:

Yeah that's a good idea! It's better to talk about written code rather than just ideas!

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-18268097 .

MaddieM4 commented 11 years ago

I finally got some of my Connection class ideas into the code, as well as a bunch of general cleanup, though it's still gonna be a godforsaken mess for awhile. Refer to #135 to see what I'm talking about.

MoritzS commented 11 years ago

I won't have time to look over it in detail until Sunday, but I had a quick look and I can definitely say that I understand your ideas way better now!

MaddieM4 commented 11 years ago

I know, right? Code is always clearer!

Still need to figure out how to deal with the colliding taxonomies of IPv4/IPv6, Reader/Writer, etc, especially with regards to registration. I've got half a mind to make Reader vs Writer an initialization parameter, rather than a class thing. So there's a ways to go before this is useful, and I don't think it will make this EJTP release either. But the next one? Definitely.

MoritzS commented 11 years ago

So I have some questions now. Why did you move the send and recv methods out of ReaderJack and WriterJack into BaseJack? Are Connections supposed to be client-wise (like a TCP-client) only, or should they also be able to be incoming connections (like a TCP-server), or even both at once? Besides, I really like the idea of having the router handle Connections instead of Jacks!

MaddieM4 commented 11 years ago

It's been a busy day, so I haven't had a good opportunity to give these good solid thinking, but I'll try to hammer out some thoughts now.

MoritzS commented 11 years ago

As I understand your suggestions, you still mix up the concept of Jacks, as a gateway to the internet (similar to an ethernet interface) but without further knowledge of addresses and such, and Connections, as a layer on top of the Jack that is aware of addresses, clients, servers, etc. Specifically, this means the following:

For the router this means, that it will never instantiate a Jack class directly but use the (yet to be implemented) registration system, that will look at the address, instantiate a Jack class and return a connection.

MaddieM4 commented 11 years ago

Correct. If I recall correctly, I'm at the point in development where I'm actually trying to register and use jacks. Not sure if I pushed that or just committed it.

And also, you're right, I'm confident about the role of each piece in the system but the terminology is fucked :) At least for jacks. I don't know how to improve it though, so I'm open to suggestion/wild brainstorming. On Jun 6, 2013 1:26 PM, "Moritz Sichert" notifications@github.com wrote:

As I understand your suggestions, you still mix up the concept of Jacks, as a gateway to the internet (similar to an ethernet interface) but without further knowledge of addresses and such, and Connections, as a layer on top of the Jack that is aware of addresses, clients, servers, etc. Specifically, this means the following:

  • a Jack instance should "know" how to establish connections, so
    • WriterJacks know how to establish connections, that send data
    • ReaderJacks know how to establish (i.e. listen for) connections, that read data
    • the Connection class actually cares about addresses, senders and receivers For the router this means, that it will never instantiate a Jack class directly but use the (yet to be implemented) registration system, that will look at the address, instantiate a Jack class and return a connection.

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-19071776 .

MoritzS commented 11 years ago

Tell me, what you think about this mock-up!