crossbario / autobahn-python

WebSocket and WAMP in Python for Twisted and asyncio
https://crossbar.io/autobahn
MIT License
2.48k stars 769 forks source link

Autobahn shared core API #472

Closed oberstet closed 6 years ago

oberstet commented 9 years ago

As we have been discussing various improvements to the AutobahnPython WAMP API like the new Connection class and just using a WAMP Session object instead of being forced to inherit from a Session base class, now is the chance to design a core API for WAMP that is highly similar between:

oberstet commented 9 years ago

See also:

oberstet commented 9 years ago

Here is my current favorite sketch:

from twisted.internet import reactor
from autobahn.twisted.wamp import Connection

connection = Connection('ws://localhost:8080/ws', u'realm1')

@inlineCallbacks
def on_open(session):
    print('session connected: {}'.format(session))
    connection.close()

connection.on('open', on_open)

connection.open()

reactor.run()

This would establish a WAMP-over-WebSocket connection and attaching a session to a realm. When the session has joined, it prints and then closes the connection.

Above would not allow the session to leave and join the same or another realm while the underlying connection stays. As the on_open event on Connection would only fire when the session has joined the realm.

oberstet commented 9 years ago

Here is a bigger example with multiple transports, automatic reconnection and authentication:

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

from autobahn.wamp.auth import compute_wcs
from autobahn.twisted.wamp import Connection

# list of transports to try
transports = [
    {
        'type': 'rawsocket',
        'endpoint': {
            'type': 'unix',
            'path': '/tmp/cb-raw',
        }
    },
    {
        'type': 'websocket',
        'endpoint': {
            'type': 'tcp',
            'host': 'localhost',
            'port': 8080
        },
        'url': 'ws://localhost:8080/ws'
    }
]

# list of credentials to use for authentication
credentials = [{
    'type': 'wampcra',
    'id': 'user1',
    'on_challenge': lambda challenge: return compute_wcs('password1', challenge)
}]

# options that control the connection's behavior
options = {
    # http://autobahn.ws/js/reference.html#connection-options
    'max_retries': 5
}

# create a connection object with all the bells we defined before
connection = Connection(transports, u'realm1', credentials=credentials,
    extra=None, options=options, reactor=reactor)

# when the connection has established a transport, created an ISession object
# and the latter has joined a realm, this will fire
@inlineCallbacks
def on_open(session):
    print('session connected: {}'.format(session))

    # app code now just uses the session object for WAMP
    try:
        res = yield session.call("com.example.add2", 2, 3)
        print('got result: {}'.format(result))
    except Exception as e:
        print('something bad: {}'.format(e))

    connection.close()

# when the connection closes or goes through a reconnect cycle, this event fires
def on_close(reason):
    print('session closed: {}'.format(reason))

# attach our event handlers to the connection object
connection.on('open', on_open)
connection.on('close', on_close)

# now actually open the connection
connection.open()

# enter event loop
reactor.run()
meejah commented 9 years ago

Another sketch, with the listener-API on the ApplicationSession instances itself:

from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks, Deferred
from twisted.internet.endpoints import clientFromString

def one_off_function(*args, **kw):
    print("one_off_function", args, kw)

@inlineCallbacks
def register_api(session):
    yield session.register(one_off_function, u"com.example.api.v0.one_off_function")
    # ...                                                                                                           

@inlineCallbacks
def register_management_api(session):
    yield session.register(lambda: None, u"com.example.private.do_nothing")
    # ...                                                                                                           

def main(reactor):
    session = ApplicationSession()
    # or "session = MyComponent()" if you prefer subclassing to composition                                         
    connection = Connection(session, [
        {
            "type": "rawsocket",
            "endpoint": clientFromString("tor:timaq4ygg2iegci7.onion:9443"),
        }
    ])

    session.on.open(register_api)  # or session.on('open', register_api)                                            
    if True:
        session.on.open(register_management_api)
    session.on.ready(lambda session: session.publish(u"com.example.api.v0.ready"))
    reactor.callLater(10, connection.close)
    # .open will errback if connect (or re-connect) fails; otherwise callback w/ None on clean disconnect
    return connection.open()

if __name__ == '__main__':
    react(main)
oberstet commented 9 years ago

Regarding what events we want to expose (independent of where we expose those events) - I dug out our Slack conversation @meejah

so the flow would be: "connect" when the transport is connected, "join" when we've authenticated (but not yet run onJoin), "ready" when onJoin has run and completed any async-stuff. then "leave" when the session closes, and "disconnect" when the transport goes away

meejah commented 9 years ago

Further to @oberstet's last comment above, "any async stuff" includes async-returns from previous handlers. For example, if there are 3 'join' listeners that return Deferreds, "ready" shouldn't fire until those 3 Deferreds are done.

oberstet commented 9 years ago
    session = ApplicationSession()
    # or "session = MyComponent()" if you prefer subclassing to composition    

Have been looking at these 2 lines for minutes .. and I know what slightly irritates me:

The word "session" - for me - implies some ephemeral thing that is created for me, and which I then use. In above I create the session, and it's not ephemeral. It exists without any transport existing yet, and without having joined a realm.

The funny thing is your commented line: session = MyComponent()

Note the mix of "session" and "component": https://github.com/tavendo/AutobahnPython/issues/383

In my perception, the word "component" is much more in line with the user creating the thing ("component like") rather than it being created for the user ("session like").

oberstet commented 9 years ago

Another aspect. At the WAMP protocol level, there are indeed "session" and "transport".

A session rides on top of a transport. And a transport can outlive a session: session 1 ends, and without closing the underlying transport, a new session 2 can start. The new session will get a new WAMP session ID though!

However, when a transport goes away, the session is gone too. A WAMP session (today) cannot outlive a transport (but: https://github.com/crossbario/crossbar/issues/300).

If we had a user precreate

session = ApplicationSession()

the lifetime of that Python object hasn't a direct correspondence with "session" in the WAMP protocol sense. I think this discrepancy might be confusing.

meejah commented 9 years ago

Component vs Session: yeah, I can see the confusion; was trying to copy the examples usage. Perhaps there's no value in even "letting" people choose between composition and inheritance? (As in: no inheritance allowed in new API?). I think there is value in leaving that ability in (even if the methods are re-named to PEP8 like on_join vs. onJoin etc) since presumably some people will still enjoy that style...?

Anyway, for the composition-only case, the session could just be created + owned by Connection and you'd either do connection.session.on('join', ...) and/or have Connection provide pass-through calls for convenience).

There's also an interesting edge-case to consider: what if an event has already happened, and a new listener is subscribed? Should it get immediately invoked (e.g. like when you .addCallbacks to an already-completed Deferred)? This is probably most-relevant for join or ready. Depending on documentation admonishing people to only add join listeners before .open() is called might not work ;)

meejah commented 9 years ago

Lifecycle-wise: in some ways it does seem "more pure" to have the ApplicationSession lifetime match that of the logical WAMP session (i.e. if you re-connect a session on the same transport, you get a new ApplicationSession instance) -- if you're talking about getting rid of the existing cycle in the WAMP state-machine (e.g. that onJoin can be called > 1 time) that overall might be much cleaner. However, then you'd likely want/need a "session_created"-type event from Connection? (i.e. distinct from, and strictly always occurring before the onJoin)

Also, if there is a Connection.session then does user-code have to guard against .session being None (or, "no WAMP session established just now") in such a scenario?

All that said, the existing WAMP state-machine does have a "not [yet] connected" state and so from that perspective having a single WAMP "session object" that can join/leave and then join again isn't so outlandish...

oberstet commented 9 years ago

@meejah one thing we need to keep in mind: we need an approach which allows users to take unmodified code to run hosted under CB.

That means, the user code needs to be wrapped inside a class (as today with ApplicationSession) or ..?

meejah commented 9 years ago

Hmm. Yeah to host under CB right now with 'class' things you'd need an ApplicationSession subclass I guess.

You could run the other ("listener"?)-style code as a guest worker right now.

I suppose the equivalent we'd need in CB is a "function" (router + container) worker which imports a particular function that takes a session (and then adds the listeners it needs etc). So you'd have to write your code with that in mind I guess ...

oberstet commented 9 years ago

Whatever we come up with recommended 80% API in AB must work without changing any code when the component is taken to run under CB. If that can only be achieved via subclassing (the current ApplicationSession approach), then we'll stick to that.

meejah commented 9 years ago

Yeah, that's a good point; have to think how best that fits in with a "composition"-style thing.

Another (but maybe more complicated-sounding) option would be that CB provides configuration so you can specify the listeners directly, something like:

{
    "type": "function",
    "on_join": ["package.register_api", "package.register_management_api"],
    "on_leave": ["package.sub.on_leave"],
}
oberstet commented 9 years ago

That last idea points to bigger issues with a non-inheritance based approach: having functions like above in a module means they are forced to hold state in a module global var (or misuse the provided generic WAMP session/connection object to attach app state).

Sidenote: me, coming from C++, this is a total no go. In Py, it seems an accepted approach (Django, Flask, ...). It's kind of replicating the mistakes of GIL and the failure to correctly wrap all interpreter state (like all sane, embeddable VMs do .. Lua, V8, ..) in a non-global at the app level. I don't want to encourage that.

Now, the problem with having state in globals in above situation: what I run multiple instances of my component in a single worker (router/container)? Doesn't work.

Then: having it at that level of detail in the CB config means: essentially mixing CB node config with app internals. A CB admin would not and should know about this in detail.

meejah commented 9 years ago

Yeah, you would have to attach any state to the provided session instance, IMO.

Putting any kind of state -- even "singleton"-pattern stuff -- in a module isn't very good (even if it is used occasionally).

However, this isn't very different from the "override" case, where essentially you have to initialize all your state in onJoin or the class will fail to work properly if/when you re-join.

So the recommendation I think doesn't change: you initialize any state you need into the session object, and you should do this in onJoin (or your join listener).

meejah commented 9 years ago

...but yes in a "non-dynamic" language (C++) it's not immediately obvious what a good solution is for things needing state (aka "instance variables") since they'd need to be declared (i.e. sub-classing).

oberstet commented 9 years ago

slightly related: CB deliberately does NOT provide the ApplicationSession instance a variable that would be shared across multiple components. Because then, you couldn't freely move around components deployment wise. One should be able to run 2 components in 1 container worker or 2 container workers without changing a single line. And this works today, since any inter-component communication MUST be via WAMP. When the component co-reside in a single worker, those WAMP interactions boil down to mere function calls, whereas when they don't co-reside, it'll be WAMP over the wire (where wire could be a local Unix domain socket, or a wire to the other end of the world). We definitely don't want to loose that deployment transparency. It's part of the original vision of WAMP and CB. And I don't know other systems that allow that in that extreme way.

meejah commented 9 years ago

Maybe I'm using the wrong term, I didn't mean a "shared state" instance variable, I just meant that you can't magically create a thing in C++ via e.g. this->something_new = ... like in Python or JS so I guess a C++ implementation would need subclassing and have to at least recommend "re-initialize in onJoin". The only alternative I can image is for the C++ ApplicationSession implementation to provide a hash-map or something as this->state but that's ... really hacky.

meejah commented 9 years ago

Perhaps a way to reconcile "state" between dynamic (JS, Python) and not (C, C++) without resorting to subclassing is to add a "state" attribute to the ISession / ApplicationSession public API?

In C this can be a void* and you can do what you want with it (maybe C++ too, or perhaps something fancy with templates so its got a type). For Python/JS this can just be None/null by default, and you can assign whatever you want.

Encouraging this explicit encapsulation of session-state might help too with session freeze/thaw: we could say "if the 'state' object implements ISerializable [or whatever] then your session can be frozen/unfrozen".

sametmax commented 9 years ago

Some notes I got in mind:

Let's be careful to keep congruent signatures

E.G, currently in Python you got:

self.register(func, 'name');

But in JS you got:

session.register('name', func);

An no error if you mix them up.

I have several painful memories because of it.

We should have sane default

When you connect to redis on Python, you can do:

import redis
connection = redis.Redis()

You don't need to do:

import redis
connection = redis.Redis(host="localhost", port=6379)

It relies on sane officially documented default.

For autobahn and crossbar, it means it we should be able to do:

connection = Connection()

And choose some default, make it very clear in the doc and stick to it.

Especially, we need to define an official default port (not 8080, which is often used by dev server) and an official default realm. "realm1" is not a really good name. "public" would be better, since all tutorials use it like some kind of chmod 777 anyway, and it's easier to remember.

It will also make OS packager's life easier. Having a default port is good for them.

Some stuff are better if not shared between languages

We should definitely have a common API. But since some constructs are idiomatic in some languages, it is important to provide them as well.

For Python, it's decorators:

@inlineCallbacks
def on_open(session):
    print('session connected: {}'.format(session))
    connection.close()

connection.on('open', on_open)

Is a nice API. But since you are already using a decorator, this will be handy:

@connection.on("open")
def on_open(session):
    print('session connected: {}'.format(session))
    connection.close()

The good news is that we don't have to choose between them, we can do both:

class Connection:
    ...

    def add_event_listener(event, callback):
        # do the real work here

    def on(event, callback=None):

        # classic form
        if callback:
            return self.add_event_listener(callback)

        # decorator form
        else:
            def decorator(func):
                cb = inlineCallbacks(func)
                return self.add_event_listener(event, cb)

            return decorator

For JS, it's promises:

connection.open().then(function(){}

)

This will make JS coders more "at home".

You still should totally have the ordinary common syntax. But have these as well.

Distinguish connection failing and closing

Currently in the JS API it's the same event, and you check a status in a variable to know what's really happening. A wrapper checking the status an trigerring explicitly named event would make things more obvious.

Open should start the event loop

In JS the event loop is implicit. To give the same feeling of easyness, connection.open() should do reactor.fun() (or the asyncio equivalent), but have a "start_event_loop" parameter with a True default value, in case somebody wants more control.

Indeed, we are aiming to facilitate the beginer use case, who will want to start as quickly and easily as possible. Advanced used are the one which should need additional work.

I know that the strength of Twisted is that you can use many different modules and plug them together in the same reactor, but this will not be the objective of people trying WAMP. They will not think "let's write a custom protocol and plug in an IRC bridge with my websocket website" on the first day. Nor on the second.

For them, the event loop we are using is an implementation detail. Exposing it should be explicitly requested with code, such as a "start_event_loop" parameter.

Prevent importing stuff you need all the time and abstract some more

If you use some things really often, they should be aliased in a namespace that is always imported and be abstracted away to hide differences between event loop.

Stuff line \@inlineCallbacks, \@coroutine or returnValue should have wrappers, and the wrappers should be attached either to session (or an object containing session passed instead of session) or connection.

Importing them all the time and having to know about the difference between the event loops is only useful for advanced usages. You can create decent websites and app without even needing to understand perfectly the event loop, just that you should not block and where to setup your "yield".

The idea remains :

Check if name == "main" implicitly

Libs like begins started to do this with decorator like @begin.start.

Instead of calling connection.open(), you have some method that trigger parameters parsing and command running only if the module is not imported.

class Connection:
    ...

    def cmd(params):
        # argparse stuff here checking command line params
        # stuff checking env var params
        # all that is merge with params passed as arguments
        if __name__ == "__main__":
            return self.open(merged_params)
        return merged_params

But the more you do that, the more putting this code on a Connection object seems a bit weird. That's why last year I suggested the App centered

Final API

If you do all this, the hello world becomes:

from autobahn.twisted.wamp import Connection

connection = Connection()

@connection.on('open')
def on_open(session):
    print('session connected: {}'.format(session))
    connection.close()

connection.cmd()

That's easy. That's clean. That calls for a copy / paste to try it right away !

But I'd still advice to use something similar to the app API, plus this allow to override call and make it easy to be embeded in crossbar.

meejah commented 9 years ago

@sametmax for JS, I don't think we can use Promises since a session can open more than once. (e.g. similarly it might be nice in Twisted to make .on_open a Deferred so you could .on_open.callback(...)). But, a Deferred can only callback once. I presume the same is true for JS Promises (I can't tell one way or another for sure from MDN docs).

So I think an "idiomatic" JS might just be: connection.on('open', function(session) {...})

p.s. I like the decorator suggestion for Python.

meejah commented 9 years ago

IMO, Connection shouldn't do anything with arg-parsing, __main__, starting the event-loop or anything like that.

Those things can/should go in ApplicationRunner.

At least for Twisted users, an "implicit" event loop start is hard to deal with if you want to do anything else, and leads to e.g. the somewhat awkward start_reactor=False stuff from the current ApplicationRunner. Ditto for starting logging. The recommended way for a non-daemon app to start in Twisted is via twisted.internet.task.react which won't work with an implicitly-started reactor in .open.

I'm looking at it this way:

sametmax commented 9 years ago

IMO, Connection shouldn't do anything with arg-parsing, main, starting the event-loop or anything like that.

The API I'm describing only feels weird because it's tied to the Connection object, which is not the proper level of abstraction and is the wrong root object. I'm more in favour of something like Application (https://github.com/tavendo/AutobahnPython/blob/master/autobahn/twisted/wamp.py#L322). It's a half baked API, and needs to be improved as well, but it is way easier to understand that the other one : I'm making regular introduction to WAMP with Pythond dev around me, and it speaks to them much more than ApplicationSession.

At least for Twisted users, an "implicit" event loop start is hard to deal with if you want to do anything else, and leads to e.g. the somewhat awkward start_reactor=False stuff from the current ApplicationRunner. Ditto for starting logging. The recommended way for a non-daemon app to start in Twisted is via twisted.internet.task.react which won't work with an implicitly-started reactor in .open.

You are looking that from the eye of purity instead of the ease of use.

"start_reactor" is only awkward from the perspective of a Twisted users. The reactor is the awkward thing for the vast rest of the Python world. Twisted users ARE NOT you user base. Twisted is an implementation detail. I repeat, Twisted is an implementation detail.

ApplicationRunner is very ugly and verbose. Yes it's flexible. Yes it's clean. But when you see it the first time it scream Java, not Python. Having an API that is exposed in the style of the language you use makes it sexy.

With the request lib, you can create a Session object to use complexes tools. But you don't need to. You can just import requests and play. Same with flask: it even goes to the length of making some useful variables global singletons that internally switch their context using threadlocals to make it easy to use. I think it dirty and too magic, but people like it.

I don't advocate going to the same lengths, but let's inspire ourself from their desire to simplify the first impression and the 80/20 part of the API.

E.G: you can provide a lower level API so one can use it with twisted.internet.task.react in the rare case one does want it. But it will be rare because when crossbar will be more mainstream, people will be mainly interested in the WAMP part, not the twisted part. Like I said, the event loop is an implementation details. Let power user access the internals to remain flexible, but make an unpure, non-perfect, hiding-the-truth, lying-to-you, look-how-easy-it-is API. Remember that the power users will be the minority of the user base when the project is a success.

A nobrainer API leads to:

It's good for the project maintenance, and it's good PR. But above all, when we will be writing an app for the 100th time, we will be happy it's so easy to do it.

meejah commented 9 years ago

I don't disagree with your aims at all, those are really good.

Perhaps instead of ApplicationRunner what we should do is add a method that "runs a Connection". This would be equivalent to your example code but not clutter Connection with trying to abstract two different things at once.

That is, something like:

from autobahn.twisted import wamp

connection = wamp.Connection()

@connection.on('open')
def register_api(session):
    yield session.register(...)

wamp.run(connection)

# also, Twisted users will be happy that they can do:
#     react(connection.open)
# or:
#     connection0.open(reactor)
#     connection1.open(reactor)
#     connection2.open(reactor)
#     custom_thing.start(reactor)
#     reactor.listenSSL(...)
#     reactor.run()
#     reactor.run()

...and wamp.run() would exit when the underlying connection.open is "done" (that is, it was either explicitly close()'d or ran out of retry/re-connects or encountered an error). For bonus points, .run() could take a list/iterable of Connections as well.

I'm not sure sure about "implicitly"/magically making an @connect.on-decorated thing an @inlineCallback/coroutine... On the other hand it does make the twisted vs. asyncio switch just an import change. But on the other-other hand you do need to be aware of what the event-loop does, and some basics of event-based programming to "make it go" in any meaningful sense. When you put a "return 'foo'" in the above register_api and it blows up, I think both novice and seasoned Twisted programmers will be confused...?

meejah commented 9 years ago

Is it possible for connection.on to detect if the thing it's decorating "should" be wrapped with inlineCallbacks? Not 100% reliable, but: if it's a generator and not already @inlineCallbacks-decorated does that work? (Because it would never make sense for an @connection.on-decorated thing to actually be a generator, so ...) Then I think I could get behind this idea.

sehoffmann commented 9 years ago

Am I the only one who doesn't like the idea of such "sorcery"? It's not really apparent from the code that @connection.on suddenly does the same job as @inlineCallbacks and its semantically wrong as well. Why not a simple alias, e.g: if twisted: wamp.coroutine = twisted.inlineCallbacks else: wamp.coroutine = asyncio.coroutine?

sametmax commented 9 years ago

@meejah: it's better indeed. Plus it probably make testing easier as well.

When you put a "return 'foo'" in the above register_api and it blows up, I think both novice and seasoned Twisted programmers will be confused...?

This will happen even with inlineCallbacks : nobody remembers that you can't use "return" in a generator. Hopefully, the message is explicit: it's just a syntax error. As for applying inlineCallbacks automatically, we can use inspect.isgeneratorfunction() to apply it at run time on generators only, which is what we do currently with Application.

So the "solution" is:

@Paranaix: to be fair, nobody outside of twisted user base knows what inlineCallbacks does or why it's needed. What about making the alias you advice, and allow the ordinnary callback form, but still provide a magic decorator for day-to-day use. Honestly, even I will rarely use it without inlineCallback, and importing it and setting it up for every function will get old really fast.

meejah commented 9 years ago

I think I still lean towards "magic" decoration being "too magic".

My main justification is: anyone doing async code (asyncio or Twisted) will have to learn what @coroutine/@inlineCallbacks does (and why it's needed) or they won't be able to write async methods properly. The target audience for AutobahnPython is Python developers, and they have to (explicitly) choose a framework to run on (asyncio/Twisted) and will have to learn some minimum details about how it works.

I can see the appeal of making the code "look" cleaner/simpler, but for me one of the big advantages of async/event-based code in the first place is controlling where your code gives up control (vs. threading where that place is "at literally any statement").

I'm willing to be convinced "it's fine, some magic is good", but for now I'm unconvinced...

@Paranaix I share you concerns. I also don't really like the idea of just aliasing it, because you already have to decide "asyncio" or "twisted" when doing the autobahn imports anyway (and experienced asyncio/Twisted devs will be looking for asyncio.coroutine or inlineCallbacks -- and again both be confused).

There's also a reason I put the .on listener API on ApplicationSession and not Connection: there are already a couple places in crossbar and autobahn code where some code "happens upon" a session and needs to listen for an event (e.g. add a cleanup handler during onLeave). Same for Connection itself for any retry/re-connect logic. That doesn't necessarily mean that Connection can't have a listener-style API as well (even "forwarding" the same one?) but I do think that "session" is the right place for join, leave, ready at least. connect, disconnect sound more "transport-y" but can also be on Session for simplicity?

Also, looking again at @sametmax's example code, it introduces an "open" event -- maybe this is the right listener-API for Connection? That is, open and close which receive a Connection instance (although I guess that's redundant since you need a "connection" in-scope to use the decorator anyway). Of course connection.session would be valid at that point if you wanted to access it directly. I guess "open" would be called after the session has both connected and joined (and get re-called if the session re-connected), and close would be called after a session has left and the transport has disconnected...? In that case, why does the "open" even need an arg at all? It could be:

from autobahn.twisted import wamp

connection = wamp.Connection()
@connection.on('open')
@inlineCallbacks
def register_api(connection):
    yield connection.session.register(...)
    # you could add "inline" listeners if you prefer
    def left(session):
        print("Our API session '{}' left.".format(session))
    connection.session.on('leave', left)

# you can organize things logically
if config.enable_management:
    @connection.on('open')
    @inlineCallbacks
    def register_management_api(connection):
        yield connection.session.register(...)

# you can still add listeners directly on the session if you want
@connection.session.on('disconnect')
def gone(session):
    print("Session '{}' disconnected".format(session))

wamp.run(connection)
sehoffmann commented 9 years ago

@meejah Regarding aliasing: Autobahn already tries to abstract away the differences between twisted and asyncio in order to improve maintainability. However while that philosophy his right IMO, I never understood why Autobahn uses the current approach.Truth be told, it's already good, but one can do better. If I currently want to migrate my codebase, I have to change EVERY SINGLE IMPORT. It would be much more simpler to just always import a single abstract class which actual "implementation" becomes assigned based on e.g a global module variable. To migrate one simply has to change a single line now. That would basically be what txaio is already doing (as far as I have seen the changes).

meejah commented 9 years ago

Being a bit of a devil's advocate here: one thing I really dislike about e.g. Flask's "app.route"-style decorators (which the "connection.on" thing is similar to above) is that it encourages writing modules that have an "app" global in them -- which is really hard to test.

I do like decorators, but wonder if they don't work better as markers for classes/methods/functions (not completely unlike the "wamp.register" etc. ones that exist now). Consider what happens when you go from "tiny example" code like above and move it into modules -- to keep the "decorator" syntax you either have to put everything above inside a function, like:

def setup(connection):
    @connection.on('open')
    def register_api(conn):
        conn.session.register(...)

...or you end up creating a Connection instance when you import the module. Either of the above become hard to test; the first because of the implicit Connection and the second because everything is hidden under a single method call.

On the other hand, when connection.on does not act like a decorator, you're sort of "forced" to write bare methods -- which are way easier to test. I still think decorators are a great way to expose things, but maybe we're reaching to overload the .on listener-API to do both...?

That is, I think considering a cleaned-up version of the "wamp.{register,subscribe}" decorator-API might be a better approach; this leads to code more like this (I'm deliberately leaving out args etc; I'm more interested in the overall layout here):

# this lives in some module, say example_api.py

@wamp.prefix(u"com.example.api.v0")
class PublicAccess(object):
    @wamp.register(...)  # maybe no args; figure out from method name?
    def sign_up(...):
        ok = yield check_user(...)
        if ok:
            yield do_signup
    # obviously, other methods etc. possible

def register_api(session):
    database = yield open_db_connection()  # better style to pass "db" in, but ...
    api = PublicAccess(database)
    yield session.register(api)
    session.on('disconnect', lambda _: database.close())

...and then in your "main" program is where you'd do something like:

from autobahn.twisted import wamp
from example_api import register_api

def main(...):
    connection = wamp.Connection(...)
    connection.on('open', register_api)
    wamp.run(connection)

main()

Personally, this is more the style I think of when wanting to "not subclass" ApplicationSession. The "interesting" methods are pretty easy to test (e.g. you instantiate PublicAccess() giving it a fake database connection, and call its methods) and even "boring"/boilerplate-y methods like "register_api" are fairly easy as well (e.g. give it a fake session instance). That is, something along these lines:


def test_unauthorized_user():
    db = fake_in_memory_db_with_no_users()
    api = PublicAccess(db)

    api.sign_up()  # should fail
meejah commented 9 years ago

@Paranaix What txaio is doing is basically allowing Autobahn code to be (somewhat) shared between Twisted and asyncio. That is, Autobahn maintainers use reduced functionality in order to provide a smaller code-base that works across platforms.

For user code, it should have the full power of the chosen API -- that is, you get to use "raw" Deferreds, DeferredList, inlineCallbacks, etc (or Future, coroutine, etc) in the client code. Also the styles are different: asyncio doesn't let you chain callbacks, doesn't have errback, doesn't let you cancel errbacks. The error-handing especially takes a very different approach in both.

So, I'm not sure that it's very possible -- or desirable -- for users of Autobahn to have a "one click" switch between asyncio/Twisted. That would mean, e.g., all client code would have to use txaio which is a "least common denominator" kind of API -- it works, but it is not as powerful or elegant as doing async code in either Twisted or asyncio "directly".

As it sits right now, you can/should be happy as a Twisted user in thinking "this whole library was written in Twisted!" and you can/should be happy as an asyncio user in thinking "this whole library was written in asyncio!". That is, it should work as if it was...but, I think projects should still choose which one to use (e.g. Crossbar.io uses Autobahn, but is written "in Twisted" -- and it would be far more pain to write Crossbar.io such that it worked on asyncio as well).

oberstet commented 9 years ago

@sametmax @Paranaix and all: whoaa. =) thanks a lot! there are many important points and ideas here. It'll take some time to sort everything out. But it's worth: we really need to get this nailed now and have a stable long-term API.

Anyway, first point of @sametmax :

Let's be careful to keep congruent signatures

Yes. I agree. This needs to be fixed, as it turned out to be a permanent source of annoyance. In particular when switching between Py and JS .. very common with frontends being browsers nowerdays most of the time.

Here is historical context: Python has decorators, JS not. The current Py API was designed to allow this:

@wamp.register(u"com.example.foo")
def foo()
    return u"bar"

session.register(foo)

If we bring the Py API in line with JS, there are 2 options:

register() -> Option 1

session.register(None, foo)

or

register() -> Option 2

session.register(foo) and then allow polymorphism on the first argument of .register(arg1): arg1 can be a callable, which then MUST be decorated, or it is a string, but then there must be a second arg2 with a (non-decorated) callable.

Having polymorphism on the public API .. makes me feel uneasy. But I have no strong opinion on Option 1 or Option 2.

oberstet commented 9 years ago

2nd point:

We should have sane defaults

"realm1" is not a really good name. "public" would be better, since all tutorials use it like some kind of chmod 777 anyway

The idea with "realm1" was that it immediately signals this is something "user defined".

But I can follow the argument for "public" also: it immediately signals the default permissions (777) we practically use with this in all examples.

The trap would then be: you can of course have a realm named "public" with very strict permissions;)

Also: "realm1" is much easier to grep for.

I have no strong opinion on the tradeoffs above.

we need to define an official default port (not 8080

Having defaults: ok. Np. Host == localhost is a no brainer.

However, the default port, if any, should be 8080 IMO. At least when the default transport is WAMP-over-WebSocket. It's HTTP in the beginning. Means: 80 or 8080 would be "expected". Port 80 is a problem on Unix (the braindead <1024 port restriction).

Maybe these defaults:

oberstet commented 9 years ago

Ok, here are a couple of ones where I do have a strong opinion (note: I will continue with all your @sametmax points tomorrow .. the other I need to think more, and now I need some sleep):

Open should start the event loop

No. We can't do that, as e.g. in Crossbar containers, the event loop is started by the native worker.

This is the reason why there is ApplicationRunner: it does the plumbing when there is no hosting app/container around.

In JS the event loop is implicit.

This is correct for browsers and NodeJS runtimes .. which granted are the only relevant ones. And while Twisted does not allow to run multiple event loops (like the JS run-times mentioned before), asyncio does indeed. That's advanced, but consequent. The fact that Twisted requires explicit reactor start, but then does not allow to start multiple reactors is of course the worst of all I agree. Well. The likeliness that gets fixed is still higher than Python getting rid of the GIL;)

Prevent importing stuff you need all the time and abstract some more

In general, no. IMO too much magic hurts. And auto-importing and aliasing in particular is a great source of hard to track down problems.

Stuff line \@inlineCallbacks, \@coroutine or returnValue should have wrappers, and the wrappers should be attached either to session (or an object containing session passed instead of session) or connection.

You simply can't write source code that runs unmodified across Py2/3 and tx/aio using co-routine style (the mechanisms under the hood for co-routine style are different). At least I don't know a way, and I haven't seen anything like it. You can write such code only without using co-routine style, using plain deferred/future, and without deferred chaining. This is what AB does internally, and it does that using txaio. Hence, there is no point in trying to alias away \@inlineCallbacks and \@coroutine.

Check if name == "main" implicitly

No, argument parsing isn't something a WAMP component should be concerned with. It is the startup code that should do that. Plus: when the component is run under CB, there is no point in parsing any command line. If any, a component's custom config is provided to the component by CB.

meejah commented 9 years ago

Another point regarding @coroutine: if you're on Python3 you have to type "yield from" and "return", and on Python 2 it's "yield" and "returnValue"...

sametmax commented 9 years ago

I think we should define the objectives of the API first, because all the proposal seems to tackle different objectives.

There are no wrong answers to these, only clear statements to make.

oberstet commented 9 years ago

@sametmax

Defining objectives first .. absolutely. Makes sense. Here is my (current) view:

  1. API Level: The goal would be a single, recommended high-level API. The API should have a minimal surface, and ideally be shared as much as possible across the different Autobahn's.
  2. Target: User must have entry level knowledge of at least Twisted or asyncio
  3. One way or multiple: There should be one way that works for most use cases
  4. How much time: given the user has entry level knowledge of asynchronous programming and Twisted or asyncio, very little time
  5. What use cases should be obvious: all of the WAMP 4 actions / 2 patterns should be "natural" (principle of least surprise) without reading much/any docs
  6. Level of magic: not a lot (explicit is better than implicit)
  7. Hide differences between Py2/3: Not possible at the indented API level (see txaio for the maximum possible .. but that is too low level for most)
  8. Message: Decoupling. Separation of concerns.
  9. Imagine typical app: the idiomatic app would consist of a set of loosely coupled components, talking over WAMP (where the transport might reduce to function calls when components co-reside in a router/container)
  10. Integration with other components: not sure, you mean eg Django? Answer on that would be not AB, but CB REST Bridge services. Direct integration with blocking, synchronous code is of secondary concern. It just doesn't work nicely / it's always a hack.
  11. On what parts are we working: Connection and Session plus plumbing (the role filled by ApplicationRunner today .. but we could screw that)
oberstet commented 9 years ago

I am posting example code resulting from a discussion with @meejah on Slack. This is how a complete working example would look like under that approach:

from twisted.internet import reactor
from autobahn.twisted.wamp import Connection

def main(connection):

    @inlineCallbacks
    def on_join(session):
        res = yield session.call(u'com.example.add2', 2, 3)
        print(res)

    connection.on('join', on_join)

if __name__ == '__main__':
    connection = Connection(main)
    connection.connect(reactor)
    reactor.run()

We've been talking about the design space:

What we have today is subclassing (ApplicationSession) + callbacks (ApplicationSession.onJoin etc).

Above is doing composition (it just uses a session) + listeners (connection.on('join', on_join).

Above assumes defaults like @sametmax suggested: WAMP-over-WebSocket connecting to ws://127.0.0.1:8080/ws on realm public, but would allow configurable transports including auto-reconnect and all that by providing appropriate config to the Connection constructor.

Above user code (the stuff inside main()) would run unmodified under CB using a config along these lines:

"components": [
    {
        "type": "function",
        "entrypoint": "myapp.main",
        "realm": "realm1",
        "transports": [
            {
                "type": "websocket",
                "endpoint": {
                    "type": "tcp",
                    "host": "127.0.0.1",
                    "port": 8080
                },
                "url": "ws://127.0.0.1:8080/ws"
            }
        ]
    }
]

I am cautiously optimistic that above would satisfy most of the goals we have. But this needs to be investigated further.

oberstet commented 9 years ago

Here is a slight update of the API proposal:

from autobahn.twisted.wamp import Connection

def on_join(session):
    print('session connected: {}'.format(session))
    session.leave()

def main(connection):
    connection.on_join(on_join)

if __init__ == __main__:
    connection = Connection()
    connection.run(main)

Slightly larger example which would run unmodified over Py2/3 and twisted/asyncio apart from changing the imports:

from __future__ import print_function

from txaio.twisted import add_callbacks
from autobahn.twisted.wamp import Connection

def on_join(session):
    result = session.call(u'com.example.myapp.add2', 2, 3)
    add_callbacks(result, print, print)

def main(connection):
    connection.on_join(on_join)

if __init__ == __main__:
    connection = Connection()

    finish = connection.run(main)
    add_callbacks(finish, print, print)

In above, Connection would take an optional reactor or loop argument. When none is given, the default one is used and automatically started in .run(). When a reactor/loop is provided, the user has to start the reactor/loop explicitly. I think this was a suggestion of @sametmax

I am gonna prototype that now and see how it works.

oberstet commented 9 years ago

Regarding prerequisite know-how .. there are these areas of knowledge a programmer can have:

  1. synchronous programming (and only that)
  2. using callbacks
  3. deferreds/futures
  4. co-routines

E.g. every JS programmer knows about 2., and 3. is growing in the JS community. ES6 brings native promises to JS. Co-routines seem to be in the cooking for ES7 (see here).

With Python, both Twisted and asyncio propagate not 3. and 4. It is true that a considerable fraction of Python programmers might only know about 1. - but there isn't anything we can do about in AutobahnPython about it.

sametmax commented 9 years ago

It's clean and easier than before, but we need to figure out how to allow a clean deorator syntax version, since right now you end up with:

    from autobahn.twisted.wamp import Connection

    connection = Connection()

    @connection.run
    def main(connection):

        @connection.on_join
        def on_join(session):

            @session.register("foo.bar") # not defined in our previous example
            def _():
                return "foobar"

Let's do this API, but add in the mix an additional object : a registry to collect all callbacks and which will automatically create something like "main()" for you:

class CallbackList:

    def __init__(self):
        self.procedures = []
        self.subcribers = []

    def register(self, name, args, *kwargs):
        def wrapper(func):
            self.procedures.append(name, func, *args, *kwargs)
        return wrapper

    def subscribe(self, topic, args, *kwargs):
        def wrapper(func):
            self.procedures.append(topic, func, *args, *kwargs)
        return wrapper

    # this is the main() equivalent 
    def run_all(self, connection):

        def on_join(session):
            for name, cb, args, kwargs in self.procedures:
                session.register(name, cb, *args, *kwargs)

            for topic, cb, args, kwargs in self.subscribers:
                session.subscribe(topic, cb, *args, *kwargs)

        connection.on_join(on_join)

        return main

Which will allow:

    from autobahn.twisted.wamp import Connection, CallbackList

    connection, cb = Connection(), CallbackList()

    @cb.register("foo.bar") # not defined in our previous example
    def _():
        return "foobar"

    @cb.subscribe("doh") # not defined in our previous example
    def _():
        print('doh')

    connection.run(cb.run_all)

But even then, I'm a bit confused with "run()". We call it open() on JS, so maybe we should call it the same way in Python. And if not, it means they are different and we need to provide JS with run().

meejah commented 9 years ago

There isn't really the equivalent of run in the JS side, because there isn't an event-loop to explicitly start. So, there's still open on both Python and JS, but Python needs to add the concept of run().

Personally, I wouldn't put run on Connection at all -- as that implies that Connection is representing/abstracting two different things: "a WAMP connection", and "how to run a particular event-loop". It's also entirely possible to run several connections in the same Python process with no problem so I would make run a helper-function that can take 1 or more Connection instances as well as logging (and possibly other) options. Like:

from autobahn.twisted import wamp
import example

api = wamp.Connection(example.api.setup)  # on some "public" realm, for example
management = wamp.Connection(example.backend.setup)  # on some private realm, for example

wamp.run([api, management], log_options=dict(level='debug'))

...and a module example/api.py might look like:

from twisted.internet.defer import inlineCallbacks
from autobahn.twisted import wamp

class _HelloWorld(object):
    def __init__(self, db_connection):
        self._db = db_connection

    @wamp.register(u"com.exampe.api.v0.method_one")
    def method_one(self, arg)
        return "Hello, {}".format(arg)

    @wamp.register(u"com.example.api.v0.another_one")
    @inlineCallbacks
    def another_one(self):
        user = yield self._db.query(...)
        other_info = yield self._db.query(...)
        return u"You get the idea."

@inlineCallbacks
def _on_join(db, session):
    api = _HelloWorld(db)
    yield session.register(api, options=RegisterOptions(...))
    yield session.publish(u"com.example.api.v0.ready")

@inlineCallbacks
def setup(connection):
    db = yield connect_to_db(...)
    connection.on('join', _on_join, db)
    connection.on('leave', db.disconnect)

Obviously, you could have several different _HellowWorld-type objects encapsulating different bits of the APIs you want to register, and .register (and .subscribe) them on WAMP connections as appropriate. Notice too that the above class is fairly unit-testable, as you can pass in a fake db object. Also, there are no module-level globals to try and fake out in tests. You can even unit-test the _on_join method fairly easily.

oberstet commented 9 years ago

@sametmax

   from autobahn.twisted.wamp import Connection, CallbackList

    connection, cb = Connection(), CallbackList()

    @cb.register("foo.bar") # not defined in our previous example
    def _():
        return "foobar"

    @cb.subscribe("doh") # not defined in our previous example
    def _():
        print('doh')

    connection.run(cb.run_all)

This code can't run unmodified when taken to CB. And it looks very weird to me, and it does too much magic.

But even without taking decorators into consideration, we need to address below ..

@meejah

There isn't really the equivalent of run in the JS side, because there isn't an event-loop to explicitly start. So, there's still open on both Python and JS, but Python needs to add the concept of run().

Yep.

Personally, I wouldn't put run on Connection at all -- as that implies that Connection is representing/abstracting two different things: "a WAMP connection", and "how to run a particular event-loop".

Good point. It's crap to mix these. Back to drawing table:

from autobahn.twisted.wamp import Connection

def on_join(session):
    print('session connected: {}'.format(session))
    session.leave()

def main(connection):
    connection.on_join(on_join)

if __init__ == __main__:
    from twisted.internet import reactor

    connection = Connection()
    connection.open(main)

    reactor.run()

Connection can take a reactor (loop) argument. One can (well, should in Twisted) run multiple event loops in one process.

Connection.open returns a Deferred (Future) that will fire when the connection is really done (won't automatically reconnect, because explicitly closed or max reconnects reached).

Hiding reactor.run just doesn't make sense ..

@sametmax rgd Connection.open vs .start or .run: whatever we name it, I agree, it should be consistent with AutobahnJS.

oberstet commented 9 years ago

Maybe we can support both composition and inheritance:

class Connection(object):
    def __init__(self, session_class=ApplicationSession, reactor=None):
         pass

    def open(self, main=None):
         pass
meejah commented 9 years ago

I like the concept of "run" as a simple method (no matter if it's actually called run) as to me it's a lot more intuitive that it just never returns. Maybe I just like it because it mimics twisted.internet.task.react...

meejah commented 9 years ago

@oberstet what about adding a "setup=" kwarg to Connection's init -- and it (if available) would be called with the connection object at some point after we've got a session instance and started the reactor, but before we've joined.

I'm guessing that your main= to open is similar, right? I think that still doesn't handle the use-case of "run multiple connections" though, does it?

class FunStuff(object):
    def __init__(self, db_connection):
        self._db = db_connection

    @wamp.register(u"com.example.meaning")
    def meaning(self):
        return 42

@inlineCallbacks
def connection_setup(connection):
    db = yield connect_to_db()
    api_provider = FunStuff(db)

    @inlineCallbacks
    def joined(session):
        yield session.register(api_provider, options=RegisterOptions())
    connection.on('join', joined)

con0 = Connection(setup=connection_setup)
con1 = Connection(...)

run([con0, con1])

Now, if we can get away with having a "joined" and "left" Deferred/Future as attribute-accessible things on ApplicationSession (which we can, I think, with the caveat that they would need to be re-created every time we disconnect) then we could simplify the above a bit:

class FunStuff(object):
    @wamp.register(u"com.example.meaning")
    def meaning(self):
        return 42

@inlineCallbacks
def main(session):
    db = yield connect_to_db()
    api_provider = FunStuff(db)
    # ^ that's pre-session-join()ed setup stuff (no register, publish, etc)
    yield session.joined
    # now we have a connected session, so we can call register etc.
    yield session.register(api_provider, options=RegisterOptions())
    yield session.publish(u"com.example.ready")
    # now cleanup handlers can run after we've left:
    yield session.left
    yield db.disconnect()

con0 = Connection(main=main)
con1 = Connection(...)

run([con0, con1])

That means: main (if provided) is called with the session object as soon as we have a connection (but before we've joined). So it wouldn't get called, for example, if we haven't ever made a valid connection (server offline, etc). It would have to get re-called whenever we re-join(). I'm ignoring error-handling (e.g. db.disconnect() should be in a finally: block).

oberstet commented 9 years ago

I have added a couple of examples of how user code could look like under an API as discussed above:

oberstet commented 9 years ago

Here is the one I am working on master:

from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks as coroutine
from autobahn.twisted.connection import Connection

def main(reactor, connection):

    @coroutine
    def on_join(session, details):
        print("on_join: {}".format(details))

        def add2(a, b):
            return a + b

        yield session.register(add2, u'com.example.add2')

        try:
            res = yield session.call(u'com.example.add2', 2, 3)
            print("result: {}".format(res))
        except Exception as e:
            print("error: {}".format(e))
        finally:
            session.leave()

    connection.on('join', on_join)

if __name__ == '__main__':

    connection = Connection()
    react(connection.start, [main])
oberstet commented 9 years ago

The fun thing is: you can also use inheritance:

from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks as coroutine
from autobahn.twisted.wamp import Session
from autobahn.twisted.connection import Connection

class MySession(Session):

    def on_join(self, details):
        print("on_join: {}".format(details))

if __name__ == '__main__':

    connection = Connection()
    connection.session = MySession
    react(connection.start)