crossbario / autobahn-python

WebSocket and WAMP in Python for Twisted and asyncio
https://crossbar.io/autobahn
MIT License

Allow loose/strict URI checking levels also for decorators and component API #939

Open gruns opened 6 years ago

gruns commented 6 years ago

Capital letters are not allowed in registered URIs. For example, @register('example.camelCaseFails') fails:

#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-

from autobahn.wamp import register
from autobahn.twisted.wamp import ApplicationRunner, ApplicationSession

class Example(ApplicationSession):
    @register('example.snake_case_succeeds')
    def foo(self):
        pass

    @register('example.camelCaseFails')
    def blah(self):
        pass

ApplicationRunner(None, None).run(Example)

Raises

Traceback (most recent call last):
  File "./autobahn-register-camel-case.py", line 7, in <module>
    class Example(ApplicationSession):
  File "./autobahn-register-camel-case.py", line 12, in Example
    @register('warden.camelCaseFails')
  File "/usr/local/lib/python3.6/dist-packages/autobahn/wamp/uri.py", line 343, in decorate
    f._wampuris.append(Pattern(real_uri, Pattern.URI_TARGET_ENDPOINT, options))
  File "/usr/local/lib/python3.6/dist-packages/autobahn/wamp/uri.py", line 215, in __init__
    raise Exception("invalid URI")
Exception: invalid URI

Googling, the only related discussion I found was

https://groups.google.com/forum/#!topic/autobahnws/mkjF21Fb8ow

Why, by default, are capital letters (e.g. for camelCase) not allowed in registered URIs?

oberstet commented 6 years ago

Why, by default, are capital letters (e.g. for camelCase) not allowed in registered URIs?

Because AutobahnPython follows the WAMP spec and enforces strict URIs by default: http://wamp-proto.org/static/rfc/draft-oberstet-hybi-crossbar-wamp.html#rfc.section.5.1.1.2

I seem to remember there are some knobs to change the default behavior, allowing "loose URIs" (see above)

gruns commented 6 years ago

Thanks for chiming in.

While the above rules MUST be followed, following a stricter URI
rule is recommended: URI components SHOULD only contain
lower-case letters, digits and _.

Why does WAMP not permit capital letters in URIs, though? Every major programming language permits capital letters in identifiers, including those of the various WAMP implementations http://wamp-proto.org/implementations/. Why don't WAMP URIs?

The forced change from CamelCaseFunctions to camel_case_functions is confusing and unintuitive, especially for RPC. Doubly so when the common code formatting for languages with WAMP implementations is camelCase (JavaScript, Java, etc).

oberstet commented 6 years ago

Why does WAMP not permit capital letters in URIs, though?

whitespace, funny characters and capital letters don't add any gain, but only hassles.

gruns commented 6 years ago

whitespace, funny characters and capital letters don't add any gain, but only hassles.

Whitespace and funny characters: no. Whitespace and funny characters aren't valid identifier characters in mainstream programming languages (Python, JavaScript, Go, etc) and thus I agree and see little reason to allow them in WAMP URIs.

But capital letters? Not only are capital letters valid identifier characters in every single WAMP implementation language enumerated here:

But hell: in Erlang, variable names have to start with a capital letter (or an underscore).

http://erlang.org/doc/reference_manual/expressions.html#id80984

Thus, support for capital letters engenders an immediate and tangible gain: WAMP URIs, e.g. exported RPC functions, no longer have to needlessly and confusingly mangle camelCase identifiers and function names.

That's exactly the problem I ran headlong into. I have to confusingly register this function

async def constructClientPayloadOnConnect(self, conn):
   pass

under a different, snake_case name.

@register('construct_client_payload_on_connect')
async def constructClientPayloadOnConnect(self, conn):
   pass
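
To make the renaming concrete, here is a minimal sketch of the mechanical camelCase-to-snake_case conversion that the strict rule forces on every registered name. The helper `camel_to_snake` is hypothetical and not part of Autobahn; it only illustrates the mapping a developer has to keep in mind.

```python
import re

def camel_to_snake(name):
    """Insert '_' before every capital letter except the first character, then lower-case.

    Hypothetical helper for illustration only; Autobahn does not provide it.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

# The function name from the example above and the URI it has to be registered under.
print(camel_to_snake("constructClientPayloadOnConnect"))
```

This naive version splits acronyms letter by letter (for example `HTTPServer`), so the two names can drift even further apart than in the example.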

I had one function name. Now I have two. Because of this restriction, the number of function names every developer has to remember and reason about doubles. And I'm hardly the first one to be bitten by this unintuitive restriction:

https://groups.google.com/forum/#!topic/autobahnws/mkjF21Fb8ow

So there's an immediate gain from allowing capital letters: camelCase identifiers and functions aren't mangled.

WAMP should encourage consistent code. Unfortunately this restriction does the opposite.

Let's fix it!

gruns commented 6 years ago

Little bump: poor, innocent camelCase functions still suffer needlessly at the hands of this restriction.

Let's fix it!

If you'll merge it, I'm happy to file a pull request that does just that.

oberstet commented 6 years ago

Ok, I agree: this issue has been raised more than once - I am reopening this:

Autobahn does have 2 levels of URI checking (internally), but this should be:

The 2 levels of URI checking are "strict" vs "loose" and the respective regular expressions are here: https://github.com/crossbario/autobahn-python/blob/master/autobahn/wamp/message.py#L72

# strict URI check allowing empty URI components
_URI_PAT_STRICT_EMPTY = re.compile(r"^(([0-9a-z_]+\.)|\.)*([0-9a-z_]+)?$")

# loose URI check allowing empty URI components
_URI_PAT_LOOSE_EMPTY = re.compile(r"^(([^\s\.#]+\.)|\.)*([^\s\.#]+)?$")

# strict URI check disallowing empty URI components
_URI_PAT_STRICT_NON_EMPTY = re.compile(r"^([0-9a-z_]+\.)*([0-9a-z_]+)$")

# loose URI check disallowing empty URI components
_URI_PAT_LOOSE_NON_EMPTY = re.compile(r"^([^\s\.#]+\.)*([^\s\.#]+)$")

# strict URI check disallowing empty URI components in all but the last component
_URI_PAT_STRICT_LAST_EMPTY = re.compile(r"^([0-9a-z_]+\.)*([0-9a-z_]*)$")

# loose URI check disallowing empty URI components in all but the last component
_URI_PAT_LOOSE_LAST_EMPTY = re.compile(r"^([^\s\.#]+\.)*([^\s\.#]*)$")

There are 6 regular expressions, because there are 2 levels (strict vs loose), and 3 types: plain/exact URI, wildcard URI pattern and prefix URI pattern.
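
As a quick illustration of the two levels, the sketch below runs a few URIs through the two "non-empty" patterns copied from the listing above. It assumes these are the variants applied to plain/exact URIs; the `classify` helper and the constant names are made up for this example and are not Autobahn API.

```python
import re

# Copied verbatim from the listing above (variants that disallow empty components).
URI_PAT_STRICT_NON_EMPTY = re.compile(r"^([0-9a-z_]+\.)*([0-9a-z_]+)$")
URI_PAT_LOOSE_NON_EMPTY = re.compile(r"^([^\s\.#]+\.)*([^\s\.#]+)$")

def classify(uri):
    """Return the tightest level a URI satisfies: 'strict', 'loose' or 'invalid'.

    Hypothetical helper for illustration; not part of Autobahn.
    """
    if URI_PAT_STRICT_NON_EMPTY.match(uri):
        return "strict"
    if URI_PAT_LOOSE_NON_EMPTY.match(uri):
        return "loose"
    return "invalid"

for uri in ("example.snake_case_succeeds", "example.camelCaseFails", "example..empty"):
    print(uri, "->", classify(uri))
```

The camelCase URI from the original report only passes the loose check, which is why the decorator rejects it while strict checking is in force.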

oberstet commented 6 years ago

the other option would be to redefine the "strict" pattern:

I think this makes sense anyways, as we do allow a leading digit, which is not a valid identifier in most programming languages.

so above would allow for:

but it still doesn't allow for eg base64:

source: https://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data


actually, I think a better user experience could be:

gruns commented 6 years ago

actually, I think a better user experience could be:

  • by default, use the "loose" patterns, which only exclude whitespace, # and . (URI parts can be anything matching [^\s.#]*)

  • let the user switch to "strict", and have that use a pattern that results in valid identifiers for most programming languages

Sounds great.

This solves the original problem (support for CamelCase) and opens the door for other URI schemes, too (e.g. base64). For example, if a user wants to use emojis in their URIs, I see no compelling reason to stop them.
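
A rough sketch of what that could look like, under stated assumptions: the loose rule is the `[^\s.#]` component rule quoted above, while the identifier-style strict pattern, the `is_valid_uri` name and the `strict` flag are all hypothetical and not existing Autobahn API.

```python
import re

# Loose: components may contain anything except whitespace, '.' and '#'.
LOOSE_URI = re.compile(r"([^\s\.#]+\.)*[^\s\.#]+")

# HYPOTHETICAL strict rule: every component is an ASCII identifier
# (letters, digits, '_', no leading digit). This is an assumption, not Autobahn's pattern.
IDENT_URI = re.compile(r"([A-Za-z_][A-Za-z0-9_]*\.)*[A-Za-z_][A-Za-z0-9_]*")

def is_valid_uri(uri, strict=False):
    """Check a plain URI; loose by default, identifier-style when strict=True."""
    pattern = IDENT_URI if strict else LOOSE_URI
    return pattern.fullmatch(uri) is not None

for uri in ("example.camelCaseFails", "example.1st", "example.aGVsbG8="):
    print(uri, is_valid_uri(uri), is_valid_uri(uri, strict=True))
```

With this split, camelCase passes at both levels, while a leading digit or a base64 component with `=` padding passes only the loose default.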

How can I help implement this?

guanzo commented 5 years ago

+1. Having camelCased URIs would be nice.