Neoteroi / BlackSheep

Fast ASGI web framework for Python
https://www.neoteroi.dev/blacksheep/
MIT License
1.86k stars 78 forks source link

routing.Router.sort_routes() does the wrong thing #360

Open remdragon opened 1 year ago

remdragon commented 1 year ago

I spent a lot of time trying to hunt down why a certain route wasn't getting executed. I was surprised to find that blacksheep sorts the rules but then chooses the first matched rule. I found this to be surprising behavior.

Here's the scenario I created which wasn't working (this is working code ported from flask):

@app.route( '/clients/{int:acct}/{path:path}'
async def do_something_client_specific( request, acct, path ):
    ...

@app.route( '/{path:path}' )
async def look_for_site_specific_stuff( path ):
    ...

Here's what I did to fix the problem for me:

from blacksheep.server import routing
class NonSortingRouter( routing.Router ):
    def sort_routes( self ) -> None:
        pass

app = blacksheep.Application( router = NonSortingRouter() )

I'm not sure the rational for auto-sorting the routes, but as I stated before, this was surprising behavior to me. Perhaps this could be opt-in behavior? I can't be the only one porting code from flask. Also the documentation might make more fuss about considerations to watch out for when porting from flask.

RobertoPrevato commented 1 year ago

Hi @remdragon Sorry for replying late in writing, and Thank You for opening this issue. I like your idea to support disabling automatic sorting of routes, and I will add it. However, I won´t make this feature off by default and opt-in, because I still think that most web developers are not as aware as you, that routes should be registered in the proper order.

There is no golden rule here: somebody else will likely get surprised if the framework didn´t sort routes. Consider the following example:


@app.router.get("/api/users/{user_id}")
async def get_user_info(user_id: str):
    """Returns information about a user by id"""

@app.router.get("/api/users/me")
async def get_current_user_info():
    """Returns information about the current user"""

If the framework didn´t sort the routes to evaluate /api/users/me before /api/users/{user_id}, the second request handler would never be used because the first route would catch "me" as "user_id". We can argue that user_id should be checked for me in the business logic part and the same route be used, but not all web developers would catch that.

In summary:

When it comes to write documentation to help people who want to migrate from Flask, I really don't have time for that right now, but I will take it into consideration (I planned to write something about blueprints anyway).

remdragon commented 1 year ago

Your reasoning for the sorting makes sense, I can see myself making the same mistake.

Would you be interested in a donation of a routing engine that doesn't require sorting because it always takes the most specific path? I believe for large systems it will probably be faster than regexing every possible url. I built such a thing for an internal project and would be willing to polish it up and donate it if there's interest.

RobertoPrevato commented 1 year ago

That sounds awesome. I only recommend to wait a bit, because in the last months I considered the introduction of a router protocol, and replacing the current dependency on the exact Router implementation with a light dependency to that protocol. If you look at the Cython code in https://github.com/Neoteroi/BlackSheep/blob/main/blacksheep/baseapp.pyx

you can see that the router is only used for this single method, when the application is handling web requests (i.e. after routes have been defined):

def get_match(request: Request) -> Response:
    ...

Therefore I am planning to do something like:

class RouterProtocol(Protocol):
    def get_match(request: Request) -> Response:
        ...

...

class Application:
    router: RouterProtocol

This would enable easily alternative router implementations, like the one you are describing. In that scenario, it might also be placed in a dedicated package (just thinking 💭).

The only reason why I didn't add the protocol, yet, is that I am still considering whether to change the code API to register routes. From one side I would like to keep the current code API to register request handlers, but on the other side I would like to keep the protocol as simple as possible, with a single method (and not those to register handlers get, post, put, etc.).