django / asgiref

ASGI specification and utilities
https://asgi.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.46k stars 207 forks source link

About `scope[path]` and `scope[root_path]` #424

Open abersheeran opened 9 months ago

abersheeran commented 9 months ago

Related links:

In the first link, @andrewgodwin stated the differences between root_path and path in WSGI. After Starlette actively followed up, this led to the problem in the second link.

Yes, we can make some changes to make it work again, but the question is what is the existing community implementation?

I searched for the code via GitHub and can view the results via the third link. I noticed that in the first page of results, everyone except hypercorn and starlette treat path as the equivalent of PATH_INFO in WSGI. They get the full path by concatenating root_path and path.

So I think we should modify the specification document so that path is exactly the same as PATH_INFO. Otherwise, this will have a strong impact on the existing implementation of the community.

A polite mention of the author of code in the search results, I hope this doesn't bother you. @Kludex @pgjones @kennethreitz @Goldziher @simonw @RobertoPrevato @long2ice If you have time, could you express some thoughts on this?

abersheeran commented 9 months ago

https://github.com/django/asgiref/blob/d95e378e8bf5f0c8c87f4cfd888a0ac26c13fc5a/asgiref/wsgi.py#L60

asgiref itself seems to consider path and PATH_INFO to be completely equivalent. But if it wants to comply with ASGI's current specification, it should be modified as in this PR - remove SCRIPT_NAME from PATH_INFO. https://github.com/abersheeran/a2wsgi/pull/40

https://asgi.readthedocs.io/en/latest/specs/www.html#wsgi-compatibility

Goldziher commented 9 months ago

Agree 💯, they are considered equal

andrewgodwin commented 9 months ago

I put this in the WSGI compat docs very deliberately:

The WSGI adapter in asgiref does not support hosting things under a prefix/non-root-path; that was mostly because I told myself at the time I'd go back and add it later on. As you can see, I still haven't...

abersheeran commented 9 months ago

I put this in the WSGI compat docs very deliberately:

  • SCRIPT_NAME is root_path
  • PATH_INFO can be derived by stripping root_path from path

I mean, it seems that most of the existing ASGI-related procedures do not comply with this standard. Do we still need to insist on this?

andrewgodwin commented 9 months ago

Standards are obtained by consensus, not by dictat, so if there's strong consensus on another way of doing it from all the main servers we can change it.

abersheeran commented 9 months ago

Okay, let's wait for the other people's opinions. As far as the code I looked up, most people still use exactly the same approach as WSGI.

RobertoPrevato commented 9 months ago

Hi @abersheeran no problem with mentioning me, thank you for bringing this up. I can confirm how my web framework is using root_path today: it is used to obtain the full URL of a web request, including any path fragment a proxy server might use, like "/example-api" in the following example:

I started using root_path especially to obtain a redirect URL to the app itself when using OpenID Connect integration. This is mentioned here https://github.com/Neoteroi/BlackSheep/issues/372.

Personally, I find natural to think about the "path" as the path from application's perspective, therefore not including the root path, but I don´t have a problem in updating my code to follow the specification.

tiangolo commented 8 months ago

Hello all!

Just to make sure I'm on the same page, I'm gonna share my understanding of the current state, differences, and the current question.

Mounting

I see that the discussion/difference is in the end mainly about WSGI's PATH_INFO vs ASGI's path. Where PATH_INFO includes only the last part of the path after any prefix and path has everything, including any path prefix.

I see the discussion/problem and questions are only relevant to "mounting" ASGI apps, where "mounting" refers to putting the ASGI app behind a path prefix in a situation where the app's code doesn't know about this prefix.

For example, the app defines a route for /users/ in its code, but it is actually behind some frontend proxy like Nginx, under a path prefix of /api/v1/, so, the client URL path is actually /api/v1/users/ even though the app's code doesn't define anything with /api/v1/.

WSGI

In WSGI, PATH_INFO refers to the last part of the path when an app is "mounted" (behind a path prefix that the app is not aware of, e.g. behind a frontend proxy like Nginx in a path /api/v1/), so, if the app is behind Nginx under /api/v1/, and the client requests /api/v1/users/, the WSGI app would get in PATH_INFO only the last part: /users/.

When the WSGI app wants to check which of its own route/path handlers should handle that request, it checks the value of PATH_INFO and matches it as-is with a route handler for that value, in this case, for /users/. So, when finding a route to handle a request, it doesn't need to use SCRIPT_NAME.

The WSGI app would get the path prefix in SCRIPT_NAME, in this example, the value would be /api/v1/. So, if the app wanted to provide the url_for some specific route that the client should use (including the prefix), it would find the route handler for that path, and then concatenate SCRIPT_NAME and PATH_INFO.

ASGI

In ASGI, path contains the full path from the URL the client used, so, in the example above, it would include the prefix of /api/v1/, and the value would be /api/v1/users/.

The ASGI app would receive the path prefix at path_root, containing the same value that SCRIPT_NAME would have contained, in this example, /api/v1/.

When the ASGI app needs to find the route/path handler for a request, it would remove the prefix value in the root_path from the path, so, removing /api/v1/ from /api/v1/users/, to get /users/ and then find the route/path handler in its code for /users/.

When the ASGI app needs to find the url_for a route/path, it would, again, remove the prefix in root_path to find the route/path handler, and then it would concatenate the root_path with the newly found sub-path from the router.

Relevance

Now, this is mainly relevant for things that are mounted behind a proxy, or behind/mounted in another app.

For example, say there's a Starlette app at /, and it mounts a BlackShip app at /api/v2/. Then a client requests for /api/v2/users/. In this case, Starlette would take this request, and transfer it to BlackSheep.

In the current (unmodified) state of the spec, Starlette should send a scope including a root_path with a value of /api/v2/ and a path with a value of /api/v2/users/. And then BlackSheep would use those two values to find its own route/path handler for the path /users/.

The same for a2wsgi (which it already does), e.g. mounting a Flask app inside a FastAPI app, under /api/v1/.

Moving Forward

Even though it could mean a couple of changes and "work" for all of us, I would consider it better to follow the spec as it is, without changing it, and fix this (small) detail in the tools as a "bug fix".

I fear otherwise it could become a scenario of "in version X of SuperASGITool it used the ASGI spec 2.3 as of 2023, but in version Y of SuperASGITool it started following the updated version of the ASGI spec 2.4 updated in 2024".

Something very similar happened with OpenAPI and JSON Schema OpenAPI followed an old version of JSON Schema that didn't include `examples` and defined its own `examples` field with some structure. Then JSON Schema defined its own `examples` field and OpenAPI was still for a long while using its own `examples` field, so many things would be partially compatible with one or the other. Now the latest OpenAPI follows the latest JSON Schema, and things are much simpler, but still, there are some places that accept the old `examples` field from OpenAPI, with a slightly different format than JSON Schema. So, in a single OpenAPI document, there might be places with a field called `examples` in one format, and in other specific places other `examples` fields with another slightly different format. All that is not terrible, and the difference is very small, but it ends up being somewhat confusing, in particular to users, and requires long text (like this one) explaining all the state of things, what works and what doesn't, when, how, etc. More details about all the `examples` stuff with OpenAPI, JSON Schema, FastAPI, and Pydantic here: https://fastapi.tiangolo.com/tutorial/schema-extra-example/#technical-details ...that's the type of long explanations I would hope we can save our users from. 😅

Starlette was recently using the old behavior, but it was fixed to follow the spec by @Kludex and me in https://github.com/encode/starlette/pull/2352 and https://github.com/encode/starlette/pull/2400.

Again, the places that need changing are only those that might be related to "mounting" apps under a path prefix (when that's supported).

abersheeran commented 8 months ago

Yes, it is good to follow the rules. However, among the existing codes in the community, only a few frameworks or servers comply with this specification. Personally I think it's better to use exactly the same approach as WSGI. (Although I have made a2wsgi compliant spec)

WSGI has been used for more than ten years, and it is really confusing to be similar and different from WSGI in such small details. And this change does not seem to have any good impact. Instead, it makes the framework make more judgments when processing PATH.

We should let as few people as possible modify the code, rather than specify a specification that most people don't follow and then say "Hey, your code is wrong" when their users have problems.

andrewgodwin commented 8 months ago

This was one of the few explicit divergences I made from WSGI - along with things like improving the string/bytes separation - and it was because I believed that having a "true" HTTP path was the most helpful thing to have in the path attribute rather than some subset path that you can't actually build URLs on without knowing the mounting root anyway.

In ASGI, direct path manipulation with relative paths works perfectly fine; in WSGI, it does not, and you can easily shoot yourself in the foot. Ideally, in both specifications you are going to work out the third missing piece of info, but I regarded it as more important to have the path right out of the gate for simpler ASGI applications (assuming frameworks will have the relevant 3 lines of code to do URL matching, as they tend to).

ASGI itself is now seven years old, so an appeal to how old and fixed something is may not be entirely the right approach!

carltongibson commented 5 months ago

FWIW this behaviour was brought into line in the last Daphne and Channels releases too. Django also is compliant.