fermyon / spin-python-sdk

Spin SDK for Python
https://developer.fermyon.com/spin/v2/python-components
Apache License 2.0
29 stars 11 forks source link

https://self not working #107

Closed thesuhas closed 3 months ago

thesuhas commented 4 months ago

Hi, I have a function in a router that receives a request and sends a request to another service.

@router.route("/users/{username}/bookings")
def user_booking(uri: ParseResult, request: Request, params):
    global users
    if users == None:
        open_file()
    username = params['username']
    if username not in users:
        raise Exception("Not found")
    resp = send(Request("GET", f"/bookings/{username}", {}, None))
    print(f"Response: {resp}")
    return Response(200, {}, resp.body)

I get the following error: image It seems like set_authority is throwing an error. On looking at the source code of that function, it has not been implemented.

def set_authority(self, authority: Optional[str]) -> None:
        """
        Set the HTTP Authority for the Request. A value of `none` may be used
        with Related Schemes which do not require an Authority. The HTTP and
        HTTPS schemes always require an authority. Fails if the string given is
        not a syntactically valid uri authority.

        Raises: `spin_sdk.wit.types.Err(None)`
        """
        raise NotImplementedError

Another thing is the error says line 883 and this is not at line 883. at line 883, the function is:

def set_between_bytes_timeout(self, duration: Optional[int]) -> None:
        """
        Set the timeout for receiving subsequent chunks of bytes in the Response
        body stream. An error return value indicates that this timeout is not
        supported.

        Raises: `spin_sdk.wit.types.Err(None)`
        """
        raise NotImplementedError

On adding a few logs into spin-python-sdk, I see the following differences between this example and my code: Example:

request Request(method='GET', uri='https://example.com', headers={}, body=None)
Parsed Url: ParseResult(scheme='https', netloc='example.com', path='', params='', query='', fragment='')

Output of my component:

request Request(method='GET', uri='/bookings/dwight_schrute', headers={}, body=None)
Parsed Url: ParseResult(scheme='', netloc='', path='/bookings/dwight_schrute', params='', query='', fragment='')

I see that both the scheme and netloc fields are empty. On changing the outbound hosts to:

allowed_outbound_hosts = ["http://*:*", "https://*:*"]

from:

allowed_outbound_hosts = ["http://self", "https://self"]

and changing the url to point to http://127.0.0.1:3000/users/{username}/bookings (the default url that spin exposes the application to), it seems to be working. On changing the url to http://self/users/{username}/bookings it throws a DNS not found error, as https://self is not a website that exists.

dicej commented 4 months ago

Thanks for reporting this, @thesuhas.

The spin_sdk.wit code in the spin_sdk package is not actually used at runtime -- that code is just stub bindings generated for the purposes of documentation and IDE integration. The actual code is generated by componentize-py at build time such that each imported function calls into the host instead of raising a NotImplementedError. That also affects the line numbers, since the function bodies are different. Sorry for the confusion.

In this case, I'm not surprised the /bookings/dwight_schrute path doesn't work, since wasi:http needs a full URL including a protocol and an authority. Have you tried following https://developer.fermyon.com/spin/v2/http-outbound#making-http-requests-within-an-application ? I think that's the official way to do this, but @itowlson could probably correct me if I'm mistaken.

dicej commented 4 months ago

Oh, but I just noticed that http://self should work according to https://developer.fermyon.com/spin/v2/http-outbound#intra-application-http-requests-by-route. Hopefully @itowlson can tell us what we're missing here.

thesuhas commented 4 months ago

I think I have figured out what the issue is. In the rust sdk:

impl OutboundHttp {
    /// Check if guest module is allowed to send request to URL, based on the list of
    /// allowed hosts defined by the runtime. If the url passed in is a relative path,
    /// only allow if allowed_hosts contains `self`. If the list of allowed hosts contains
    /// `insecure:allow-all`, then all hosts are allowed.
    /// If `None` is passed, the guest module is not allowed to send the request.
    fn is_allowed(&mut self, url: &str) -> Result<bool, HttpError> {
        if url.starts_with('/') {
            return Ok(self.allowed_hosts.allows_relative_url(&["http", "https"]));
        }

        Ok(OutboundUrl::parse(url, "https")
            .map(|u| self.allowed_hosts.allows(&u))
            .unwrap_or_default())
    }
}

That check is not done here.

I believe it can be an easy fix. Something like:

    url_parsed = parse.urlparse(request.uri)
    if url_parsed.netloc == '' and url_parsed.path.startswith('/'):
        # If not allowed, throw error
        # Else set scheme to http/https and netloc to 127.0.0.1:port

I am unable to see where in the python sdk you have the list of allowed hosts to implement this. I've changed outbound hosts to:

allowed_outbound_hosts = ["http://self", "https://self"]

and tried sending a request to http://self/users/{username}/bookings. The backtrace is:

image

From what I gather, this exception is not thrown in any python code.

itowlson commented 4 months ago

@dicej Paths are meant to work. The Spin runtime translates them into self-requests (provided self is an allowed host) by prepending the host and protocol of the original inbound request.

@thesuhas I don't think http://self works as part of the URL - it is a magic identifier for allowed_outbound_hosts. As you note, https://developer.fermyon.com/spin/v2/http-outbound#intra-application-http-requests-by-route describes this. The expected pseudo-URL is /api/customers.

However, as Joel notes, if your target environment supports it, it's more efficient to use local service chaining (https://developer.fermyon.com/spin/v2/http-outbound#local-service-chaining), which does not travel over the network.

@thesuhas regarding where the error is created, it is in the Spin runtime - the OutboundHttp::is_allowed fragment you quote is part of the Spin host, not part of the Rust SDK.

thesuhas commented 4 months ago

Thanks @itowlson ! But, having allowed_outbound_hosts = ["http://self", "https://self"] and just using a url such as /api/customers seems to be throwing an error for now. If I would not want to switch to local service chaining, how could I fix this?

itowlson commented 4 months ago

I am not sure @thesuhas. I believe it ought to work, and unless I am missing something it's a bug that it doesn't. I will have a look but I don't know much about the Python side.

dicej commented 4 months ago

Sounds like we need to add the equivalent relative URL handling @thesuhas found in the Rust SDK to the Python SDK, if I'm understanding correctly.

dicej commented 4 months ago

Actually, where did you find that code in the Rust SDK, @thesuhas ? I'm not seeing it.

itowlson commented 4 months ago

It's not in the Rust SDK though @dicej. That's the Spin Outbound HTTP host component.

thesuhas commented 4 months ago

If I would want to add the equivalent relative URL handling and make a contribution, how would I gain access to both the allowed outbound hosts , the url and the port that spin is deploying the application on? If one could access them, this check could be added into the python sdk itself.

thesuhas commented 4 months ago

Actually, where did you find that code in the Rust SDK, @thesuhas ? I'm not seeing it.

I found it here: https://github.com/fermyon/spin/blob/9f1d9dbc43ae0bde8ffe3729ac9b545b7d571e15/crates/outbound-http/src/host_impl.rs#L23

dicej commented 4 months ago

Oh, actually I think this needs to be ported to the Python SDK: https://github.com/fermyon/spin-rust-sdk/blob/184946fdd3cb48de3c3af60f8e053b42947c791b/src/http/conversions.rs#L557-L560

thesuhas commented 4 months ago

It's not in the Rust SDK though @dicej. That's the Spin Outbound HTTP host component.

Does this mean that this is where the check happens even while using the python sdk? And does this mean spin composes an Outbound HTTP component to the spin component it builds and the flow of the request is Spin Component Created by User -> Outbound HTTP Component -> Request is sent?

dicej commented 4 months ago

The problem is that with the current Python SDK, the relative URL requests aren't even making it to the host, since set_authority is choking. We need to port the above code I linked to from spin-rust-sdk to fix that.

dicej commented 4 months ago

See also https://github.com/fermyon/spin-rust-sdk/commit/9f59a5ea6d4575618bc4bbc63b6825ec63c2c341

dicej commented 4 months ago

And https://github.com/fermyon/spin-rust-sdk/issues/8

itowlson commented 4 months ago

@thesuhas It's not quite composition, but conceptually similar. The HTTP service is provided by the Spin host, and the Spin host understands the special self-request syntax and service chaining URLs. So when you make a HTTP request through any SDK, that gets passed through to the Spin host which, after validating, performs the actual request.

The outbouind hosts validation etc. has to be in the runtime rather than in the SDK, because the SDK is strictly optional. You can write a Spin component using raw component tooling and the Spin WITs. We cannot depend on the SDK for security checking.

thesuhas commented 4 months ago

The problem is that with the current Python SDK, the relative URL requests aren't even making it to the host, since set_authority is choking. We need to port the above code I linked to from spin-rust-sdk to fix that.

How would one port that? Currently most of the code in wit/types.py is placeholders for componentize-py right? Considering https://github.com/fermyon/spin-rust-sdk/commit/9f59a5ea6d4575618bc4bbc63b6825ec63c2c341, I would assume you would check the authority in the python sdk and if not, prepend the URL?

thesuhas commented 4 months ago

@thesuhas It's not quite composition, but conceptually similar. The HTTP service is provided by the Spin host, and the Spin host understands the special self-request syntax and service chaining URLs. So when you make a HTTP request through any SDK, that gets passed through to the Spin host which, after validating, performs the actual request.

The outbouind hosts validation etc. has to be in the runtime rather than in the SDK, because the SDK is strictly optional. You can write a Spin component using raw component tooling and the Spin WITs. We cannot depend on the SDK for security checking.

So this means that the only fix in this case would be setting the authority correctly as then the spin host would have all the information it needs to validate and perform the request?

And in this case, authority would be the domain name and port right? And domain name would be the URL at which the spin application is running (localhost if running locally) and port would just be the port that the spin application is listening at?

itowlson commented 4 months ago

@thesuhas Looking at the Rust SDK code I think the idea is not to prepend a full URL but to inject a dummy authority consisting only of the port.

dicej commented 4 months ago

Can you try this, please?

diff --git a/src/spin_sdk/http/__init__.py b/src/spin_sdk/http/__init__.py
index dd98107..094d94a 100644
--- a/src/spin_sdk/http/__init__.py
+++ b/src/spin_sdk/http/__init__.py
@@ -174,7 +174,14 @@ async def send_async(request: Request) -> Response:
     outgoing_request = OutgoingRequest(Fields.from_list(headers))
     outgoing_request.set_method(method)
     outgoing_request.set_scheme(scheme)
-    outgoing_request.set_authority(url_parsed.netloc)
+    if url_parsed.netloc is None:
+        if scheme == "http":
+            authority = ":80"
+        else:
+            authority = ":443"
+    else:
+        authority = url_parsed.netloc
+    outgoing_request.set_authority(authority)
     path_and_query = url_parsed.path
     if url_parsed.query:
         path_and_query += '?' + url_parsed.query
dicej commented 4 months ago

BTW, if that doesn't work we might need to check if url_parsed.netloc is non-None and non-empty. And I guess I should probably just test it myself :)

thesuhas commented 4 months ago

Can you try this, please?

diff --git a/src/spin_sdk/http/__init__.py b/src/spin_sdk/http/__init__.py
index dd98107..094d94a 100644
--- a/src/spin_sdk/http/__init__.py
+++ b/src/spin_sdk/http/__init__.py
@@ -174,7 +174,14 @@ async def send_async(request: Request) -> Response:
     outgoing_request = OutgoingRequest(Fields.from_list(headers))
     outgoing_request.set_method(method)
     outgoing_request.set_scheme(scheme)
-    outgoing_request.set_authority(url_parsed.netloc)
+    if url_parsed.netloc is None:
+        if scheme == "http":
+            authority = ":80"
+        else:
+            authority = ":443"
+    else:
+        authority = url_parsed.netloc
+    outgoing_request.set_authority(authority)
     path_and_query = url_parsed.path
     if url_parsed.query:
         path_and_query += '?' + url_parsed.query

That did not work. I got the following error:

image

Authority is still nothing.

On changing it to:

if url_parsed.netloc == '':
        if scheme == "http":
            authority = ":80"
        else:
            authority = ":443"
else:
       authority = url_parsed.netloc

I get:

image

The authority is now being set but a HttpProtocolError

dicej commented 4 months ago

Ok, thanks for trying. I'll start debugging this on my side.

dicej commented 4 months ago

Oh, I wonder if scheme is empty or None also. I'll check it out.

thesuhas commented 4 months ago

Oh, I wonder if scheme is empty or None also. I'll check it out.

scheme is empty as well. url_parsed is : ParseResult(scheme='', netloc='', path='/bookings/dwight_schrute', params='', query='', fragment='')

dicej commented 4 months ago

Yeah, so we'll need to force scheme to "http" in that case.

thesuhas commented 4 months ago
    match url_parsed.scheme:
        case "http":
            scheme: Scheme = Scheme_Http()
        case "https":
            scheme = Scheme_Https()
        case "":
            scheme = Scheme_Http()
        case _:
            scheme = Scheme_Other(url_parsed.scheme)

    if request.headers.get('content-length') is None:
        content_length = len(request.body) if request.body is not None else 0
        request.headers['content-length'] = str(content_length)

    headers = list(map(
        lambda pair: (pair[0], bytes(pair[1], "utf-8")),
        request.headers.items()
    ))

    outgoing_request = OutgoingRequest(Fields.from_list(headers))
    outgoing_request.set_method(method)
    outgoing_request.set_scheme(scheme)
    if url_parsed.netloc == '':
        if scheme == "http":
            authority = ":80"
        else:
            authority = ":443"
    else:
        authority = url_parsed.netloc

    outgoing_request.set_authority(authority)

This works!

thesuhas commented 4 months ago

I have tried to create a new branch to make a PR but it seems like I do not have permissions to create a new branch. Can I make this contribution? How could I go about doing that?

dicej commented 4 months ago

Sorry, I was in a meeting. You should be able to fork the repo, push your change to a branch of that fork, and then PR to the main repo from there. If that's a pain, I can do it it myself.

thesuhas commented 4 months ago

Will do, thanks!

thesuhas commented 4 months ago

Raised https://github.com/fermyon/spin-python-sdk/pull/108