Azure / azure-uamqp-c

AMQP library for C
Other
60 stars 63 forks source link

Handling AMQP redirect #231

Closed annatisch closed 6 years ago

annatisch commented 6 years ago

We have a need to support AMQP redirect. It seems a redirect response can occur either at the connection level or the link level, as documented here: http://docs.oasis-open.org/amqp-bindmap/amqp-wsb/v1.0/cs01/amqp-wsb-v1.0-cs01.html#_Toc461801619 http://docs.oasis-open.org/amqp-bindmap/amqp-wsb/v1.0/cs01/amqp-wsb-v1.0-cs01.html#_Toc461801620

I'm not sure what the best approach is here, either for this to be entirely handed at the connection/link layer within uamqp-c, or whether the redirect information should be surfaced so the consuming client can then handle as needed. The latter is probably a quicker option that provides a bit more flexibility.

Any thoughts on this?

I'm happy to look at implementing this, with some advice :)

dcristoloveanu commented 6 years ago

Hi @annatisch

I'll look at this right after fixing the issue of links being inadvertently destroyed when one link is destroyed on the same session.

There were several prior discussions on the need to add redirect so we were fully aware this will pop up at some point. Now seems the time to do it :-).

Thanks! /Dan

annatisch commented 6 years ago

That sounds great - thanks! Let me know if there's something I can do :)

annatisch commented 6 years ago

@dcristoloveanu - just wondering what you think the timeline might look like for this. I'm hoping to put a new release of the Python bindings out beginning of next week (week of the 28th) and would like to hold off until this fix is included if possible. Or do you believe this will likely be quite involved?

Thanks!

dcristoloveanu commented 6 years ago

@annatisch I'll have a look today at how to plug this in and will come back with the timeline once I understood the details.

dcristoloveanu commented 6 years ago

Hi @annatisch

I had a quick look at what I think should be done:

Redirect at connection level: The information with the hostname/port has to be bubbled up to the user code that created the IO used by connection. Connection itself has no knowledge of how to setup the IO and what its parameters are. So the way to go is to indicate to the user the fact that a redirect is needed and a new connection to a different server/port should be made. To achieve this:

The Python SDK would need to shutdown everything upon this event and recreate the IO/connection/session/link/etc. with the new configuration. I assume that would be a matter of running the same code as today, but with a different host/server.

Adding the event mechanism should be doable in a couple of days, I will target early next week, but don't hold me accountable if it is not 100% complete by then :-).

Redirect at link level: Link level redirect seems to follow a similar pattern. It looks like link would require the same mechanism of notifying the user a redirect is needed for that link and then it is in uAMQP user's hands to figure out if a new stack of connection/session/link needs to be spawn for the new server (or maybe it is actually the same connection that is used, but a new link needs creating)

Which of the 2 do you need first? I assume connection, but would be good to know :-).

/Dan

annatisch commented 6 years ago

@dcristoloveanu - awesome, thanks for the investigation! That event mechanism sounds mint. Definitely looking forward to that (both short and long term versions)

Regarding which of the two redirects is a priority is a bit of a gray area that I'm hoping @digimaun can clarify for me. My understanding is that IotHub actually raises a link:redirect, however the contents of the redirect is a new hostname. So by behavior it's actually a connection:redirect, but by implementation it's a link:redirect (as it's not actually received until after a successful connection has been opened).

digimaun commented 6 years ago

Thanks for bringing me in @annatisch.

Yes for certain operations, IoTHub accepts the connection but raises an amqp: link: redirect with payload containing the proper eventhub namespace host (and an amqps endpoint) backing the IoTHub.

Providing a mechanism that allows the uamqp user to catch a redirect event with the critical details would be enough to fulfill my use case. I would then handle it in the application code.

Even better would be uamqp automatically handling the redirect (with some configuration argument enabling/disabling auto redirect). I would be happy either way!

dcristoloveanu commented 6 years ago

@annatisch master now has a way to bubble up the information of a connection close received which contains the redirect information. I was oscillating between this generic connection_close_received handler and a dedicated connection_redirect event handler. I think it would be nice to have both probably and the user can choose which one to use ultimately. We might get there after I have the information also bubbled up for link.

I am now doing the same for link detach received so that the information reaches the uAMQP user and a redirect can then be coded.

Thanks, /Dan

dcristoloveanu commented 6 years ago

@annatisch With the latest changes I believe the redirect information is bubbled up to the user for connection and link. Can you give this a try and let me know if it works for you?

Thanks, /Dan

annatisch commented 6 years ago

Thanks so much @dcristoloveanu! I've started exposing it in the Python bindings and will test it out today :)

annatisch commented 6 years ago

Thanks @dcristoloveanu - so I've been trying to test this but am running into some curious behaviour. I don't know whether it's expected, or whether I have done something incorrectly and I'm hoping @digimaun might be able to clarify.

Effectively, I send an ATTACH frame, then receive one back in response - however it does not have the source/target properties populated. The client detects this as an amqp-decode error and sends an END frame before I can receive the DETACH redirect from the service. Therefore the session has already started shutting down when the DETACH frame is received and the callback is never entered into.

My network trace looks like this:

'-> [ATTACH]* {receiver-link-39ecea96-0fa4-413f-bc26-e62ec3136faa,0,true,0,0,* {amqps://pyot.azure-devices.net/messages/events/},* {12313ac1-0854-4725-ac3d-0d5def1d6510},NULL,NULL,NULL,262144}'
'<- [ATTACH]* {receiver-link-39ecea96-0fa4-413f-bc26-e62ec3136faa,0,false,0,0,NULL,NULL,NULL,NULL,0,262144,NULL,NULL,NULL}'
'-> [END]* {* {amqp:decode-error,Cannot get link source from ATTACH frame}}'
'<- [DETACH]* {0,true,* {amqp:link:redirect,NULL,{[hostname:iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net],[network-host:iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net],[port:5671],[address:amqps://iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net:5671/pyot/]}}}'
'-> [END]* {* {amqp:session:unattached-handle,}}'
'-> [CLOSE]* {}'

So @digimaun - do you know whether the response ATTACH frame is supposed to be missing some parameters and this therefore needs to be handled? Or do you think I've set up a malformed link which is failing for other reasons?

Thanks guys!!

digimaun commented 6 years ago

@annatisch After some investigation I believe I found the problem, you need to append '$management' to the operation path so 'amqps://pyot.azure-devices.net/messages/events/$management'

Edit: Actually nevermind, it looks like appending $management prolongs frames being sent back and forth but the end state is the same where a DETACH containing the redirect data is incoming after END is sent by the client.

digimaun commented 6 years ago

Here is an example of how the event hub node client does it. The pattern for to build the connection string is to hit the /messages/events/$management endpoint, expect an error, parse out the redirect info and return a new connection string.

https://github.com/Azure/azure-event-hubs-node/blob/a0757f04d0a66ec1ae7db7d5f27d70a865458005/client/lib/iothub/iothubClient.ts#L49

annatisch commented 6 years ago

Thanks @digimaun! Though unfortunately it didn't seem to help:

'-> [ATTACH]* {receiver-link-5cef528a-1645-45fa-b476-25e9937a8539,0,true,0,0,* {amqps://pyot.azure-devices.net/messages/events/$management},* {7f20b484-cb97-4380-b694-535c545ced90},NULL,NULL,NULL,262144}'
'<- [ATTACH]* {receiver-link-5cef528a-1645-45fa-b476-25e9937a8539,0,false,0,0,NULL,NULL,NULL,NULL,0,262144,NULL,NULL,NULL}'
'-> [END]* {* {amqp:decode-error,Cannot get link source from ATTACH frame}}'
'<- [DETACH]* {0,true,* {amqp:link:redirect,NULL,{[hostname:iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net],[network-host:iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net],[port:5671],[address:amqps://iothub-ns-pyot-500748-8fa45cb72e.servicebus.windows.net:5671/pyot/$management]}}}'
'-> [END]* {* {amqp:session:unattached-handle,}}'
'-> [CLOSE]* {}'

Is it possible that the Node.JS SDK doesn't enforce a source/target on the return ATTACH frame?

digimaun commented 6 years ago

Good question. Looks like the Node JS SDK is using rhea for the AMQP implementation. I am not knowledgeable about it, we'll need to investigate how it works.

At least from an error perspective in the Event Hub SDK all the necessary details are in the error that is thrown so the handling seems pretty straightforward .

Transform the error: https://github.com/Azure/azure-event-hubs-node/blob/master/client/lib/errors.ts#L355

Parse redirect infos: https://github.com/Azure/azure-event-hubs-node/blob/master/client/lib/iothub/iothubClient.ts#L91

But yea maybe the configuration or interaction rules are different than uamqp.

dcristoloveanu commented 6 years ago

This might be an issue in uamqp.

Basically the ATTACH with NULL endpoint indicates that the peer(server) does not want to create a terminus.

From the AMQP spec: If there is no pre-existing terminus, and the peer does not wish to create a new one, this is indicated by setting the local terminus (source or target as appropriate) to null. Peer Partner

create link endpoint ATTACH(name=N, handle=1, ----------> create link endpoint (1) role=sender, +--- ATTACH(name=N, handle=2, source=A, / role=receiver, target=B) / source=A, / target=-) (2) <--+ +--- DETACH(handle=2, / closed=True) / / <--+ DETACH(handle=1, -----------> closed=True) ...

(1) The link endpoint is created, but no target is created. (2) At this point the link is established, but it is to a nonexistent target. Figure 2.33: Refusing a Link

Thus, I think there is a bug in uAMQP here where the session should not be ended. Let me have a look and fix this.

Thanks, /Dan

annatisch commented 6 years ago

Thanks @dcristoloveanu I played around with simply ignoring the missing source/target from the link ATTACH frame in the if_attached_type_by_descriptor section in session.c and it seems to run fine for both my existing tests and the IoThub scenario (though I haven't yet run the uAMQP-C tests):

if (attach_get_name(attach_handle, &name) != 0)
    {
        end_session_with_error(session_instance, "amqp:decode-error", "Cannot get link name from ATTACH frame");
    }
    else if (attach_get_role(attach_handle, &role) != 0)
    {
        end_session_with_error(session_instance, "amqp:decode-error", "Cannot get link role from ATTACH frame");
    }
    else 
    {
        if (attach_get_source(attach_handle, &source) != 0)
        {
            source = NULL;
            //end_session_with_error(session_instance, "amqp:decode-error", "Cannot get link source from ATTACH frame");
        }
        else if (attach_get_target(attach_handle, &target) != 0)
        {
            target = NULL;
            //end_session_with_error(session_instance, "amqp:decode-error", "Cannot get link target from ATTACH frame");
        }
        LINK_ENDPOINT_INSTANCE* link_endpoint = find_link_endpoint_by_name(session_instance, name);
        if (link_endpoint == NULL)
        {
            /* new link attach */
            /* .... */
        }
        else
        {
            /* .... */
        }
    }
    attach_destroy(attach_handle);
}
dcristoloveanu commented 6 years ago

@annatisch Yep, that is the basic idea. I need to run through some more cases of what happens to the link and then I'll merge the change in.

Thanks for the patience on this! /Dan

dcristoloveanu commented 6 years ago

@annatisch The fix should be in, I assume it will work for you as it is the same fix you posted in the thread :-)

Thanks! /Dan

annatisch commented 6 years ago

Yes - it's working! Thank you so much :)