Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License
5.47k stars 4.8k forks source link

[QUERY] ServiceBus session ID in receiver's `Attach` frame is not compliant to AMQP 1.0 protocol #32243

Closed minghuaw closed 1 year ago

minghuaw commented 1 year ago

Library name and version

Azure.Messaging.ServiceBus - 7.12.0-beta.1

Query/Question

I am in the middle of implementing a rust sdk for ServiceBus, and I am not sure if I got the correct understanding on how the ServiceBus session ID is encoded in AMQP. I might have got it wrong, so please correct me if I am wrong. The main problem is that I think the current implementation doesn't comply with AMQP 1.0 protocol.

Currently, the way it is implemented in Azure.Messaging.ServiceBus is like following (Line 681)

// even if supplied sessionId is null, we need to add the Session filter if it is a session receiver
if (isSessionReceiver)
{
    filters.Add(AmqpClientConstants.SessionFilterName, sessionId);
}

var linkSettings = new AmqpLinkSettings
{
    Source = new Source { Address = endpoint.AbsolutePath, FilterSet = filters }, // Line 690
    // ... other setting
};

where sessionId is just a string. The sessionId is added as one entry to the filter-set field of the receiver link's Source

However, regarding the filter-set type, the AMQP 1.0 core spec states that (http://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-filter-set)

A set of named filters. Every key in the map MUST be of type symbol, every value MUST be either null or of a described type which provides the archetype filter.

and string is not a described type. Is the filter-set serialized in a special way such that it would turn this string into a described type? If so, could you please point me to the code or spec of the described type (more specifically, this descriptor for this filter)?

Environment

No response

jsquire commented 1 year ago

Hi @minghuaw. The short version is that FilterSet is an intermediate format handles the encoding.

The Service Bus library makes use of an external library, Microsoft.Azure.Amqp, for its transport layer which is where FilterSet resides. Internally, FilterSet uses an AmqpMap to store the filter data, which is encoded by AmqpCodec at some point in the process.

If you'd like to dig deeper into the details or discuss further, I'd suggest opening an issue in the Azure/azure-amqp repository. The folks there would be best able to speak to the details, point you at the right place in the source, and discuss why certain decisions were made for the implementation.

I'm going to mark this as addressed, but please feel free to unresolve if you'd like to discuss the specifics of how the Service Bus clients interact with the AMQP transport types.

ghost commented 1 year ago

Hi @minghuaw. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

minghuaw commented 1 year ago

Hi @jsquire. Thank you for your quick response. I don't think this is fully addressed though.

I think the problem is not single sided (ie. partly due to the AMQP implementation and partly due to how the AMQP library is used). I am fairly confident with my grasp on the protocol since I have implemented the full protocol in rust. So, to boil down the problem:

  1. On the AMQP library side, I have indeed looked at Azure/azure-amqp before opening this issue, and part of the reason why I opened this was because I didn't see any special treatment for the filter-set map. As you pointed out, it uses AmqpMap which simply uses the object type as the value type for the map, just like most other AMQP implementations with an OOP language. The use of object doesn't strictly enforce the type requirement of "described type" on the type system level. And thus it allows its users to use the library sometimes in a non-compliant manner.
  2. On the AMQP library user side, sessionId is not any special type that implements custom serialization and is simply of type string, which most likely will be serialized the same way as any other regular string (I didn't see special treatment in Azure/azure-amqp codec, but I will for sure check again). Plus, the object type parameter creates an "illusion" that the compiler doesn't complain and this usage is fine.

Despite my reasoning above, I have to admit that I haven't done enough investigation myself. More specifically I haven't tried to look at the actual encoded bytes to verify this, which I will do when I get time. And I will probably open an issue on Azure/azure-amqp on the use of object in the map's value type parameter, however I doubt whether that will be addressed or not since that is pretty much the way every other OOP language implementation takes.

jsquire commented 1 year ago

Unfortunately, I can't fully address your inquiry, as I don't have the insight that you're after. In a nutshell, I don't know specifically how the map is translated without digging into the AMQP library source which I have not had a need to do.

The Microsoft.Azure.Amqp library is owned by the Azure Messaging team - who own Service Bus and Event Hubs. The client patterns using that library are based on guidance from them and correlate to the service implementation. To my knowledge, the service and client communication is conforming to the AMQP spec, with the codec that I linked handling translation. The folks that maintain that library would be able to provide guidance.

minghuaw commented 1 year ago

Thanks again. Like I said, I will need to do some more investigations as well. I will close this issue for now then.

minghuaw commented 1 year ago

@jsquire I have created an minimal example with Azure/azure-amqp, where the receiver link is created similarly withthe filter-set field set to contain an entry of a string

// Only the link creation part is copied over
AmqpLinkSettings settings = new AmqpLinkSettings();
FilterSet filterSet = new FilterSet();
filterSet.Add("com.microsoft:session-filter", "test-filter");
settings.LinkName = $"receiver-{DateTime.UtcNow.Ticks}";
settings.Role = true;
settings.Source = new Source() { Address = "q1", FilterSet = filterSet };
settings.Target = new Target();
settings.TotalLinkCredit = 300;
ReceivingAmqpLink receiver = new ReceivingAmqpLink(session, settings);
await receiver.OpenAsync(TimeSpan.FromSeconds(10));

And I captured the bytes with wireshark and found out that it is NOT compliant to the AMQP protocol.

I am going to open an issue on Azure/azure-amqp, but for the sake of completeness, I will paste the captured bytes and explain why it is not compliant below

The captured bytes for the receiver link's Attach frame are:

0040   00 00 00 84 02 00 00 00 00 53 12 c0 77 0e a1 1b
0050   72 65 63 65 69 76 65 72 2d 36 33 38 30 33 30 36
0060   32 38 38 30 39 34 32 38 32 36 39 43 41 40 40 00
0070   53 28 c0 3c 0b a1 02 71 31 40 40 40 40 40 40 c1
0080   2c 02 a3 1c 63 6f 6d 2e 6d 69 63 72 6f 73 6f 66
0090   74 3a 73 65 73 73 69 6f 6e 2d 66 69 6c 74 65 72
00a0   a1 0b 74 65 73 74 2d 66 69 6c 74 65 72 40 40 40
00b0   00 53 29 c0 08 07 40 40 40 40 40 40 40 40 40 40
00c0   40 40 40 40

And the part that we are interested in is the FilterSet part of the Source field, which contains the following bytes

0070                                                c1
0080   2c 02 a3 1c 63 6f 6d 2e 6d 69 63 72 6f 73 6f 66
0090   74 3a 73 65 73 73 69 6f 6e 2d 66 69 6c 74 65 72
00a0   a1 0b 74 65 73 74 2d 66 69 6c 74 65 72

Breaking down the captured bytes, the FilterSet field is a map as indicated by the format code c1 at the beginning. And it contains only one key and one value.

A set of named filters. Every key in the map MUST be of type symbol, every value MUST be either null or of a described type which provides the archetype filter.

And on the wire, a described type should start with 0x00 and look like the Figure 1.2 (http://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-types-v1.0-os.html#doc-idp38080), which is pasted below. The value bytes pasted above clearly doesn't look anything close to a "described type"


            constructor                       untyped bytes
                 |                                 |
     +-----------+-----------+   +-----------------+-----------------+
     |                       |   |                                   |
...  0x00 0xA1 0x03 "URL" 0xA1   0x1E "http://example.org/hello-world"  ...
          |             |  |     |                                   |
          +------+------+  |     |                                   |
                 |         |     |                                   |
            descriptor     |     +------------------+----------------+
                           |                        |
                           |         string value encoded according
                           |           to the str8-utf8 encoding
                           |
                  primitive format code
                for the str8-utf8 encoding

     (Note: this example shows a string-typed descriptor, which is
      considered reserved)
jsquire commented 1 year ago

Thank you for your follow-up, @minghuaw. I've moved discussion over to the issue that you've opened in the azure-amqp repository, as we'll need perspectives from the transport/service owners. I'll leave this open for now so that if we need to adjust client patterns, we can use it for tracking.

minghuaw commented 1 year ago

This query has been answered in https://github.com/Azure/azure-amqp/issues/231 and thus will be closed.

jsquire commented 1 year ago

I'm going to reopen this for tracking. The clarifications in the Azure/azure-amqp thread indicate that the service supports both forms for legacy compatibility. The client should be complying with the AMQP spec, since the service supports that form.

minghuaw commented 1 year ago

@jsquire FYI, it seems like only the descriptor name works for now. An error is returned if the descriptor code is sent

minghuaw commented 1 year ago

@jsquire FYI, it seems like only the descriptor name works for now. An error is returned if the descriptor code is sent

NVM, the descriptor code provided in https://github.com/Azure/azure-amqp/issues/231

0x00000137:000000C

is somewhat correct (it's actually 0x000_0013_7000_000C). However, this is not strictly compliant to the protocol, according to the protocol (http://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-types-v1.0-os.html#section-descriptor-values),

numeric descriptors: (domain-id << 32) | descriptor-id

A strictly-compliant descriptor code should be 0x0000_0137 << 32 | 0x000_000c, but the actual descriptor code is 0x0000_0013 << 32 | 0x7000_000c.

jsquire commented 1 year ago

I've been doing some prototyping and have been unable to get either descriptor code working with our test bed. I'm seeing the same rejection for the strictly compliant descriptor and while the actual descriptor listed above is accepted, I'm seeing it consistently fail to retrieve the session when opening the link.

Given that the currently accepted descriptor is still not fully spec-compliant, I don't think that using it would really put the client library in a better position than it is today - so I don't believe it's worth investing additional time to dig into why we're seeing failures. I'm going to close this out for now.