Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

[BUG] Batched EventHub incorrectly wrapping input data #45337

Closed: hallvictoria closed this issue 3 months ago

hallvictoria commented 4 months ago

Library name and version

Azure.Messaging.EventHubs

Describe the bug

Triggering an Event Hub-triggered function with cardinality set to 'many' passes a single EventHubEvent to the function, with the event body wrapped in a list.

Continuation from https://github.com/Azure/azure-functions-python-worker/issues/1524. The data received in the worker is already wrapped in the list; this is not additional processing on the worker side.

Expected behavior

Input data: {'input': [{'some': 'awesome', 'event': 'body'}, {'some': 'other', 'event': 'body'}]}

Expected return value: [{'some': 'awesome', 'event': 'body'}, {'some': 'other', 'event': 'body'}]

Actual behavior

Actual return value: [[{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]]

The return value is wrapped in a list.

Reproduction Steps

  1. Create an EventHub trigger function in Python (V1 or V2 programming model)
  2. Set cardinality to 'many'
  3. Send sample input to EventHub trigger: [{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]

Sample code:

import azure.functions as func
import logging

app = func.FunctionApp()

@app.event_hub_message_trigger(arg_name="azeventhub",
                               event_hub_name="<event_hub_name>",
                               cardinality='many',
                               connection="<connection_string>") 
def eventhub_trigger(azeventhub: func.EventHubEvent):
    logging.info('Python EventHub trigger processed an event: %s',
                 azeventhub.get_body().decode('utf-8'))
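
For reference, the Azure Functions Python documentation types the trigger parameter as a list of events (List[func.EventHubEvent]) when cardinality is 'many'. The sketch below shows how such a batch might be decoded, and how a handler could tolerate receiving a single event instead. It deliberately avoids the azure.functions package so that it runs anywhere: FakeEvent and decode_bodies are hypothetical names used only for illustration, and only the get_body() method seen in the sample above is modeled.

```python
import json


class FakeEvent:
    """Hypothetical stand-in for func.EventHubEvent; only get_body() is modeled."""

    def __init__(self, body: bytes) -> None:
        self._body = body

    def get_body(self) -> bytes:
        return self._body


def decode_bodies(events) -> list:
    """Decode the JSON body of a single event or of a batch of events."""
    # Normalize the input: a single event object becomes a one-element batch.
    batch = events if isinstance(events, (list, tuple)) else [events]
    return [json.loads(event.get_body().decode("utf-8")) for event in batch]


# A batch of two events, each carrying one JSON object as its body.
batch = [
    FakeEvent(b'{"some": "awesome", "event": "body"}'),
    FakeEvent(b'{"some": "other", "event": "body"}'),
]
decoded = decode_bodies(batch)
```

In a real function, the same loop would run over the list the trigger passes in; the normalization step is an assumption about how one might guard against a single-event call, not something the thread prescribes.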

Environment

No response

github-actions[bot] commented 4 months ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire commented 4 months ago

@hallvictoria: Neither the Event Hubs client nor the service inspects or alters the body of an event in any way. It is published in byte form and treated as an opaque blob of bytes. The Functions extension package may deserialize the body, depending on the bindings used, but it does not interpret or alter the data either.

In order to perform any analysis, we'll need to ask that you help us understand how an event gets passed from the trigger to the Python worker - particularly in what form that data comes out. Once we understand that, we'll have some insight into what binding/deserialization path it takes.
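
The opaque-bytes point above can be illustrated without the Event Hubs SDK. The sketch below is a toy model under stated assumptions: publish and deliver_batch are hypothetical stand-ins, not SDK or extension calls, and the thread does not confirm that this is how the reported shape arose. It only shows that a body carried unchanged can still look "wrapped" when a JSON array is published as one event and then delivered as a batch of one.

```python
import json


def publish(body_obj) -> bytes:
    """Hypothetical publish step: serialize once, then carry the bytes as-is."""
    return json.dumps(body_obj).encode("utf-8")


def deliver_batch(raw_bodies) -> list:
    """Hypothetical batched delivery: one list entry per event, bytes untouched."""
    return list(raw_bodies)


payload = [
    {"some": "awesome", "event": "body"},
    {"some": "other", "event": "body"},
]

# Case 1: the whole array is the body of a single event.
single = deliver_batch([publish(payload)])

# Case 2: each dict is published as its own event.
separate = deliver_batch([publish(item) for item in payload])

# The consumer decodes each body exactly as it was published.
decoded_single = [json.loads(body) for body in single]
decoded_separate = [json.loads(body) for body in separate]
```

Here decoded_single equals [payload], a list nested inside a list, while decoded_separate equals payload itself. Neither step altered any body; the difference comes only from how many events were published.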

github-actions[bot] commented 4 months ago

Hi @hallvictoria. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

hallvictoria commented 4 months ago

Hi @jsquire, thanks for taking a look into this.

As per my comment on the linked issue, the user was reporting a situation where the output of their function app was unexpected.

This is their current flow:

  1. User triggers the EventHub app with the following input data: [{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]
  2. The worker receives the data in this form: [[{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]]
  3. The worker creates and returns a single EventHubEvent object to the app

This is the flow needed to fit the user's expectations:

  1. User triggers the EventHub app with the following input data: [{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]
  2. The worker receives the data in this form: [{"some": "awesome", "event": "body"}], [{"some": "other", "event": "body"}]
  3. The worker creates and returns a list of EventHubEvent objects

The change needed to accommodate the user's expectations would be solely in the data that the worker receives. If the data isn't wrapped in a list and is instead a list of dicts, the worker will return a list of EventHubEvent objects.

If this isn't the correct spot to file an issue or if this is expected behavior, please let me know.

jsquire commented 4 months ago

@hallvictoria: I appreciate your response but, unfortunately, that does not address the questions that I'm asking. In order to assist, I need to understand how the data that is surfaced by the Event Hubs extension package via the trigger is consumed by the infrastructure that passes it to the Python worker. That is not something that the Event Hubs extension has insight into nor influence over.

Understanding how the data is passed and consumed would allow me to look at the bindings for that specific format and try to repro. Because there are several layers to the stack, each owned by different teams, the end-to-end repro is not sufficient for us to move forward. If that is all that you've got, I'd suggest transferring this over to the Functions team folks who own the Python worker. They can start peeling back the layers of the onion and pass the necessary details to the folks that own the Functions runtime, the isolated worker package, and then back to us if related to the bindings.

github-actions[bot] commented 4 months ago

Hi @hallvictoria. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 3 months ago

Hi @hallvictoria, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

JonasEichhorn commented 3 months ago

Is there any new information on this?

jsquire commented 3 months ago

What you see here is the current status. Any investigation or action on the part of the Azure SDK extensions package is blocked waiting on information from @hallvictoria, as was requested above. At present, it is unclear whether this is related to something in the Functions infrastructure or the extension.

JonasEichhorn commented 3 months ago

Thanks!

hallvictoria commented 3 months ago

Closing as this is due to a limitation in manually testing non-HTTP functions. Interacting with prod resources works as expected, and this isn't a bug on the worker or SDK side.
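
The thread does not say how the manual test was run. One documented way to trigger a non-HTTP function by hand is to POST a JSON body of the form {"input": ...} to the Functions host's /admin/functions/<function name> endpoint, which matches the shape of the input data quoted in the issue. The sketch below only builds such a request and does not send it. The host address and function name are assumptions for illustration, and build_admin_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumptions: a local Functions host on port 7071 and the function name
# used in the sample code earlier in the thread.
HOST = "http://localhost:7071"
FUNCTION_NAME = "eventhub_trigger"

payload = [
    {"some": "awesome", "event": "body"},
    {"some": "other", "event": "body"},
]


def build_admin_request(host: str, function_name: str, trigger_input) -> urllib.request.Request:
    """Build (but do not send) a manual-trigger request for a non-HTTP function."""
    # The whole trigger input travels under a single "input" key.
    body = json.dumps({"input": trigger_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/admin/functions/{function_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_admin_request(HOST, FUNCTION_NAME, payload)
```

Because the entire array sits under one "input" key, a manual call of this kind submits one trigger input rather than two separate events. When targeting a deployed app instead of a local host, the documentation also calls for a master key in the x-functions-key header.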

JonasEichhorn commented 3 months ago

@jsquire Does this information help you look into the issue?

jsquire commented 3 months ago

@JonasEichhorn: Please see the note from Victoria above; the root cause of this is external to the extensions package and was related to Functions infrastructure.