Azure / azure-functions-python-worker

Python worker for Azure Functions.
http://aka.ms/azurefunctions
MIT License

[Bug] Testing Batched Event Hub Trigger #1524

Open JonasEichhorn opened 3 months ago

JonasEichhorn commented 3 months ago

Expected Behavior

  1. Triggering an event hub triggered function with cardinality MANY locally via HTTP should pass a list of EventHubEvent to the function.
  2. Type hints should not influence loading the azure functions.
  3. The new event hub emulator should work with the event hub trigger.

Actual Behavior

  1. Triggering an event hub triggered function with cardinality MANY locally via HTTP passes a single EventHubEvent to the function, but wraps the event body in a list.
    • Triggering the function with a request body {'input': {'some': 'awesome', 'event': 'body'}} gives an EventHubEvent with the body [{'some': 'awesome', 'event': 'body'}].
    • Posting multiple events {'input': [{'some': 'awesome', 'event': 'body'}, {'some': 'other', 'event': 'body'}]} gives a nested list [[{'some': 'awesome', 'event': 'body'}, {'some': 'other', 'event': 'body'}]]

I had to write a check that converts single events into a list in order to test my function locally, and I had to extract all the functionality into a separate function, which is called from my trigger function, so I can test it with batches. This solution is not pretty, but fine.
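The workaround described above could look roughly like the following. This is a minimal sketch, not the reporter's actual code: the names as_event_list and process_events are hypothetical, and FakeEvent is a stand-in for func.EventHubEvent so that the example runs without the azure-functions package.

```python
from typing import Any, List


class FakeEvent:
    """Stand-in for func.EventHubEvent (assumption: only get_body() is needed)."""

    def __init__(self, body: bytes) -> None:
        self._body = body

    def get_body(self) -> bytes:
        return self._body


def as_event_list(events: Any) -> List[Any]:
    """Wrap a single event in a list; return an existing list unchanged."""
    return events if isinstance(events, list) else [events]


def process_events(events: List[Any]) -> List[str]:
    """Hypothetical business logic, kept separate so it can be tested with batches."""
    return [event.get_body().decode("utf-8") for event in events]


def batch_triggered(events: Any) -> List[str]:
    """Body of the trigger function: normalize the input, then delegate."""
    return process_events(as_event_list(events))
```

With this split, process_events can be called directly with a list in tests, while batch_triggered tolerates the single event that arrives when the function is triggered manually.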

  2. Unfortunately, the type hinting breaks loading the Azure function. I don't know why, but the proper type hint def batch_triggered(events: func.EventHubEvent | List[func.EventHubEvent]): prevents the function from being loaded. It is found, but not loaded. Using def batch_triggered(events: List[func.EventHubEvent]): or def batch_triggered(events: func.EventHubEvent): works, though.

  3. I also tried to use the new event hub emulator instead of triggering the function via HTTP for my tests, but the event hub always refuses the connection. I don't know whether this is an issue with the emulator or the worker. I'm raising this issue here because the emulator does work with the current version of the Python SDK.

Steps to Reproduce

HTTP-Trigger Issue

  1. Set cardinality of event hub triggered function (v2) to many.
  2. Trigger the function via POST request

Typing Issue

  1. Type hint the event hub triggered function like def batch_triggered(events: func.EventHubEvent | List[func.EventHubEvent]):
  2. Start the Python worker's Docker image.
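The worker's actual loading logic is not shown in this thread. Purely as an illustration, assuming a loader that only recognizes an exact List[...] annotation, the sketch below shows how a union annotation could fail such a check while the plain list passes. The helper is_plain_list_of and the class FakeEvent are hypothetical and are not part of the worker or the SDK.

```python
import typing
from typing import List, Union


class FakeEvent:
    """Stand-in for func.EventHubEvent so the sketch runs without azure-functions."""


def is_plain_list_of(annotation: typing.Any, item_type: type) -> bool:
    """Return True only for List[item_type]; a union that contains it does not match."""
    return (typing.get_origin(annotation) is list
            and typing.get_args(annotation) == (item_type,))


# The plain list annotation matches the check.
accepts_list = is_plain_list_of(List[FakeEvent], FakeEvent)

# The union annotation has a different origin, so the same check rejects it.
accepts_union = is_plain_list_of(Union[FakeEvent, List[FakeEvent]], FakeEvent)
```

Until unions are supported, annotating the parameter with List[func.EventHubEvent] alone is the form reported above to load.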

Event Hub Emulator Issue

  1. Write a Docker Compose file that runs an event hub emulator and the function worker Docker image (https://learn.microsoft.com/en-us/azure/event-hubs/test-locally-with-event-hub-emulator?tabs=docker-linux-container#run-the-emulator)
  2. Add a healthcheck to the event hub emulator, because the function app starts much faster than the emulator and fails to start if the emulator is not ready.
  3. Make sure the event hub triggered function is not disabled.
  4. Set an environment variable with the emulator's connection string (https://learn.microsoft.com/en-us/azure/event-hubs/test-locally-with-event-hub-emulator?tabs=docker-linux-container#interact-with-the-emulator). Pass the name of the environment variable as the event hub trigger's connection parameter.
  5. Run docker compose up. Function startup will fail because the event hub emulator refuses the connection.
  6. Connect to the event hub via the Python SDK with the same connection string. Sending and receiving events will work.
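For the startup-ordering part of step 2, a small readiness check can stand in for (or back up) a Compose healthcheck. The sketch below uses only the standard library. The helper name wait_for_port is hypothetical, and the host name and port in the usage comment are assumptions that depend on your Compose file. It only addresses the emulator not being ready yet, not the refused connection reported in step 5.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet (refused, timed out, or name not resolvable).
            time.sleep(interval)
    return False


# Example usage (assumed service name and AMQP port; adjust to your setup):
# ready = wait_for_port("eventhubs-emulator", 5672)
```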

Relevant code being tried

No response

Relevant log output

No response

requirements.txt file

No response

Where are you facing this problem?

Local - Core Tools

Function app name

No response

Additional Information

No response

bhagyshricompany commented 2 months ago

Thanks for informing us. We will check and update you.

hallvictoria commented 2 months ago

Hi @JonasEichhorn, thanks for reporting this issue.

HTTP-Trigger Issue: Is this understanding correct with the given input? {"some": "awesome", "event": "body"}

After some investigation, this is what I found when the cardinality is set to many:

    • Input data: {"some": "awesome", "event": "body"}
      Data received in worker: [{"some": "awesome", "event": "body"}]
    • Input data: [{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]
      Data received in worker: [[{"some": "awesome", "event": "body"}, {"some": "other", "event": "body"}]]

In the Python SDK, we load the data, calculate the list length, and then create that many EventHubEvents. Since the length here is one, only one EventHubEvent is created.

In summary, when the cardinality is set to many, the entire input body is wrapped in a list. This is done by the extension, not the worker. I can transfer this issue to them or create a new one with them.
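The behavior described above can be reproduced with a few lines of plain Python. This is a simplified model, not the extension's or the SDK's actual code. The function names received_in_worker and events_created are hypothetical.

```python
import json


def received_in_worker(request_input: str) -> str:
    """Model of the wrapping step: the whole input body is placed inside a list."""
    return json.dumps([json.loads(request_input)])


def events_created(worker_data: str) -> int:
    """Model of the SDK step: one event per element of the outer list."""
    return len(json.loads(worker_data))


single = '{"some": "awesome", "event": "body"}'
batch = ('[{"some": "awesome", "event": "body"}, '
         '{"some": "other", "event": "body"}]')

# Both inputs end up as an outer list of length one, so one event is created.
single_count = events_created(received_in_worker(single))
batch_count = events_created(received_in_worker(batch))
```

In this model the two-element batch becomes the single element of the outer list, which matches the nested list seen in the worker.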

Typing Issue: Yes, this is currently not supported. It should be though, so I'll cut a PR to get this fixed!

Emulator Issue: This looks to be a known issue in the emulator repo. Hopefully these responses can help!

JonasEichhorn commented 2 months ago

Thanks for looking into it! I hadn't found the open emulator issue when I was writing this. Thanks!

Sorry for answering so late. I've been on vacation.

HTTP-Trigger Issue: Is this understanding correct with the given input? {"some": "awesome", "event": "body"}
    • Current output: [{"some": "awesome", "event": "body"}]
    • Expected output: {["some": "awesome"], ["event": "body"]}

I would expect [{"some": "awesome", "event": "body"}] if I pass in a single dictionary. But I would expect a list of dictionaries if I pass in multiple dictionaries.

Thank you very much for looking into this. It would be great if you could transfer the issue or create a new issue for me. Thanks!

JonasEichhorn commented 2 months ago

@hallvictoria Can you give me the link to the typing-PR?

hallvictoria commented 2 months ago

You can track the EventHub cardinality issue here: https://github.com/Azure/azure-sdk-for-net/issues/45337. Please feel free to add more information if needed.

I will link the typing PR to this issue once ready. Thanks again for your patience!

JonasEichhorn commented 1 month ago

Thank you very much, @hallvictoria!

hallvictoria commented 1 month ago

Hi @JonasEichhorn, thanks for your patience.

One follow-up question: how are you sending data to trigger the function?

I tried using an HTTP trigger POST request with an EventHub output binding, and I didn't get the nested list.

Input Data: [{"some": "awesome", "event": "body"}, {"one": "other", "event2": "body"}]

Function App code:

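import json
import typing

import azure.functions as func

# Assumes the v2 programming model; the app object was not shown in the original snippet.
app = func.FunctionApp()

# EventHub trigger (cardinality set to many, loops through list for processing)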
@app.event_hub_message_trigger(arg_name="events",
                               event_hub_name="<name>",
                               connection="<connection_string>",
                               cardinality="many")
@app.blob_output(arg_name="$return",
                 path="<path>",
                 connection="<connection_string>")
def eventhub_trigger(events: typing.List[func.EventHubEvent]) -> str:
    event_list = []
    for event in events:
        event_dict: typing.Mapping[str, typing.Any] = {
            'body': event.get_body().decode('utf-8'),
        }
        event_list.append(event_dict)

    return json.dumps(event_list)

###################################################################

# EventHub Output
@app.function_name(name="eventhub_output_batch")
@app.event_hub_output(arg_name="$return",
                      connection="<connection_string>",
                      event_hub_name="<name>")
@app.route(route="eventhub_output_batch", binding_arg_name="out")
def eventhub_output_batch(req: func.HttpRequest, out: func.Out[str]) -> str:
    events = req.get_body().decode('utf-8')
    return events

I was able to repro the nested list issue, but only when using the Generate Data feature in the portal. Using an HTTP trigger (or any trigger, HTTP is not required) and output binding worked as expected.

JonasEichhorn commented 1 month ago

The problem arose with event hub triggers. We start a function app locally with Docker for testing. Everything works fine if you read events from an event hub. However, it does not work if you try to test the event hub triggered function without an event hub and instead use a POST request to manually trigger the non-HTTP-triggered function (as described at https://learn.microsoft.com/en-us/azure/azure-functions/functions-manually-run-non-http?tabs=azure-portal). Does that help?

Just for clarification, is this related to https://github.com/Azure/azure-sdk-for-net/issues/45337 or am I missing the scope of this issue and https://github.com/Azure/azure-sdk-for-net/issues/45337?

hallvictoria commented 1 month ago

Ah, I see now. Thanks for clarifying.

This way is for manual testing, which is different from interacting with prod resources. Complex data isn't supported with manual testing.

According to the documentation, the specific data you supply depends on the type of trigger, but it can only be a string, numeric, or boolean value. When cardinality=many is set by the function app, however, the worker expects the input data to be in a list format. Because of this, whatever data is sent in the request body will be wrapped in a list. In this specific case, since the data was already in list format, it ended up being formatted into a nested list.

For manual testing, which is more limited compared to prod scenarios, I would suggest instead sending data in one of the supported types.
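One way to follow this suggestion is to serialize the event to a JSON string before placing it in the "input" field of the manual-trigger request, so the value sent is a plain string. The sketch below only builds the request body. The helper name build_manual_trigger_payload is hypothetical, and how the function receives the string when cardinality=many is set has not been verified here.

```python
import json
from typing import Any, Mapping


def build_manual_trigger_payload(event: Mapping[str, Any]) -> str:
    """Encode the event as a JSON string and place it in the "input" field."""
    return json.dumps({"input": json.dumps(event)})


payload = build_manual_trigger_payload({"some": "awesome", "event": "body"})

# The value of "input" is a string, which is one of the supported types.
input_value = json.loads(payload)["input"]
```

The function under test would then have to decode the string itself, for example with json.loads on the event body.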

It was related -- I was trying to determine under what conditions the nested list input data occurred. Since you have confirmed that interacting with prod resources works as expected and the issue occurs when doing manual testing, this looks to be a known limitation of manual testing and not necessarily an issue on the worker or sdk side.

JonasEichhorn commented 1 month ago

Thank you for looking into it. Are there any plans to make the event hub trigger testable with cardinality=many? We have to support processing both single events and lists of events only to be able to test the system. That makes our code unnecessarily complicated, and we can't test our code with production-like data. As long as this is not supported, I think it would be a good idea to clarify in the documentation that cardinality=many is not testable this way.

hallvictoria commented 1 month ago

For manually running non-HTTP functions, I'm not aware of any plans or timeline for supporting complex data types.

We do support unit testing for functions. It allows you to mock the function input and call the function directly, so this could provide an alternative way for you to test the functions.
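A unit test along these lines might look like the following sketch. It assumes the batch logic lives in a plain function (the hypothetical handle_batch) and uses unittest.mock to stand in for EventHubEvent objects, so it runs without the azure-functions package or an event hub.

```python
from typing import Any, List
from unittest import mock


def handle_batch(events: List[Any]) -> List[str]:
    """Hypothetical function under test: decode the body of every event in the batch."""
    return [event.get_body().decode("utf-8") for event in events]


def make_fake_event(body: bytes) -> mock.Mock:
    """Build a mock that mimics only the get_body() method of an event."""
    event = mock.Mock()
    event.get_body.return_value = body
    return event


# Call the function directly with a batch of two mocked events.
batch = [make_fake_event(b'{"some": "awesome"}'),
         make_fake_event(b'{"some": "other"}')]
result = handle_batch(batch)
```

Because the function receives a real list, this exercises the cardinality=many code path that the manual trigger endpoint cannot reach.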

I agree that the documentation isn't clear, so I'll see if we can get this information highlighted.

JonasEichhorn commented 1 month ago

Thanks!