Azure / azure-functions-eventhubs-extension

Event Hubs extension for Azure Functions
MIT License

Functions v2 using EventHubs and FunctionsStartup does not use batch size config value from host.json #41

Closed Joehannus closed 2 months ago

Joehannus commented 4 years ago

I am currently trying to maximize the number of messages an Azure Function processes in one "run" using an Event Hub trigger.

The relevant settings in the host.json file are as follows:

"extensions": {
    "eventHubs": {
        "eventProcessorOptions": {
            "maxBatchSize": 64,
            "prefetchCount": 128
        }
    }
},

As mentioned by @lukasvosyka in Azure/azure-functions-host#4480, maxBatchSize from the host.json is not used at all. One of the reasons given was that he used the Willezone DI library. We had been doing that as well. Yesterday I replaced the Willezone DI library with the default Microsoft one for DI. I also updated most of the packages to the highest minor versions.

We're using the following versions of packages:

We were also using a ConfigurationBuilder in our Startup class, and I know this can cause issues with the other "configuration" services or containers being overwritten. But I found a workaround for that in Azure/azure-functions-host#4464 (to be precise, the solution provided by @martinfletcher: https://github.com/Azure/azure-functions-host/issues/4464#issuecomment-553132829). In debug mode I can see that multiple containers are loaded.

I set the logging level for the host to trace, and I can see that the values for the eventProcessorOptions are set according to the host.json.

2020-01-15T09:11:08.814 [Information] EventHubOptions

{
    "BatchCheckpointFrequency": 1,
    "EventProcessorOptions": {
        "EnableReceiverRuntimeMetric": false,
        "InvokeProcessorAfterReceiveTimeout": false,
        "MaxBatchSize": 64,
        "PrefetchCount": 128,
        "ReceiveTimeout": "00:01:00"
    },
    "PartitionManagerOptions": {
        "LeaseDuration": "00:00:30",
        "RenewInterval": "00:00:10"
    }
}

However, when debugging locally in Visual Studio with this change, the number of messages received in the array frequently exceeds 64. Shouldn't the array never exceed the configured size, since the EventProcessor manages the number of messages that is "sent" to each runtime instance of the function?

Joehannus commented 4 years ago

Is anybody looking into this?

Some information that might help: the event hub we read from is fed by a Stream Analytics job, which outputs the data as an array. Our assumption is that such an array, even though it contains multiple messages, is interpreted as one single message, so the maxBatchSize limit is not reached even when an array contains a lot of messages.

rkbalu commented 3 years ago

@Joehannus - I was just reading your comment above. I'm not sure whether this ticket is still a problem for you. But yes, your assumption is correct: the EH trigger does not know how many messages are wrapped inside a single message.
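
To illustrate the point above, here is a minimal Python sketch (not the extension's code, and not the function from this issue). It assumes each event body is JSON text holding either a single object or an array of objects, as described for the Stream Analytics output. The helper names `flatten_event_bodies` and `chunked` are hypothetical. It shows that a batch of only two events, well under a maxBatchSize of 64, can still expand to more than 64 records, and how the function body could re-chunk them itself.

```python
import json
from typing import Any, Iterable, Iterator, List


def flatten_event_bodies(bodies: Iterable[str]) -> List[Any]:
    """Expand each event body into its individual records.

    Assumption: a body is JSON text that is either one object or an
    array of objects (the array case is what the thread describes).
    """
    records: List[Any] = []
    for body in bodies:
        payload = json.loads(body)
        if isinstance(payload, list):
            # One event carrying many records.
            records.extend(payload)
        else:
            # One event carrying a single record.
            records.append(payload)
    return records


def chunked(records: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Two events in one trigger batch: the trigger counts 2, not 100.
batch = [
    json.dumps([{"id": i} for i in range(50)]),
    json.dumps([{"id": i} for i in range(50, 100)]),
]

records = flatten_event_bodies(batch)
groups = list(chunked(records, 64))
```

Because the limit applies to events rather than to the records wrapped inside them, any per-record cap has to be enforced inside the function, for example by processing `groups` one slice at a time.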