Azure / azure-functions-kafka-extension

Kafka extension for Azure Functions
MIT License

Cannot find librdkafka when hosting in Azure #123

Open FlipABit opened 4 years ago

FlipABit commented 4 years ago

Locally the extension runs fine, but when hosting in Azure I get the error below. I'm using release 1.0.2 and hosting in a Linux premium function tier with the Node runtime.

Edit: Also occurs on a Windows-based Premium Function App

Microsoft.Azure.WebJobs.Host.Listeners.FunctionListenerException: The listener for function 'Functions.createEvents' was unable to start.
 ---> System.DllNotFoundException: Failed to load the librdkafka native library.
   at Confluent.Kafka.Impl.Librdkafka.Initialize(String userSpecifiedPath)
   at Confluent.Kafka.Consumer`2..ctor(ConsumerBuilder`2 builder)
   at Confluent.Kafka.ConsumerBuilder`2.Build()
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.StartAsync(CancellationToken cancellationToken) in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 99
   at Microsoft.Azure.WebJobs.Host.Listeners.FunctionListener.StartAsync(CancellationToken cancellationToken, Boolean allowRetry) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\FunctionListener.cs:line 68
   --- End of inner exception stack trace ---
fbeltrao commented 4 years ago

@FlipABit thanks for reaching out. We are looking into some options to address this problem. I will keep you posted.

fbeltrao commented 4 years ago

We are still investigating it. A different way to address the problem is to use containerized functions (https://docs.microsoft.com/azure/azure-functions/functions-create-function-linux-custom-image?tabs=portal%2Cbash&pivots=programming-language-typescript), which are supported in premium.

There is an open PR adding a TypeScript sample (pay attention to the Dockerfile, in which librdkafka gets installed).

Please let us know if that works for you.

FlipABit commented 4 years ago

I was able to get it running containerized locally, but hosting a containerized function in Azure requires Linux servers, and Azure currently cannot mix Windows and Linux servers in the same resource group/region, which is a requirement for me.

I got slightly closer to getting this working. My extensions.csproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <WarningsAsErrors></WarningsAsErrors>
    <DefaultItemExcludes>**</DefaultItemExcludes>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.WebJobs.Script.ExtensionsMetadataGenerator" Version="1.1.0" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Kafka" Version="1.0.2-alpha" />
    <PackageReference Include="librdkafka.redist" Version="1.3.0" />
  </ItemGroup>
</Project>

This installs the librdkafka DLL. Locally it's installed in bin/librdkafka/[x86|x64]. When deployed to the remote function app, it's looking for a "runtimes" folder with win- prefixed to the arch as well as a native folder: D:\home\site\wwwroot\runtimes\win-x64\native\librdkafka.dll. Seems it's related to this previous issue/solution, but building the path to look for the dll is pretty brittle unless there's a way to output the dll from the nuget config to the specific directory it's looking for (I'm not very familiar with the extensions.csproj configuration options).

fbeltrao commented 4 years ago

@FlipABit thanks for the update. You shouldn't have to add the librdkafka.redist references on your side. The files are supposed to be already in the "runtimes" folder.

The issue we are having is that we are unable to load the library, as it seems that some dependencies are missing in the image we use to run the function.

fbeltrao commented 4 years ago

@FlipABit could you please try adding the following configuration setting to your function app: LD_LIBRARY_PATH=/home/site/wwwroot/bin/runtimes/linux-x64/native
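Before relying on this setting, it may help to confirm that the directory actually contains the native library. The following is a minimal shell sketch, not something provided by the extension: the helper name has_librdkafka is hypothetical, and the file name librdkafka.so is an assumption about what the package ships for linux-x64.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given directory contains librdkafka.so.
# The file name librdkafka.so is an assumption; adjust it if your package differs.
has_librdkafka() {
    dir="$1"
    [ -f "$dir/librdkafka.so" ]
}

# Path taken from the comment above; adjust it if your app deploys elsewhere.
NATIVE_DIR="/home/site/wwwroot/bin/runtimes/linux-x64/native"

if has_librdkafka "$NATIVE_DIR"; then
    echo "librdkafka.so found in $NATIVE_DIR"
else
    echo "librdkafka.so not found in $NATIVE_DIR" >&2
fi
```

If the check fails, setting LD_LIBRARY_PATH to that directory is unlikely to help, because the loader would have nothing to find there.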

FlipABit commented 4 years ago

@fbeltrao We've switched to Windows-hosted premium function apps to be able to use VNet capabilities (which are only available in preview on Linux and not supported for production workloads). Is there a Windows-based equivalent environment variable I can try?

I did spin up a Linux function app in a different region just to test using the LD_LIBRARY_PATH setting, but it still didn't seem to work.

fbeltrao commented 4 years ago

Thanks for the update. I cannot reproduce the problem. There must be a difference in our configuration that is causing the issue.

I suggest we try to sort this out over a call. If you are OK with this, please reach out to me at frbeltra at microsoft dot com so we can schedule it.

erjok commented 4 years ago

Same here - when I copy-paste the Kafka trigger sample and run it locally with the Azure Emulator, I get System.DllNotFoundException: Failed to load the librdkafka native library:

   at Confluent.Kafka.Impl.Librdkafka.Initialize(String userSpecifiedPath)
   at Confluent.Kafka.Consumer`2..ctor(ConsumerBuilder`2 builder)
   at Confluent.Kafka.ConsumerBuilder`2.Build()
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.CreateConsumer() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 107
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.<.ctor>b__23_0() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 79
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)

Version: 2.0.0-beta

FlipABit commented 4 years ago

I was finally able to revisit this after getting pulled off to some other work. I'll start with the TL;DR: I was able to get the extension working on both Windows and Linux non-containerized function app environments, but with a few small tweaks.

For Windows deployment: I used my existing code that was previously failing and updated the extensions.csproj to use the 2.0.0-beta release. Still no luck there. I ended up using a fresh copy of the new JavaScript sample in this repo and got that deployed and working with minimal changes. The global.json file was causing an error when doing the initial extension install locally:

> npm run-script extensions:install

Can't determine project language from files. Please use one of [--csharp, --javascript, --typescript, --java, --python, --powershell]
Can't determine project language from files. Please use one of [--csharp, --javascript, --typescript, --java, --python, --powershell]
  3.1.102 [/usr/local/share/dotnet/sdk]
A compatible installed .NET Core SDK for global.json version [3.1.201] from [/Users/redacted/src/github/azure-functions-kafka-js-sample/global.json] was not found
Install the [3.1.201] .NET Core SDK or update [/Users/redacted/src/github/azure-functions-kafka-js-sample/global.json] with an installed .NET Core SDK:

Removing global.json allowed the extension to install and run locally. Putting it back afterwards didn't cause any further issues. Both locally and deployed into Azure, the sample JavaScript project seemed to work. I went back to my original TypeScript project and copied the extensions.csproj file from the working sample, and that seemed to be the only change required. In my failing version, TargetFramework was netstandard2.0 and Microsoft.Azure.WebJobs.Script.ExtensionsMetadataGenerator was 1.1.0; the working version uses netcoreapp3.1 and 1.1.7. I'm guessing the TargetFramework was my problem.

For Linux deployment: I took the code that I got working and deployed it to a Linux-based function app environment but still got the "DLL not found" issue. Going back in this thread, I added the LD_LIBRARY_PATH configuration variable as specified and that seemed to make it work. The logs no longer show an exception being thrown, though it does still show this line:

2020-04-29T20:22:48.294017599Z info: Host.Triggers.Kafka[0]
2020-04-29T20:22:48.294037100Z       Librdkafka initialization: running in non-Windows OS, expecting librdkafka to be there
2020-04-29T20:22:48.295002450Z info: Host.Triggers.Kafka[0]
2020-04-29T20:22:48.295019351Z       Librdkafka initialization: could not find dll location

It doesn't seem to prevent it from continuing on and attempting to connect to the broker though.

ryancrawcour commented 4 years ago

So @FlipABit does it work now with the configuration above? Are you able to receive messages and trigger a Function?

TsuyoshiUshio commented 4 years ago

Hi @FlipABit, you seem to have solved the problem. I have provided an updated version of the sample, so I am closing this issue. If this does not solve it, feel free to reopen the issue.

The PR for the new sample is here; it will be merged soon: https://github.com/Azure/azure-functions-kafka-extension/pull/140

shaunco commented 4 years ago

@TsuyoshiUshio - This is still pretty easy to reproduce:

func init LocalFunctionsProject --worker-runtime dotnet --docker
cd LocalFunctionsProject
func new --name KafkaExample --template "Service Bus Topic trigger"

Then:

  1. Change the trigger attributes in KafkaExample from Service Bus to this Kafka library.
  2. Add ENV AZURE_FUNCTIONS_ENVIRONMENT=Development to the Dockerfile
  3. Add logging.logLevel.default="Information" to host.json
  4. Do not add ENV LD_LIBRARY_PATH=/home/site/wwwroot/bin/runtimes/linux-x64/native to the Dockerfile. (Yes, you can add it, but that is a workaround that isn't documented anywhere except this issue.)

docker build --tag mykafkatest .
docker run -p 8080:80 -it mykafkatest

Notice the error:

Hosting environment: Development
Content root path: /
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
fail: Host.Startup[515]
      A host error has occurred during startup operation '81755f4b-0519-4f2c-890b-3a2e562c5896'.
System.DllNotFoundException: Failed to load the librdkafka native library.
   at Confluent.Kafka.Impl.Librdkafka.Initialize(String userSpecifiedPath)
   at Confluent.Kafka.Consumer`2..ctor(ConsumerBuilder`2 builder)
   at Confluent.Kafka.ConsumerBuilder`2.Build()
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.CreateConsumer() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 107
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.<.ctor>b__23_0() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 79
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
   at System.Lazy`1.ExecutionAndPublication(LazyHelper executionAndPublication, Boolean useDefaultConstructor)
   at System.Lazy`1.CreateValue()
   at System.Lazy`1.get_Value()
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.CreateTopicScaler() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 112
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.<.ctor>b__23_1() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 80
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
   at System.Lazy`1.ExecutionAndPublication(LazyHelper executionAndPublication, Boolean useDefaultConstructor)
   at System.Lazy`1.CreateValue()
   at System.Lazy`1.get_Value()
   at Microsoft.Azure.WebJobs.Extensions.Kafka.KafkaListener`2.GetMonitor() in /home/vsts/work/1/s/src/Microsoft.Azure.WebJobs.Extensions.Kafka/Listeners/KafkaListener.cs:line 393
   at Microsoft.Azure.WebJobs.Host.Listeners.HostListenerFactory.RegisterScaleMonitor(IListener listener, IScaleMonitorManager monitorManager) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\HostListenerFactory.cs:line 113
   at Microsoft.Azure.WebJobs.Host.Listeners.HostListenerFactory.CreateAsync(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\HostListenerFactory.cs:line 76
   at Microsoft.Azure.WebJobs.Host.Listeners.ListenerFactoryListener.StartAsyncCore(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\ListenerFactoryListener.cs:line 45
   at Microsoft.Azure.WebJobs.Host.Listeners.ShutdownListener.StartAsync(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Listeners\ShutdownListener.cs:line 29
   at Microsoft.Azure.WebJobs.JobHost.StartAsyncCore(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\JobHost.cs:line 103
   at Microsoft.Azure.WebJobs.Script.ScriptHost.StartAsyncCore(CancellationToken cancellationToken) in /src/azure-functions-host/src/WebJobs.Script/Host/ScriptHost.cs:line 261  
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation, Int32 attemptCount, JobHostStartupMode startupMode) in /src/azure-functions-host/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs:line 273

While I get that LD_LIBRARY_PATH is a workaround, it is not documented anywhere other than this issue. Further, any deployment script for Azure needs to also set this... which means it is pretty fragile.

If the Kafka extension is dynamically loading this, it really should be responsible for setting LD_LIBRARY_PATH within the process to include the runtimes/{platform}/native path prior to attempting dlopen - or better yet, pass a full path to dlopen.
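Until the extension does something like this itself, the same idea can be approximated outside the process in a launch script. The sketch below is only an illustration under stated assumptions: native_dir and prepend_path are hypothetical helper names, the platform names follow the .NET runtime identifier convention (linux-x64, linux-arm64), and only Linux is handled because LD_LIBRARY_PATH is the variable discussed here.

```shell
#!/bin/sh
# Hypothetical helper: print the runtimes/{platform}/native directory under an app root.
# Arguments: app root, OS name (as printed by `uname -s`), machine (as printed by `uname -m`).
# Returns non-zero for platforms this sketch does not handle.
native_dir() {
    root="$1"
    case "$2" in
        Linux) rid_os="linux" ;;
        *)     return 1 ;;
    esac
    case "$3" in
        x86_64|amd64)  rid_arch="x64" ;;
        aarch64|arm64) rid_arch="arm64" ;;
        *)             return 1 ;;
    esac
    printf '%s/runtimes/%s-%s/native\n' "$root" "$rid_os" "$rid_arch"
}

# Hypothetical helper: prepend a directory to a colon-separated search path,
# avoiding a trailing colon when the existing path is empty.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Example use in a launch script (the app root is the one mentioned in this thread):
#   dir=$(native_dir /home/site/wwwroot/bin "$(uname -s)" "$(uname -m)") &&
#       export LD_LIBRARY_PATH="$(prepend_path "$dir" "${LD_LIBRARY_PATH:-}")"
```

This still has to run before the host process starts, so it does not remove the fragility described above; it only keeps the platform-specific path out of hand-written configuration.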

ryancrawcour commented 4 years ago

thanks @shaunco. agree, this should be resolved. i've reopened the issue.

TsuyoshiUshio commented 4 years ago

Hi @shaunco
Sorry, it should be documented. Unfortunately, the current func tool doesn't support Kafka workloads, so LD_LIBRARY_PATH is required in this case. The current documentation covers this for the Linux Premium plan, which is what we officially support at the moment: https://github.com/Azure/azure-functions-kafka-extension#linux-premium-plan-configuration Which environment do you use: Linux with an App Service plan, or Consumption? I might need to write about it. Thank you for reopening the issue.

TsuyoshiUshio commented 4 years ago

Hi @shaunco and @ryancrawcour

However, I'll investigate an architecture that doesn't require setting it on Linux. (It is only required on Linux.) It might not be high priority, but I'll consider it.

shaunco commented 4 years ago

> The current documentation covers this for the Linux Premium plan, which is what we officially support at the moment: https://github.com/Azure/azure-functions-kafka-extension#linux-premium-plan-configuration Which environment do you use: Linux with an App Service plan, or Consumption? I might need to write about it. Thank you for reopening the issue.

Thanks, I hadn't seen that particular wiki page, but good to know it is documented outside of this issue.

> However, I'll investigate an architecture that doesn't require setting it on Linux. (It is only required on Linux.) It might not be high priority, but I'll consider it.

I'm not clear on what you are investigating outside of linux, as Windows doesn't need the LD_LIBRARY_PATH environment variable?

shrohilla commented 1 year ago

We fixed this issue for Linux environments in the 3.4.0 release. Feel free to reopen if this issue persists.

danbasszeti commented 10 months ago

Hi, I'm having this issue again. On a Linux Consumption plan with dotnet6.0, Azure Functions v4, and Version="3.4.0", I get a "can't find librdkafka" error. I checked the build outputs, and it's only outputting Windows folders into the runtimes folder that the LD_LIBRARY_PATH variable points at (and I've confirmed that it works if I switch to a Windows function app).

danbasszeti commented 10 months ago

OK after some more investigation I've narrowed it down to the libraries being placed into runtimes/ rather than bin/runtimes. They therefore get dropped on deploy by the Azure Functions deploy mechanism. Here's a simplified image of the result from dotnet build --configuration Release --output ./output

[Image: Screenshot 2023-10-24 at 08 37 04]

To fix this, I've had to put a cp -rf runtimes/* bin/runtimes in my GitHub Action before deployment. Is there something I'm doing wrong? This feels like a very fragile solution.
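For anyone using the same workaround, a slightly more defensive version of that copy step might look like the sketch below. It is only an illustration: copy_runtimes is a hypothetical helper name, and the layout (a runtimes folder next to bin inside the build output) is assumed from the description above.

```shell
#!/bin/sh
# Hypothetical helper: copy native libraries from <output>/runtimes into
# <output>/bin/runtimes so they are included in what gets deployed.
# Fails loudly if the source directory is missing instead of copying nothing.
copy_runtimes() {
    out="$1"
    src="$out/runtimes"
    dest="$out/bin/runtimes"
    if [ ! -d "$src" ]; then
        echo "no runtimes directory at $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    # "$src/." copies the directory contents without relying on glob expansion.
    cp -R "$src/." "$dest/"
}

# Example use after the build command from this comment:
#   dotnet build --configuration Release --output ./output
#   copy_runtimes ./output
```

Creating the destination first and failing when the source is absent makes the step easier to debug in CI than a bare cp, though it remains a workaround for the output layout.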

jainharsh98 commented 10 months ago

Can you please confirm if the issue exists with the latest Kafka Extension version 3.9.0?