dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

NullReferenceException in OutsideRuntimeClient #8385

Open D-McInnes8 opened 1 year ago

D-McInnes8 commented 1 year ago

We've got an Orleans Silo running in Azure in a Linux container app, with a client Azure function app that's pulling from a queue to process data. This runs fine if messages are processed one at a time, but when multiple messages are processed at the same time and the function app is trying to send messages to the same grain concurrently, we get the following exception:

Exception while executing function: Functions.IngestPricesTrigger
Result: Failure
Exception: System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Orleans.OutsideRuntimeClient.SendRequestMessage(GrainReference target, Message message, IResponseCompletionSource context, InvokeMethodOptions options) in //src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 244
   at Orleans.OutsideRuntimeClient.SendRequest(GrainReference target, IInvokable request, IResponseCompletionSource context, InvokeMethodOptions options) in //src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 236
   at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in //src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 45
   at Orleans.Runtime.GrainReference.InvokeAsync[T](IInvokable methodDescription) in //src/Orleans.Core.Abstractions/Runtime/GrainReference.cs:line 413
   at OrleansCodeGen.Engine.GrainInterfaces.Proxy_IPriceSourceValidationGrain.global::Engine.GrainInterfaces.IPriceSourceValidationGrain.GetPriceSourceValidationRules() in C:\agents\03_work\4\s\Engine.GrainInterfaces\Orleans.CodeGenerator\Orleans.CodeGenerator.OrleansSerializationSourceGenerator\Engine.GrainInterfaces.orleans.g.cs:line 415
   at Engine.Data.Orleans.Repositories.OrleansPriceSourceRepository.GetPriceSourceValidationRules(Int32 priceSourceId) in C:\agents\03_work\4\s\Engine.Data.Orleans\Repositories\OrleansPriceSourceRepository.cs:line 44
   at Engine.Logic.Implementations.PriceValidationService.ValidateUnprocessedPrices(UnprocessedPrice[] prices, Int32 priceSourceId) in C:\agents\03_work\4\s\Engine.Logic\Implementations\PriceValidationService.cs:line 28
   at Engine.Logic.Implementations.PricingService.ProcessBatchPrices(UnprocessedPrice[] unprocessedPrices, Int32 priceSourceId, DateTime receviedDate, Nullable`1 reportedDate, String fileName, Nullable`1 userId, Guid processGuid) in C:\agents\03_work\4\s\Engine.Logic\Implementations\PricingService.cs:line 40
   at Engine.Functions.Triggers.IngestPricesTrigger.Run(IngestPriceBatchCommand payload) in C:\agents\03_work\4\s\Engine.Functions\Triggers\IngestPricesTrigger.cs:line 26
   at Microsoft.Azure.Functions.Worker.Invocation.VoidTaskMethodInvoker`2.InvokeAsync(TReflected instance, Object[] arguments) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\VoidTaskMethodInvoker.cs:line 22
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionInvoker`2.<>c.<InvokeAsync>b__6_0(Task`1 t) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionInvoker.cs:line 32
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__273_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionExecutor.ExecuteAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionExecutor.cs:line 44
   at Microsoft.Azure.Functions.Worker.OutputBindings.OutputBindingsMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\src\DotNetWorker.Core\OutputBindings\OutputBindingsMiddleware.cs:line 13
   at Microsoft.Azure.Functions.Worker.GrpcWorker.InvocationRequestHandlerAsync(InvocationRequest request, IFunctionsApplication application, IInvocationFeaturesFactory invocationFeaturesFactory, ObjectSerializer serializer, IOutputBindingsInfoProvider outputBindingsInfoProvider, IInputConversionFeatureProvider functionInputConversionFeatureProvider) in D:\a\_work\1\s\src\DotNetWorker.Grpc\GrpcWorker.cs:line 199
Stack:
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionInvoker`2.<>c.<InvokeAsync>b__6_0(Task`1 t) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionInvoker.cs:line 32
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__273_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionExecutor.ExecuteAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionExecutor.cs:line 44
   at Microsoft.Azure.Functions.Worker.OutputBindings.OutputBindingsMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\src\DotNetWorker.Core\OutputBindings\OutputBindingsMiddleware.cs:line 13
   at Microsoft.Azure.Functions.Worker.GrpcWorker.InvocationRequestHandlerAsync(InvocationRequest request, IFunctionsApplication application, IInvocationFeaturesFactory invocationFeaturesFactory, ObjectSerializer serializer, IOutputBindingsInfoProvider outputBindingsInfoProvider, IInputConversionFeatureProvider functionInputConversionFeatureProvider) in D:\a\_work\1\s\src\DotNetWorker.Grpc\GrpcWorker.cs:line 199

This is the code that's used to get the grain. PriceSourceId is never going to be null, so that's not the issue.

var priceSourceValidation = _orleansClusterClient.GetGrain<IPriceSourceValidationGrain>(priceSourceId);
return await priceSourceValidation.GetPriceSourceValidationRules();

We're using Azure table storage for the membership table.

ReubenBond commented 1 year ago

How are you configuring the client? The client is thread safe, but there's a possibility that the client hasn't fully started before being accessed.

D-McInnes8 commented 1 year ago

Thanks for the reply, this is the code we're using to configure the client:

.UseOrleansClient((context, builder) =>
{
    var siloConfig = context.Configuration.GetSection(nameof(SiloSettings)).Get<SiloSettings>();
    builder
        .UseAzureStorageClustering(opt =>
        {
            opt.ConfigureTableServiceClient(context.Configuration.GetConnectionString("EngineIngestionStorageAccount"));
        })
        .Configure<ClusterOptions>(o =>
        {
            o.ClusterId = siloConfig.ClusterId;
            o.ServiceId = siloConfig.ServiceId;
        });
})

We'll do some testing and see if this is something that's only occurring when the client is starting up.

D-McInnes8 commented 1 year ago

We've done more testing and only managed to replicate the issue when the function app is started / restarted, so it looks like it is an issue with the startup.

I've tried a workaround that appears to be working: using a retry policy on the Azure function trigger and increasing the delay so that the Orleans client has enough time to start. It would be useful, though, if there were some way to ensure that the Azure Service Bus trigger is only executed once the Orleans client has started, so we didn't have to rely on retries.

fwaris commented 1 year ago

I ran into the same issue. I am using local clustering (for now).

In my case, access to the Orleans cluster is via a hosted service running on an external client (ASP.NET host).

It turns out the order of service registration on the client matters. I solved this issue by registering the hosted service after the Orleans client configuration.

Before (did not work):

Host
    .CreateDefaultBuilder(args)
    .ConfigureWebHostDefaults(fun wb -> 
        wb
            .UseStaticWebAssets()
            .UseStartup<Startup>() //<-- Service that references Grain injected here
            |> ignore
    )
    .UseOrleansClient(fun oc -> 
        oc
            .UseLocalhostClustering()
            .AddMemoryStreams(AmfGrains.C.Streams.PROVIDER)                    
        |> ignore
        )
    .UseConsoleLifetime()
    .Build()
    .Run()

Fix:

Host
    .CreateDefaultBuilder(args)
    .UseOrleansClient(fun oc -> 
        oc
            .UseLocalhostClustering()
            .AddMemoryStreams(AmfGrains.C.Streams.PROVIDER)                    
        |> ignore
        )
    .ConfigureWebHostDefaults(fun wb -> 
        wb
            .UseStaticWebAssets()
            .UseStartup<Startup>() //<-- Service that references Grain injected here            
            |> ignore
    )
    .UseConsoleLifetime()
    .Build()
    .Run()

Cotspheer commented 2 months ago

Holy cow! Pardon my language but @fwaris this was it. I just spent a whole day trying to figure out what I did wrong, and in fact this was it! Can we get a disclaimer or some banner in the docs that points this out? I checked the samples, and in the Stocks sample and the GPSTracker sample this is wired up as below, but there is no comment that points it out. I had only checked the streaming sample, as I was curious about that particular part. Never would I have searched for "AddHostedService" inside the samples.

My case is a simple console application to test something. I added a hosted service before the UseOrleansClient call and asked myself why InternalGrainFactory was always null inside ClusterClient.cs. Refactoring the code to call builder.Services.AddHostedService<Worker>(); after the builder.UseOrleansClient() call fixed it.

var builder = Host.CreateApplicationBuilder(args);

// builder.Services.AddHostedService<Worker>(); <--- Don't do this

builder.UseOrleansClient((clientBuilder) =>
{
    clientBuilder.Configure<ClusterOptions>(options =>
    {
        options.ClusterId = "Cluster";
        options.ServiceId = typeof(Program).FullName;
    });

    var clusterConnectionString = builder.Configuration["ConnectionStrings:ClusterDatabase"]!;

    clientBuilder.UseCosmosGatewayListProvider(options =>
    {
        options.ConfigureCosmosClient(clusterConnectionString);
        options.ClientOptions = new CosmosClientOptions
        {
            ServerCertificateCustomValidationCallback = (certificate, chain, sslPolicyErrors) => true,
        };
        options.DatabaseName = "cluster-db";
    });
});

builder.Services.AddHostedService<Worker>(); // It has to be after .UseOrleansClient

using var app = builder.Build();
await app.RunAsync();
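
A possible explanation for why the order matters, offered as an assumption and not something confirmed in this thread: by default the .NET generic host starts IHostedService instances in the order they were registered, and UseOrleansClient appears to register a hosted service that starts the client. A worker registered earlier would then run its StartAsync before the client is ready. The sketch below shows only that ordering behavior, without Orleans. ClientStartup and Worker are hypothetical stand-ins, not Orleans types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Registration order decides start order under default host options.
builder.Services.AddHostedService<ClientStartup>(); // stand-in for what UseOrleansClient is assumed to register
builder.Services.AddHostedService<Worker>();        // depends on the client, so it is registered second

using var host = builder.Build();

// ClientStartup.StartAsync runs before Worker.StartAsync.
await host.StartAsync();
await host.StopAsync();

// Hypothetical stand-in for the service that connects the Orleans client.
sealed class ClientStartup : IHostedService
{
    public Task StartAsync(CancellationToken cancellationToken)
    {
        Console.WriteLine("ClientStartup.StartAsync");
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

// Hypothetical worker that would call grains from StartAsync.
sealed class Worker : IHostedService
{
    public Task StartAsync(CancellationToken cancellationToken)
    {
        Console.WriteLine("Worker.StartAsync");
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}
```

Swapping the two AddHostedService lines reverses the start order, which would match the failure described above if the assumption about UseOrleansClient holds.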