Azure / azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services
Other
466 stars 494 forks source link

DeviceClient/ModuleClient .OpenAsync Timeouts/Delays with .NET Core 2.1 #521

Closed jason-e-gross closed 6 years ago

jason-e-gross commented 6 years ago

Description of the issue:

When updating the IOT Edge project to use .NET Core 2.1, it introduces delays and timeouts in devices or modules at .OpenAsyc. About 30% timeouts to crashing (which edgeAgent restarts the module) to 70% delays of over 20-30 seconds or more.

Code sample exhibiting the issue:

<TargetFramework>netcoreapp2.1</TargetFramework>
<PackageReference Include="Microsoft.AspNetCore.App" Version="2.1.0" />

It shows in either:

<PackageReference Include="Microsoft.Azure.Devices.Client" Version="1.17.0-preview-007" />

or

<PackageReference Include="Microsoft.Azure.Devices.Client" Version="1.6.0-preview-001" />

If we downgrade back to .NET Core App 2.0, the timeout/delay goes away, and .OpenAsync is as quick as you would expect it to be. It exhibits itself in three modules we tried updating - which range from nothing more special than the FilterModule example to one that uses Kestrel/SignalR (port 8012).

Updating to the 1.17.0-preview-007 has made it less likely to timeout and crash the module, but it still takes over 20 seconds. An example of a timeout is below (4 mins passed before it crashed). Includes a .ContinueWith to capture the stack trace.

06/15/2018 14:02:22 | INIT: Opening ModuleClient
06/15/2018 14:06:23 | INIT: ModuleClient Initialized, CW: {"Id":1,"Exception":{"ClassName":"System.AggregateException","Message":"One or more errors occurred.","Data":null,"InnerException":{"ClassName":"System.TimeoutException","Message":"Operation timeout expired.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeout>b__62_2(Task t)\n   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n--- End of stack trace from previous location where exception was thrown ---\n   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233083,"Source":"Microsoft.Azure.Devices.Client","WatsonBuckets":null},"HelpURL":null,"StackTraceString":null,"RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233088,"Source":null,"WatsonBuckets":null,"InnerExceptions":[{"ClassName":"System.TimeoutException","Message":"Operation timeout expired.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeout>b__62_2(Task t)\n   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n--- End of stack trace from previous location where exception was thrown ---\n   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233083,"Source":"Microsoft.Azure.Devices.Client","WatsonBuckets":null}]},"Status":7,"IsCanceled":false,"IsCompleted":true,"IsCompletedSuccessfully":false,"CreationOptions":0,"AsyncState":null,"IsFaulted":true}
06/15/2018 14:06:23 | INIT: DeviceClient StateChange Handler Enabled

Startup.cs

        public void ConfigureServices(IServiceCollection services)  {
            services.AddMvc();
            services.AddCors();
            services.AddSignalR();
        }

        // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
        public void Configure(IApplicationBuilder app, IHostingEnvironment env) {
            if (env.IsDevelopment())    app.UseDeveloperExceptionPage();

            app.UseFileServer();
            app.UseStaticFiles(
                new StaticFileOptions    {
                    FileProvider = new PhysicalFileProvider(Path.Combine(Directory.GetCurrentDirectory(), "resources")),    //physical
                    RequestPath = "/resources"                                                                              //pseudo
                }
            );
            app.UseCookiePolicy();
            app.UseCors(builder =>
                builder
                    .AllowAnyHeader()
                    .AllowAnyMethod()
                    .AllowAnyOrigin()
            );
            app.UseSignalR(routes => {routes.MapHub<UserInterfaceHub>("/uihub");});
            app.UseMvc();
        }

relevant Program.cs

            moduleClient = ModuleClient.CreateFromConnectionString(connectionString, settings);
            await moduleClient.OpenAsync().ContinueWith(_result =>
                Console.WriteLine($"{DateTime.Now.ToString()} | INIT: ModuleClient Initialized, CW: {JsonConvert.SerializeObject(_result)}")
             );
            moduleClient.SetConnectionStatusChangesHandler(onConnectionChange);
CIPop commented 6 years ago

@jason-e-gross , we are aware of the issue. Could you give this a try with yesterday's GA release (1.17.0) please? It could be related to #352 or #495 which we fixed only in GA bits.

CIPop commented 6 years ago

/cc @varunpuranik

jason-e-gross commented 6 years ago

@CIPop -- I've been able to reduce it down to:

  1. It happens with MQTT on the old runtime, if module project is .NET Core 2.1
  2. It happens with MQTT on the new GA runtime, if module project is .NET Core 2.0 and 2.1

If I switch to AMQP transport settings, there's zero delay in .OpenAsync (2.0 and 2.1), works as expected.

CIPop commented 6 years ago

Thanks @jason-e-gross for the new info! I've moved all our testing to netcoreapp2.1 and didn't see any timing issues yet - this is strictly related to the network configuration in edge / Docker. I'm continuing the investigation.

myagley commented 6 years ago

We believe this issue has been fixed in DotNetty here: https://github.com/Azure/DotNetty/pull/413

We need to cut a new release of DotNetty and update the SDK with this new version.

myagley commented 6 years ago

Version 1.18.0 of the C# SDK has been released to nuget.org. Please give it a try.

isimanovich commented 6 years ago

Any update on when new version of DotNetty will be available?

varunpuranik commented 6 years ago

@isimanovich - the new version 1.18.0 of the C# SDK has already been released. Can you check if if you still run into this issue? @jason-e-gross - can you also try C# SDK version 1.18.0? It has the fix for this issue. If it works for you, can you please close this issue?

isimanovich commented 6 years ago

@varunpuranik I was referring to DotNetty package that @myagley mentioned on Aug10 ^^^. However, I get following error message after exactly 1 hr and 5 minutes of DeviceClient being used to send messages to IoTHub:

Exception with message: I/O error occurred., Exception : DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred. at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeoutMessage>b__63_2(Task'1 t) at System.Threading.Tasks.ContinuationResultTaskFromResultTask'2.InnerInvoke() at System.Threading.Tasks.Task.Execute() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult()`

varunpuranik commented 6 years ago

@isimanovich - Dotnetty version 0.5.0 which includes the fix has also been released.

As for the exception, is it a timeout (you included the callstack but not the exception)?

isimanovich commented 6 years ago

i think, i found root-cause of my issue

prmathur-microsoft commented 6 years ago

@isimanovich - Could you update us with your findings ? Are you unblocked?

CIPop commented 6 years ago

We are very interested in your findings - they may help other users of the SDK as well so if you have some time, please add them to this thread.

Closing for now. @isimanovich please follow https://github.com/Azure/azure-iot-sdk-csharp/tree/master/tools/CaptureLogs to collect logs when you're getting this exception with the latest release.

az-iot-builder-01 commented 6 years ago

@jason-e-gross, @myagley, @isimanovich, @prmathur-microsoft, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey