Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License
738 stars 493 forks source link

Queries with sync-over-async hang #1043

Closed lukasz-pyrzyk closed 4 years ago

lukasz-pyrzyk commented 4 years ago

Describe the bug I work on the migration from SDK v2 to SDK v3 in the ASP.NET web project which runs on the .NET Framework 4.6.1. Because of the contract of the 3rd party library, I'm forced to do sync-over-async calls.

After migrating to SDK v3, I notice that my app hangs on communication to the database. First query to the database completes, but any other hangs.

I was thinking that some `ConfigureAwait(false) is missing any my thread cannot continue the asynchronous operation, but it's added to all async calls.

Tested code on gateway and direct mode. Screenshots and call stack from direct mode below. Is it normal that is uses GatewayStoreClient under the hood? The name suggest that Direct mode uses gateway client.

To Reproduce I cannot give you repro right now, I will try to do it this week. Parts of the code:

class CosmosDbStatelessProvider
{
        public override bool AuthorizeServerRequest(Guid clientId, Guid domainKey, out Guid? sessionId, out Bindings? bindings, out Subscriptions? subscriptions)
        {
            var client = _repository.GetClient(clientId, domainKey).ConfigureAwait(false).GetAwaiter().GetResult();
            if (client is null)
            {
                [...]
                return false;
            }

            [..]
            return true;
        }
}

class CosmosDbRepository
{
        private readonly AsyncLazy<Container> _containerFactory;
        private readonly CosmosDBSettings _settings;
        private readonly StoredProcedures _sp;
        private readonly StoredProcedureExecutor _spExecutor;
        private readonly CosmosClient _databaseClient;

            _settings = settings;
            _sp = new StoredProcedures();
            var options = new CosmosClientOptions
            {
                ConnectionMode = _settings.UseDirectConnectionMode ? ConnectionMode.Direct : ConnectionMode.Gateway,
                ApplicationRegion = _settings.Region
            };

            _databaseClient = new CosmosClient(_settings.Endpoint, _settings.Key, options);
            _containerFactory = new AsyncLazy<Container>(async () =>
            {
                Logger.Debug("Preparing database, container and stored procedures");
                await CheckIfDatabaseExists(_databaseClient).ConfigureAwait(false);
                var container = await CheckIfContainerExists(_databaseClient).ConfigureAwait(false);
                if (_settings.PrepareStoredProcedures)
                {
                    await PrepareStoredProcedures(container).ConfigureAwait(false);
                }

                Logger.Debug("Database is ready");
                return container;
            });

            _spExecutor = new StoredProcedureExecutor(_containerFactory);

        internal async Task<ClientEntity?> GetClient(Guid clientId, Guid domainKey)
        {
            try
            {
                var dbClient = await _containerFactory;
                var partitionKey = new PartitionKey(clientId.ToString());
                var query = dbClient.GetItemLinqQueryable<ClientEntity>(requestOptions: new QueryRequestOptions { PartitionKey = partitionKey })
                        .Where(x => x.EntityType == EntityType.Client && x.ClientId == clientId && x.DomainKey == domainKey)
                        .ToFeedIterator();

                var client = await query.ExecuteFirstOrDefault(nameof(GetClient), Logger).ConfigureAwait(false);
                return client;
            }
            catch (CosmosException ex)
            {
                return null;
            }
        }
}

static class DocumentQueryExtensions 
{
        internal static async Task<TQueryItem> ExecuteFirstOrDefault<TQueryItem>(this FeedIterator<TQueryItem> iterator, string operationName, ILogger logger)
        {
            logger.Verbose("Executing {OperationName}", operationName);
            var feedResponse = await iterator.ReadNextAsync().ConfigureAwait(false);
            logger.Debug("{OperationName} finished with status code {StatusCode}. Request charge is {RequestCharge}.", operationName, feedResponse.StatusCode, feedResponse.RequestCharge);

            return feedResponse.FirstOrDefault();
        }

}

Expected behavior All DB call, even with sync-over-async should be succeeded. SDK v2 worked correctly.

Actual behavior Only the first query to the database completes, any other is hanging for infinite amount of time.

Environment summary SDK Version: 3.4.1 OS Version: Windows 10

Additional context parallel stacks

List of all tasks:

Not Flagged     803 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     800 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     797 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()
Not Flagged     796 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()   Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()
Not Flagged     788 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     787 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     774 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     772 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     711 Awaiting    Microsoft.Azure.Cosmos.ComparableTaskScheduler.ScheduleAsync()  Microsoft.Azure.Cosmos.ComparableTaskScheduler.ScheduleAsync()
Not Flagged     710 Awaiting    Microsoft.Azure.Cosmos.Collections.Generic.AsyncCollection<T>.TakeAsync(token)  Microsoft.Azure.Cosmos.Collections.Generic.AsyncCollection<T>.TakeAsync(token)
Not Flagged     709 Awaiting    System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken)    System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken)
Not Flagged     708 Awaiting    System.Threading.Tasks.TaskFactory.CompleteOnInvokePromise.Invoke(completingTask)   System.Threading.Tasks.TaskFactory.CompleteOnInvokePromise.Invoke(completingTask)
Not Flagged     648 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     646 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     636 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()
Not Flagged     632 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()   Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()
Not Flagged     627 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     626 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     624 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()
Not Flagged     623 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()   Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()
Not Flagged     611 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     610 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     527 Awaiting    Microsoft.Azure.Cosmos.ComparableTaskScheduler.ScheduleAsync()  Microsoft.Azure.Cosmos.ComparableTaskScheduler.ScheduleAsync()
Not Flagged     526 Awaiting    Microsoft.Azure.Cosmos.Collections.Generic.AsyncCollection<T>.TakeAsync(token)  Microsoft.Azure.Cosmos.Collections.Generic.AsyncCollection<T>.TakeAsync(token)
Not Flagged     525 Awaiting    System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken)    System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken)
Not Flagged     524 Awaiting    System.Threading.Tasks.TaskFactory.CompleteOnInvokePromise.Invoke(completingTask)   System.Threading.Tasks.TaskFactory.CompleteOnInvokePromise.Invoke(completingTask)
Not Flagged     469 Awaiting    Contoso.CacheProvider.Repository.CosmosDbRepository.GetClient(clientId, domainKey)  Contoso.CacheProvider.Repository.CosmosDbRepository.GetClient(clientId, domainKey)
Not Flagged     468 Awaiting    Contoso.CacheProvider.Repository.DocumentQueryExtensions.ExecuteFirstOrDefault<TQueryItem>(iterator, operationName, logger) Contoso.CacheProvider.Repository.DocumentQueryExtensions.ExecuteFirstOrDefault<TQueryItem>(iterator, operationName, logger)
Not Flagged     467 Awaiting    Microsoft.Azure.Cosmos.FeedIteratorCore<T>.ReadNextAsync(cancellationToken) Microsoft.Azure.Cosmos.FeedIteratorCore<T>.ReadNextAsync(cancellationToken)
Not Flagged     466 Awaiting    Microsoft.Azure.Cosmos.Query.QueryIterator.ReadNextAsync(cancellationToken) Microsoft.Azure.Cosmos.Query.QueryIterator.ReadNextAsync(cancellationToken)
Not Flagged     465 Awaiting    Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory.ExecuteNextAsync(cancellationToken) Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory.ExecuteNextAsync(cancellationToken)
Not Flagged     464 Awaiting    Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory.CreateItemQueryExecutionContextAsync(cancellationToken) Microsoft.Azure.Cosmos.Query.CosmosQueryExecutionContextFactory.CreateItemQueryExecutionContextAsync(cancellationToken)
Not Flagged     463 Awaiting    Microsoft.Azure.Cosmos.CosmosQueryClientCore.ExecuteQueryPlanRequestAsync(resourceUri, resourceType, operationType, sqlQuerySpec, supportedQueryFeatures, cancellationToken)    Microsoft.Azure.Cosmos.CosmosQueryClientCore.ExecuteQueryPlanRequestAsync(resourceUri, resourceType, operationType, sqlQuerySpec, supportedQueryFeatures, cancellationToken)
Not Flagged     462 Awaiting    Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(resourceUri, resourceType, operationType, requestOptions, cosmosContainerCore, partitionKey, streamPayload, requestEnricher, cancellationToken) Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(resourceUri, resourceType, operationType, requestOptions, cosmosContainerCore, partitionKey, streamPayload, requestEnricher, cancellationToken)
Not Flagged     461 Awaiting    Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(request, cancellationToken) Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(request, cancellationToken)
Not Flagged     460 Awaiting    Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.SendAsync(request, cancellationToken)  Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.SendAsync(request, cancellationToken)
Not Flagged     459 Awaiting    Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.ExecuteHttpRequestAsync(callbackMethod, callShouldRetry, callShouldRetryException, cancellationToken)  Microsoft.Azure.Cosmos.Handlers.AbstractRetryHandler.ExecuteHttpRequestAsync(callbackMethod, callShouldRetry, callShouldRetryException, cancellationToken)
Not Flagged     458 Awaiting    Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(request, cancellationToken)  Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(request, cancellationToken)
Not Flagged     457 Awaiting    Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(request, cancellationToken)    Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(request, cancellationToken)
Not Flagged     456 Awaiting    Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(request, resourceType, physicalAddress, cancellationToken)    Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(request, resourceType, physicalAddress, cancellationToken)
Not Flagged     455 Awaiting    Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync(request, resourceType, physicalAddress, cancellationToken)  Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync(request, resourceType, physicalAddress, cancellationToken)
Not Flagged     454 Awaiting    Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(callbackMethod, callShouldRetry, inBackoffAlternateCallbackMethod, minBackoffForInBackoffCallback, cancellationToken, preRetryCallback)  Microsoft.Azure.Documents.BackoffRetryUtility<T>.ExecuteRetryAsync(callbackMethod, callShouldRetry, inBackoffAlternateCallbackMethod, minBackoffForInBackoffCallback, cancellationToken, preRetryCallback)
Not Flagged     452 Awaiting    Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync.AnonymousMethod__0()    Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync.AnonymousMethod__0()
Not Flagged     451 Awaiting    System.Threading.Tasks.ContinuationTaskFromResultTask<TAntecedentResult>.InnerInvoke()  System.Threading.Tasks.ContinuationTaskFromResultTask<TAntecedentResult>.InnerInvoke()
Not Flagged     435 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     433 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     430 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()
Not Flagged     429 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()   Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()
Not Flagged     421 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     420 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     198 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     196 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     193 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)  Microsoft.Azure.Documents.Rntbd.Connection.ReadResponseMetadataAsync(args)
Not Flagged     191 Awaiting    Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args) Microsoft.Azure.Documents.Rntbd.Connection.ReadPayloadAsync(length, type, args)
Not Flagged     177 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()    Microsoft.Azure.Documents.Rntbd.Dispatcher.OpenAsync.AnonymousMethod__23_0()
Not Flagged     175 Awaiting    Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()   Microsoft.Azure.Documents.Rntbd.Dispatcher.ReceiveLoopAsync()
Not Flagged     166 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     165 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     164 Awaiting    System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()   System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
Not Flagged     163 Awaiting    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)    System.Threading.Tasks.UnwrapPromise<TResult>.Invoke(completingTask)
Not Flagged     802 Scheduled   [Scheduled and waiting to run]  
Not Flagged     773 Scheduled   [Scheduled and waiting to run]  
Not Flagged     707 Scheduled   [Scheduled and waiting to run]  Task.Delay
Not Flagged     647 Scheduled   [Scheduled and waiting to run]  
Not Flagged     523 Scheduled   [Scheduled and waiting to run]  Task.Delay
Not Flagged     480 Scheduled   [Scheduled and waiting to run]  Task.ContinueWith: <QueueAsynchronous>b__0
Not Flagged     453 Scheduled   [Scheduled and waiting to run]  
Not Flagged     449 Scheduled   [Scheduled and waiting to run]  Async: <SendAsync>d__3
Not Flagged     434 Scheduled   [Scheduled and waiting to run]  
Not Flagged     197 Scheduled   [Scheduled and waiting to run]  
Not Flagged     192 Scheduled   [Scheduled and waiting to run]  
Not Flagged     178 Scheduled   [Scheduled and waiting to run]  Async: <<OpenAsync>b__23_0>d
Not Flagged     43  Scheduled   [Scheduled and waiting to run]  Task.ContinueWith: <GetNormalApplicationInstance>b__34_0

threads

Stack trace of the hanging thread:

Contoso.CacheProvider.dll!Contoso.CacheProvider.CosmosDbStatelessProvider.AuthorizeServerRequest
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext)
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout)
mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Threading.Tasks.Task.InternalWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken)
mscorlib.dll!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task)
mscorlib.dll!System.Runtime.CompilerServices.ConfiguredTaskAwaitable<System.__Canon>.ConfiguredTaskAwaiter.GetResult()
Contoso.CacheProvider.dll!Contoso.CacheProvider.CosmosDbStatelessProvider.AuthorizeServerRequest(System.Guid clientId, System.Guid domainKey, out System.Guid? sessionId, out FM.WebSync.Server.Bindings bindings, out FM.WebSync.Server.Subscriptions subscriptions) Line 285
FM.WebSync.Server.dll!FM.WebSync.Server.StatelessProvider._AuthorizeServerRequest(System.Guid clientId, System.Guid domainKey, out System.Guid? sessionId, out FM.WebSync.Server.Bindings bindings, out FM.WebSync.Server.Subscriptions subscriptions)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.Authorize(FM.WebSync.Message request, System.Guid domainKey, FM.WebSync.Server.EventSource eventSource, out FM.WebSync.Server.ClientReference clientReference)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.Authorize(FM.WebSync.Message[] requests, System.Guid domainKey, FM.WebSync.Server.EventSource eventSource, out FM.WebSync.Server.ClientReference[] clientReferences)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.HandleMultipleRequests(FM.WebSync.Server.EventSource eventSource, FM.WebSync.Server.EventType eventTypeBefore, FM.WebSync.Server.EventType eventTypeAfter, FM.WebSync.Server.State state, bool verifyExists, FM.WebSync.Server.Bindings bindings, ref FM.WebSync.Message[] requests, ref System.Guid domainKey, System.Func<FM.WebSync.Server.WebSyncEventArgs, FM.WebSync.Message> process, out FM.WebSync.Server.WebSyncEventArgs args)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.Bind(FM.WebSync.Server.EventSource eventSource, FM.WebSync.Server.State state, ref FM.WebSync.Message[] requests, ref System.Guid domainKey)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.ServerBind(System.Guid clientId, FM.WebSync.Record[] records, out FM.WebSync.Record[] boundRecords, System.Guid domainKey)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestHandler.BindMany(System.Guid clientId, FM.WebSync.Record[] records, out FM.WebSync.Record[] boundRecords, bool notifyClient, System.Guid? domainKey)
FM.WebSync.Server.dll!FM.WebSync.Server.WebSyncClientReference.Bind(string key, string valueJson, out FM.WebSync.Record[] boundRecords)
FM.WebSync.Server.dll!FM.WebSync.Server.WebSyncClientReference.Bind(string key, string valueJson)
Contoso.Service.dll!Contoso.Service.Handlers.CommonEventHandler.AfterConnect(object sender, FM.WebSync.Server.WebSyncEventArgs e) Line 20
[Lightweight Function]
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.Fire(FM.WebSync.Server.WebSyncEventMethodInfo info, FM.WebSync.Message[] messages, FM.WebSync.Server.WebSyncEventArgs args, bool isRequest, bool fireAndForget)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.FireEvent(FM.WebSync.Server.EventType eventType, FM.WebSync.Server.WebSyncEventArgs args, FM.WebSync.Message[] messages)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.RunAfterEvent(FM.WebSync.Server.EventType eventType, FM.WebSync.Server.WebSyncEventArgs args, FM.WebSync.Message[] responses)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.HandleSingleRequest(FM.WebSync.Server.EventSource eventSource, FM.WebSync.Server.EventType eventTypeBefore, FM.WebSync.Server.EventType eventTypeAfter, FM.WebSync.Server.State state, bool verifyExists, FM.WebSync.Server.Bindings bindings, ref FM.WebSync.Message request, ref System.Guid domainKey, System.Func<FM.WebSync.Server.WebSyncEventArgs, FM.WebSync.Message> process, out FM.WebSync.Server.WebSyncEventArgs args)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.Handshake(FM.WebSync.Server.EventSource eventSource, FM.WebSync.Server.State state, ref FM.WebSync.Message request, ref System.Guid domainKey)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.ClientHandshake(FM.WebSync.Message request, FM.WebSync.Server.State state)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestManager.ProcessState(FM.WebSync.Server.State state, System.Action<FM.WebSync.Message[], FM.WebSync.Message[]> callback)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestHandler.ProcessState(FM.WebSync.Server.State state, System.Action<FM.WebSync.Message[], FM.WebSync.Message[]> callback)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestHandler.BeginProcessHttpRequest(System.Web.HttpContextBase context, System.AsyncCallback cb, object extraData, FM.Server.SessionAccessLevel sessionAccessLevel)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestHandler.BeginProcessRequestBase(System.Web.HttpContextBase context, System.AsyncCallback cb, object extraData, FM.Server.SessionAccessLevel sessionAccessLevel)
FM.WebSync.Server.dll!FM.WebSync.Server.RequestHandler.BeginProcessRequest(System.Web.HttpContext context, System.AsyncCallback cb, object extraData)
System.Web.dll!System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
System.Web.dll!System.Web.HttpApplication.ExecuteStepImpl(System.Web.HttpApplication.IExecutionStep step)
System.Web.dll!System.Web.HttpApplication.ExecuteStep(System.Web.HttpApplication.IExecutionStep step, ref bool completedSynchronously)
System.Web.dll!System.Web.HttpApplication.PipelineStepManager.ResumeSteps(System.Exception error)
System.Web.dll!System.Web.HttpApplication.BeginProcessRequestNotification(System.Web.HttpContext context, System.AsyncCallback cb)
System.Web.dll!System.Web.HttpRuntime.ProcessRequestNotificationPrivate(System.Web.Hosting.IIS7WorkerRequest wr, System.Web.HttpContext context)
System.Web.dll!System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(System.IntPtr rootedObjectsPointer, System.IntPtr nativeRequestContext, System.IntPtr moduleData, int flags)
System.Web.dll!System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(System.IntPtr rootedObjectsPointer, System.IntPtr nativeRequestContext, System.IntPtr moduleData, int flags)
[Native to Managed Transition]
[Managed to Native Transition]
System.Web.dll!System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(System.IntPtr rootedObjectsPointer, System.IntPtr nativeRequestContext, System.IntPtr moduleData, int flags)
System.Web.dll!System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(System.IntPtr rootedObjectsPointer, System.IntPtr nativeRequestContext, System.IntPtr moduleData, int flags)
[AppDomain Transition]
j82w commented 4 years ago

Is it just query or do point operations also get stuck?

lukasz-pyrzyk commented 4 years ago

Yes. It gets stuck on point operation with ReadItemAsync and Stored Procedure execution.

Both cases got stuck on the LoadBalancingPartition.RequestAsync. Parallel stacks below

ReadItemAsync: parallelStack

Stored Procedure: parallelStackSp

I think the issue might be somehow related to the thread context.

My app has background thread which scans the database for idler users. It uses the same base provider, repository etc. Queries done by this background thread complete successfully, even when other web threads are hang.

When thread from System.Web touches DB for second time, it gots stuck.

j82w commented 4 years ago

Are you using the latest 3.4.1 .NET SDK? There was a bug in version before 3.2.0 that was fixed in PR #761 . Troubleshooting this will be a lot easier once you provide the repo.

lukasz-pyrzyk commented 4 years ago

Yes, I use the latest SDK. @j82w, the smallest repro is in https://github.com/lukasz-pyrzyk/CosmosDbSyncOverAsync . You need to setup environment variables to connect to the database, I've added notes to the readme.

ealsur commented 4 years ago

Are you using IHttpHandler in the real solution or was that only for the repro?

lukasz-pyrzyk commented 4 years ago

It's taken from the real app

ealsur commented 4 years ago

Have you consider IHttpAsyncHandler? Since this involves an async operation?

You mention that you were using V2 SDK, could you share how were the queries being executed?

For example, What was the V2 version of :

var dbClient =  _containerFactory;
var partitionKey = new PartitionKey(clientId.ToString());
var query = dbClient
    .GetItemLinqQueryable<ClientEntity>(requestOptions: new QueryRequestOptions { PartitionKey = partitionKey })
    .Where(x => x.ClientId == clientId)
    .ToFeedIterator();

var client = await query.ExecuteFirstOrDefault(nameof(GetClient)).ConfigureAwait(false);
return client;
ealsur commented 4 years ago

I also managed to get the same deadlock using the V2 SDK client on the Repro project, calling DocumentClient.ReadDatabaseAsync from within the HeartbeatRouteHandler locks in the same way.

lukasz-pyrzyk commented 4 years ago

Have you consider IHttpAsyncHandler? Since this involves an async operation?

I did, unfortunately I cannot move. I'm reusing 3rd party handlers and injecting my CosmosDBRepository to them with a strictly coupled synchronous interface. That's why I did ugly sync-over-async.

You mention that you were using V2 SDK, could you share how were the queries being executed?

Sure. I've added new branch called v2 to the repository, https://github.com/lukasz-pyrzyk/CosmosDbSyncOverAsync/tree/v2

I also managed to get the same deadlock using the V2 SDK client on the Repro project, calling DocumentClient.ReadDatabaseAsync from within the HeartbeatRouteHandler locks in the same way.

Can you show me the code? In the branch, I've committed a code which works correctly for multiple requests, at least on my machine :)

ealsur commented 4 years ago

To rule out anything about Queries, I added a method on the Repository:

public async Task<bool> CheckIfDatabaseExistsV2()
{
    try
    {
        await _oldClient.ReadDatabaseAsync(UriFactory.CreateDatabaseUri("Test"));
        return true;
    }
    catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
    {
        return false;
    }
}

The _oldClient is created on the constructor:

_oldClient = new DocumentClient(new Uri(_settings.Endpoint), _settings.Key, new ConnectionPolicy()
{
    ConnectionMode = _settings.UseDirectConnectionMode ? Microsoft.Azure.Documents.Client.ConnectionMode.Direct : Microsoft.Azure.Documents.Client.ConnectionMode.Gateway
});

Then from inside the RouteHandler, I call:

repository.CheckIfDatabaseExistsV2().ConfigureAwait(false).GetAwaiter().GetResult();

And it locks.

I switched to Gateway mode to rule out any Direct mode issues.

ealsur commented 4 years ago

It looks like the issue is, when running async over sync, the Synchronization Context.

If I put the V2 ReadDatabaseAsync call inside the Repository class and call it with GetAwaiter from the Handler, it locks (the Repository class has an await). But, if I call the V2 ReadDatabaseAsync call from the Handler, with GetAwaiter, it works. (leaving the Repository class out of the picture).

One easy workaround for this is to wrap any call you make around Task.Run, for example:

ClientEntity clientEntity;
Task.Run(() =>
    {
        clientEntity = repository.GetClient(clientId).ConfigureAwait(false).GetAwaiter().GetResult();
    }
).Wait();
lukasz-pyrzyk commented 4 years ago

Thanks @ealsur for digging in! The issue is definitely related to the Synchronization Context. I tried to reproduce the issue with xunit tests and it doesn't occur, only with ASP.NET which has a synchronization context.

According to your sample, I would like to point out one thing. You have a hang on V2 SDK because you don't have ConfigureAwait(false) on the async call inside repository. When you add it, the hang does not occur.   I've updated my samples to don't have any query-related stuff.

SDK V3, hangs:

        repository.ReadDatabase().ConfigureAwait(false).GetAwaiter().GetResult();
        [...]

        internal async Task ReadDatabase()
        {
            try
            {
                var client = await clientFactory;
                await client.ReadAsync().ConfigureAwait(false);
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                throw;
            }
        }

SDK V2, doesn't hang

        repository.ReadDatabase().ConfigureAwait(false).GetAwaiter().GetResult();
        [..]

        internal async Task ReadDatabase()
        {
            try
            {
                var client = await _clientFactory;
                await client.ReadDatabaseAsync(UriFactory.CreateDatabaseUri(_settings.DatabaseId)).ConfigureAwait(false);
            }
            catch (Exception e)
            {
                Trace.WriteLine(e);
                throw;
            }
        }```
ealsur commented 4 years ago

Thanks for the catch! If you wrap your call in the RouteHandler in Task.Run, does it work correctly in your real project?

lukasz-pyrzyk commented 4 years ago

Well, in real app I have some kind of dynamic injecting and abstraction, but I quickly checked - With Task.Run it doesn't hang

ealsur commented 4 years ago

The linked PR was tested on the repro Github and it fixes the reported issue.

lukasz-pyrzyk commented 4 years ago

@ealsur thank you very much! If you have some kind of night builds, I will be happy to test it in the real app.