dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License

Flaky test - `QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages` #5140

Open radical opened 1 month ago

radical commented 1 month ago

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=763013
Build error leg or test failing: Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: True)
Pull request: https://github.com/dotnet/aspire/pull/5099

Error message
Grpc.Core.RpcException : Status(StatusCode="Cancelled", Detail="Call canceled by the client.", DebugException="System.OperationCanceledException: The operation was canceled.")
---- System.OperationCanceledException : The operation was canceled.

Stack trace
   at Qdrant.Client.QdrantClient.SearchAsync(String collectionName, ReadOnlyMemory`1 vector, Filter filter, SearchParams searchParams, UInt64 limit, UInt64 offset, WithPayloadSelector payloadSelector, WithVectorsSelector vectorsSelector, Nullable`1 scoreThreshold, String vectorName, ReadConsistency readConsistency, ShardKeySelector shardKeySelector, Nullable`1 sparseIndices, Nullable`1 timeout, CancellationToken cancellationToken)
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.<>c__DisplayClass6_1.<<WithDataShouldPersistStateBetweenUsages>b__1>d.MoveNext() in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 187
--- End of stack trace from previous location ---
   at Polly.ResiliencePipeline.<>c.<<ExecuteAsync>b__3_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Polly.Outcome`1.GetResultOrRethrow()
   at Polly.ResiliencePipeline.ExecuteAsync(Func`2 callback, CancellationToken cancellationToken)
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 183
   at Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs:line 196
--- End of stack trace from previous location ---
----- Inner Stack Trace -----

https://github.com/dotnet/aspire/blob/232e434c79acd2bd1221ad05a5d2dd5118018804/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs#L93-L95
https://github.com/dotnet/aspire/blob/232e434c79acd2bd1221ad05a5d2dd5118018804/tests/Aspire.Hosting.Qdrant.Tests/QdrantFunctionalTests.cs#L183-L190

This code runs inside a Polly resilience pipeline that handles `RpcException`.
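
To illustrate how a cancelled call can still surface from a pipeline that handles `RpcException`, here is a minimal sketch. It is not the test's actual code. `FakeSearchAsync`, the retry count, the delay, and the 2-second timeout are all hypothetical stand-ins, and it assumes the `Polly.Core` and `Grpc.Core.Api` packages are referenced. As far as I understand Polly's retry strategy, once the token passed to `ExecuteAsync` is cancelled it stops retrying and rethrows the last outcome, which would match the stack trace above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Polly;
using Polly.Retry;

// Retry on RpcException, roughly like a pipeline "watching for RpcException".
// The retry count and delay are assumptions, not the values used by the test.
var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        ShouldHandle = new PredicateBuilder().Handle<RpcException>(),
        MaxRetryAttempts = 10,
        Delay = TimeSpan.FromMilliseconds(200),
    })
    .Build();

// Hypothetical overall timeout standing in for the test's cancellation token.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));

try
{
    await pipeline.ExecuteAsync(async token =>
    {
        await FakeSearchAsync(token);
    }, cts.Token);
}
catch (RpcException ex)
{
    // The handled exception type still escapes once the token is cancelled.
    Console.WriteLine($"Pipeline rethrew RpcException with status {ex.StatusCode}");
}

// Hypothetical stand-in for QdrantClient.SearchAsync against a server that never
// answers: it waits until the token fires, then reports the cancellation the way
// a gRPC client would, as an RpcException with StatusCode.Cancelled.
static async Task FakeSearchAsync(CancellationToken token)
{
    try
    {
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }
    catch (OperationCanceledException)
    {
        throw new RpcException(new Status(StatusCode.Cancelled, "Call canceled by the client."));
    }
}
```

If this reading is right, the `Cancelled` status is a symptom of the timeout rather than the root cause, so the interesting question is why the server never became reachable.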

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "RpcException.*Call canceled by the client",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
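
A quick way to sanity-check an `ErrorPattern` before filing is to run it against the error text locally. The sketch below assumes the known-issues tooling treats `ErrorPattern` as a regular expression with semantics close to .NET's `Regex`. That is an assumption on my part, not something confirmed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the JSON block above.
const string errorPattern = "RpcException.*Call canceled by the client";

// The first line of the error message reported in this issue.
const string errorLine =
    "Grpc.Core.RpcException : Status(StatusCode=\"Cancelled\", " +
    "Detail=\"Call canceled by the client.\", " +
    "DebugException=\"System.OperationCanceledException: The operation was canceled.\")";

Console.WriteLine(Regex.IsMatch(errorLine, errorPattern)); // prints "True"
```

Note that the pattern does not mention Qdrant or the test name, so it would match the same gRPC cancellation coming from any test, which may be why non-Qdrant tests can show up as hits for this issue.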

cc @eerhardt @sebastienros

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=763013
Error message validated: [RpcException.*Call canceled by the client]
Result validation: :white_check_mark: Known issue matched with the provided build.
Validation performed at: 8/1/2024 6:49:28 PM UTC

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 807341 | dotnet/aspire | Aspire.Hosting.Milvus.Tests.MilvusFunctionalTests.Aspire.Hosting.Milvus.Tests.MilvusFunctionalTests.WithDataShouldPersistStateBetweenUsages | dotnet/aspire#5701 |
| 807198 | dotnet/aspire | Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages | dotnet/aspire#5511 |
| 806452 | dotnet/aspire | Aspire.Hosting.Milvus.Tests.WorkItemExecution | dotnet/aspire#5538 |
| 787176 | dotnet/aspire | Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages | dotnet/aspire#5405 |
| 776607 | dotnet/aspire | Aspire.Playground.Tests.AppHostTests.Aspire.Playground.Tests.AppHostTests.TestEndpointsReturnOk | dotnet/aspire#5236 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 2 | 3 | 5 |

eerhardt commented 1 month ago

Looking at the logs I see:

fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not determine host address and port for container port  {"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create Endpoint object  {"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "ServiceName": "qdrant-grpc-59a393fe", "Workload": "/qdrant-mkbsaegq-59a393fe", "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not determine host address and port for container port  {"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "error": "container '/qdrant-mkbsaegq-59a393fe' is not running: exited"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create Endpoint object  {"Container": {"name":"qdrant-mkbsaegq-59a393fe"}, "Reconciliation": 5, "ServiceName": "qdrant-http-59a393fe", "Workload": "/qdrant-mkbsaegq-59a393fe", "error": "container '/qdrant-

Did the container fail to start? Is there a way to get the container's logs?

radical commented 1 month ago

... and the first one failed during shutdown/deletion(?):

k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict', response body {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on containers.usvc-dev.developer.microsoft.com \"qdrant-azvxadzz-73d029b3\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"qdrant-azvxadzz-73d029b3","group":"usvc-dev.developer.microsoft.com","kind":"containers"},"code":409}
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.ICustomObjectsOperations_DeleteClusterCustomObjectWithHttpMessagesAsync[T](String group, String version, String plural, String name, V1DeleteOptions body, Nullable`1 gracePeriodSeconds, Nullable`1 orphanDependents, String propagationPolicy, String dryRun, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.DeleteClusterCustomObjectWithHttpMessagesAsync(String group, String version, String plural, String name, V1DeleteOptions body, Nullable`1 gracePeriodSeconds, Nullable`1 orphanDependents, String propagationPolicy, String dryRun, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at Aspire.Hosting.Dcp.KubernetesService.<>c__DisplayClass18_0`1.<<DeleteAsync>b__0>d.MoveNext() in /_/src/Aspire.Hosting/Dcp/KubernetesService.cs:line 165
--- End of stack trace from previous location ---
   at Aspire.Hosting.Dcp.KubernetesService.ExecuteWithRetry[TResult](DcpApiOperationType operationType, String resourceType, Func`2 operation, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/KubernetesService.cs:line 308
   at Aspire.Hosting.Dcp.ApplicationExecutor.DeleteResourcesAsync[RT](String resourceType, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/ApplicationExecutor.cs:line 1803

Re: container logs, we'll need to add something to fetch the logs explicitly, or pipe them to the logger.

Alirexaa commented 1 month ago

I use `ResourceLoggerForwarderService` to forward container logs to the logger :))

    private TestDistributedApplicationBuilder CreateDistributedApplicationBuilder()
    {
        var builder = TestDistributedApplicationBuilder.CreateWithTestContainerRegistry();
        // Route log output to the xunit test output.
        builder.Services.AddXunitLogging(testOutputHelper);
        // Forward resource (container) logs to the logger.
        builder.Services.AddHostedService<ResourceLoggerForwarderService>();

        return builder;
    }