Closed: dkroehan closed this issue 4 years ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagarathapar @srinathnarayanan @southpolesteve @shurd
@dkroehan is the above error with the emulator or a production Cosmos account?
@kirankumarkolli The error is with a production Cosmos account.
12:16:52.459 [reactor-http-epoll-8] ERROR .....DatabaseInitializer - Error when creating stored procedures
com.azure.data.cosmos.CosmosClientException: Message: {"Errors":["Resource Not Found"]}
ActivityId: 4a445bec-9e42-493c-8ebe-26b6cc646c62, Request URI: /apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132174287216593763s, RequestStats:
RequestStartTime: 2019-11-05T12:16:52.4270506Z, RequestEndTime: 2019-11-05T12:16:52.4270506Z, Number of regions attempted:1
ResponseTime: 2019-11-05T12:16:52.4270506Z, StoreResult: StorePhysicalAddress: rntbd://10.0.0.21:11000/apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132174287216593763s, LSN: 16068, GlobalCommittedLsn: 16068, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#16068, UsingLocalLSN: False, TransportException: null, ResourceType: Collection, OperationType: Read
ResponseTime: 2019-11-05T12:16:52.4270506Z, StoreResult: StorePhysicalAddress: rntbd://10.0.0.32:11000/apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132173994761905544s, LSN: 16068, GlobalCommittedLsn: 16068, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#16068, UsingLocalLSN: False, TransportException: null, ResourceType: Collection, OperationType: Read
, SDK: Microsoft.Azure.Documents.Common/2.7.0, StatusCode: NotFound
at com.azure.data.cosmos.BridgeInternal.createCosmosClientException(BridgeInternal.java:350)
at com.azure.data.cosmos.internal.RxGatewayStoreModel.validateOrThrow(RxGatewayStoreModel.java:388)
at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$null$5(RxGatewayStoreModel.java:326)
This usually means that the resource hierarchy is not present, i.e. either the database or the container does not exist. One possibility is concurrent CRUD on the database/container. Is that a possibility in your workload?
The second failure also looks similar, but it also exposed an assertion. @kushagraThapar can you please create a follow-up work item to track the missing OnBefore call?
Sure, will do that.
This usually means that the resource hierarchy is not present, i.e. either the database or the container does not exist. One possibility is concurrent CRUD on the database/container. Is that a possibility in your workload?
@dkroehan can you please validate above?
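(For illustration only: a minimal sketch of what "concurrent CRUD on database/container" can look like in a reactive setup. The two-caller scenario and the containerProperties placeholder are assumptions for the example, not taken from the reporter's code below.)

// Sketch only: two callers racing to create/read the same database and container,
// e.g. two parallel tests or request paths that each run their own setup.
// "containerProperties" is a placeholder for a CosmosContainerProperties instance.
Mono<CosmosContainerResponse> caller1 = client.createDatabaseIfNotExists("my-db")
        .map(CosmosDatabaseResponse::database)
        .flatMap(db -> db.createContainerIfNotExists(containerProperties));

Mono<CosmosContainerResponse> caller2 = client.createDatabaseIfNotExists("my-db")
        .map(CosmosDatabaseResponse::database)
        .flatMap(db -> db.createContainerIfNotExists(containerProperties));

// Subscribing to both at the same time is the "concurrent CRUD" being asked about:
// a read issued by one chain can race the create issued by the other and surface a 404.
Mono.zip(caller1, caller2).block();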
We checked our DatabaseInitializer class, and yes, there is some concurrent CRUD when creating containers after the database has been set up successfully. Here is the method we use (I just renamed some variables):
private void ensureDatabase() {
    String databaseId = getDatabaseId();
    String container1Id = getContainer1Id();
    String container2Id = getContainer2Id();

    CosmosDatabaseResponse cosmosDatabaseResponse = client.createDatabaseIfNotExists(databaseId)
            .doOnSuccess(ignored -> LOGGER.debug("Database: '{}' available", databaseId))
            .doOnError(throwable -> LOGGER.error("Error when creating/using existing database", throwable))
            .block();
    CosmosDatabase database = cosmosDatabaseResponse.database();

    database.createContainerIfNotExists(createContainer1Properties(container1Id))
            .doOnSuccess(cosmosContainerResponse -> LOGGER.debug("Container: '{}' available", container1Id))
            .doOnError(throwable -> LOGGER.error("Error when creating/using existing container", throwable))
            .flatMapMany(response -> createStoredProcedures(response, storedProcedures))
            .doOnError(throwable -> LOGGER.error("Error when creating/using existing stored procedures", throwable))
            .publishOn(Schedulers.elastic())
            .blockLast();

    database.createContainerIfNotExists(createContainer2Properties(container2Id))
            .doOnSuccess(cosmosContainerResponse -> LOGGER.debug("Container: '{}' available", container2Id))
            .doOnError(throwable -> LOGGER.error("Error when creating/using existing container", throwable))
            .publishOn(Schedulers.elastic())
            .block();
}
We removed the publishOn(Schedulers.elastic()) call, and for now that seems to fix that one. Good to know that one should not create containers in parallel.
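(For reference, a minimal sketch of an alternative that makes the ordering explicit: one chain per dependency and a single block at the end. It reuses helper names from the snippet above, such as createContainer1Properties and createStoredProcedures, which are the reporter's own methods, not SDK APIs.)

// Sketch, not the reporter's code: each step only starts after its parent resource
// exists, and the whole setup is blocked on once at the end.
private void ensureDatabaseSequential() {
    client.createDatabaseIfNotExists(getDatabaseId())
            .map(CosmosDatabaseResponse::database)
            .flatMapMany(database ->
                    // container 1, then its stored procedures, then container 2,
                    // each starting only after the previous step has completed
                    database.createContainerIfNotExists(createContainer1Properties(getContainer1Id()))
                            .flatMapMany(response -> createStoredProcedures(response, storedProcedures))
                            .thenMany(database.createContainerIfNotExists(createContainer2Properties(getContainer2Id()))))
            .blockLast();
}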
So there is still the "Resource Not Found" error regarding the ClientRetryPolicy, which I posted in the first post.
And there is one more I got from our logs; it comes from an integration test that exercises a bulk-delete stored procedure.
reactor.core.Exceptions$ReactiveException:
CosmosClientException{error={"code":"NotFound","message":"Message: {\"Errors\":[\"Resource Not Found\"]}\r\nActivityId: ba81b6df-e52c-4360-9db5-016b8bd4bb35, Request URI: /apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, RequestStats: \r\nRequestStartTime: 2019-11-14T11:34:06.1578194Z, RequestEndTime: 2019-11-14T11:34:06.1578194Z, Number of regions attempted:1\r\nResponseTime: 2019-11-14T11:34:06.1578194Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-westeurope1-fd18.documents.azure.com:14025/apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, LSN: 64, GlobalCommittedLsn: 64, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 0.62, ItemLSN: -1, SessionToken: -1#64, UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript\r\n, SDK: Microsoft.Azure.Documents.Common/2.7.0, StatusCode: NotFound","additionalErrorInfo":null}, resourceAddress='null', statusCode=404, message=Message: {"Errors":["Resource Not Found"]}
ActivityId: ba81b6df-e52c-4360-9db5-016b8bd4bb35, Request URI: /apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, RequestStats:
RequestStartTime: 2019-11-14T11:34:06.1578194Z, RequestEndTime: 2019-11-14T11:34:06.1578194Z, Number of regions attempted:1
ResponseTime: 2019-11-14T11:34:06.1578194Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-westeurope1-fd18.documents.azure.com:14025/apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, LSN: 64, GlobalCommittedLsn: 64, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 0.62, ItemLSN: -1, SessionToken: -1#64, UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript
, SDK: Microsoft.Azure.Documents.Common/2.7.0, StatusCode: NotFound, causeInfo=null, responseHeaders={Transfer-Encoding=chunked, x-ms-last-state-change-utc=Thu, 14 Nov 2019 10:15:05.801 GMT, x-ms-current-replica-set-size=4, Server=Microsoft-HTTPAPI/2.0, x-ms-session-token=0:-1#64, lsn=64, x-ms-request-charge=0.62, x-ms-schemaversion=1.8, x-ms-transport-request-id=5, x-ms-number-of-read-regions=0, x-ms-current-write-quorum=3, x-ms-cosmos-quorum-acked-llsn=64, x-ms-quorum-acked-lsn=64, Date=Thu, 14 Nov 2019 11:34:05 GMT, x-ms-activity-id=ba81b6df-e52c-4360-9db5-016b8bd4bb35, Strict-Transport-Security=max-age=31536000, x-ms-xp-role=1, x-ms-global-Committed-lsn=64, x-ms-cosmos-llsn=64, x-ms-gatewayversion=version=2.7.0, x-ms-serviceversion=version=2.7.0.0, Content-Type=application/json}, requestHeaders={authorization=type%3Dmaster%26ver%3D1.0%26sig%3DvTBHvnnHwNHN6QSQ1%2BSocdOcg4A2fxnm%2BMRGPaV90Qw%3D, Accept=application/json, x-ms-date=Thu, 14 Nov 2019 11:34:06 GMT, x-ms-documentdb-partitionkey=["286a1129__286a1129"], x-ms-consistency-level=Session, Content-Type=application/json}}
at ------ removed --------
Caused by: com.azure.data.cosmos.CosmosClientException:
Message: {"Errors":["Resource Not Found"]}
ActivityId: ba81b6df-e52c-4360-9db5-016b8bd4bb35, Request URI: /apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, RequestStats:
RequestStartTime: 2019-11-14T11:34:06.1578194Z, RequestEndTime: 2019-11-14T11:34:06.1578194Z, Number of regions attempted:1
ResponseTime: 2019-11-14T11:34:06.1578194Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-westeurope1-fd18.documents.azure.com:14025/apps/1f769f8c-2da8-4eb4-987b-3bbdbd761a1f/services/149cb50c-514c-4c09-b82f-e8ed63c3dfbd/partitions/0931392a-b005-4c66-b75d-2c1f8987ef65/replicas/132182001009793585p/, LSN: 64, GlobalCommittedLsn: 64, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 0.62, ItemLSN: -1, SessionToken: -1#64, UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript
, SDK: Microsoft.Azure.Documents.Common/2.7.0, StatusCode: NotFound
Unfortunately we still see "Resource Not Found" errors when setting up Cosmos DB. The errors are less frequent, but they still occur.
Here is the stack trace:
20:58:22.873 [reactor-http-epoll-1] ERROR DatabaseInitializer - Error when creating/using existing stored procedures
com.azure.data.cosmos.CosmosClientException: Message: {"Errors":["Resource Not Found"]}
ActivityId: 44fd67d3-0f81-4bfa-8810-9ca22f494175, Request URI: /apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132187246529389581s, RequestStats:
RequestStartTime: 2019-11-20T20:58:22.8454700Z, RequestEndTime: 2019-11-20T20:58:22.8555060Z, Number of regions attempted:1
ResponseTime: 2019-11-20T20:58:22.8555060Z, StoreResult: StorePhysicalAddress: rntbd://10.0.0.36:11000/apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132187246529389581s, LSN: 19421, GlobalCommittedLsn: 19421, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#19421, UsingLocalLSN: False, TransportException: null, ResourceType: Collection, OperationType: Read
ResponseTime: 2019-11-20T20:58:22.8555060Z, StoreResult: StorePhysicalAddress: rntbd://10.0.0.30:11000/apps/3d31f75e-20f5-4083-946c-a7410e147abb/services/edff4562-9097-4a75-8a25-5c77b6e88a57/partitions/42e35da6-ebce-41b0-8e41-03dfbb3ad2da/replicas/132187489217457376s, LSN: 19421, GlobalCommittedLsn: 19421, PartitionKeyRangeId: , IsValid: True, StatusCode: 404, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#19421, UsingLocalLSN: False, TransportException: null, ResourceType: Collection, OperationType: Read
, SDK: Microsoft.Azure.Documents.Common/2.7.0, StatusCode: NotFound
at com.azure.data.cosmos.BridgeInternal.createCosmosClientException(BridgeInternal.java:350)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.FluxFlatMap] :
reactor.core.publisher.Flux.flatMap
com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$1(RxGatewayStoreModel.java:266)
Error has been observed at the following site(s):
|_ Flux.flatMap ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$1(RxGatewayStoreModel.java:266)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$1(RxGatewayStoreModel.java:281)
|_ Mono.flatMap ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:246)
|_ Mono.map ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:283)
|_ Mono.error ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$2(RxGatewayStoreModel.java:301)
|_ Mono.onErrorResume ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:284)
|_ Mono.flux ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.toDocumentServiceResponse(RxGatewayStoreModel.java:302)
|_ ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.doCreate(RxGatewayStoreModel.java:86)
|_ ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.invokeAsyncInternal(RxGatewayStoreModel.java:342)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$invokeAsync$3(RxGatewayStoreModel.java:365)
|_ ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.lambda$executeRetry$0(BackoffRetryUtility.java:34)
|_ Mono.defer ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.executeRetry(BackoffRetryUtility.java:31)
|_ Flux.flatMap ⇢ at com.azure.data.cosmos.internal.RetryUtils.lambda$toRetryWhenFunc$2(RetryUtils.java:22)
|_ Mono.retryWhen ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.executeRetry(BackoffRetryUtility.java:38)
|_ Mono.flux ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.invokeAsync(RxGatewayStoreModel.java:366)
|_ Flux.error ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$processMessage$4(RxGatewayStoreModel.java:394)
|_ Flux.onErrorResume ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.processMessage(RxGatewayStoreModel.java:375)
|_ Flux.map ⇢ at com.azure.data.cosmos.internal.RxGatewayStoreModel.processMessage(RxGatewayStoreModel.java:396)
|_ ⇢ at com.azure.data.cosmos.internal.RxDocumentClientImpl.create(RxDocumentClientImpl.java:1043)
|_ Flux.map ⇢ at com.azure.data.cosmos.internal.RxDocumentClientImpl.createStoredProcedureInternal(RxDocumentClientImpl.java:1461)
|_ ⇢ at com.azure.data.cosmos.internal.RxDocumentClientImpl.lambda$createStoredProcedure$38(RxDocumentClientImpl.java:1442)
|_ Flux.single ⇢ at com.azure.data.cosmos.internal.ObservableHelper.lambda$inlineIfPossibleAsObs$1(ObservableHelper.java:44)
|_ ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.lambda$executeRetry$0(BackoffRetryUtility.java:34)
|_ Mono.defer ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.executeRetry(BackoffRetryUtility.java:31)
|_ Flux.flatMap ⇢ at com.azure.data.cosmos.internal.RetryUtils.lambda$toRetryWhenFunc$1(RetryUtils.java:28)
|_ Flux.flatMap ⇢ at com.azure.data.cosmos.internal.RetryUtils.lambda$toRetryWhenFunc$2(RetryUtils.java:22)
|_ Mono.retryWhen ⇢ at com.azure.data.cosmos.internal.BackoffRetryUtility.executeRetry(BackoffRetryUtility.java:38)
|_ Mono.flux ⇢ at com.azure.data.cosmos.internal.ObservableHelper.inlineIfPossibleAsObs(ObservableHelper.java:44)
|_ ⇢ at com.azure.data.cosmos.internal.RxDocumentClientImpl.createStoredProcedure(RxDocumentClientImpl.java:1442)
|_ Flux.map ⇢ at com.azure.data.cosmos.CosmosScripts.createStoredProcedure(CosmosScripts.java:58)
|_ Flux.single ⇢ at com.azure.data.cosmos.CosmosScripts.createStoredProcedure(CosmosScripts.java:59)
Stack trace:
at com.azure.data.cosmos.BridgeInternal.createCosmosClientException(BridgeInternal.java:350)
at com.azure.data.cosmos.internal.RxGatewayStoreModel.validateOrThrow(RxGatewayStoreModel.java:333)
at com.azure.data.cosmos.internal.RxGatewayStoreModel.lambda$toDocumentServiceResponse$0(RxGatewayStoreModel.java:270)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:378)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:123)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:178)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:287)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:123)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onNext(FluxFilterFuseable.java:330)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1592)
at reactor.core.publisher.MonoReduceSeed$ReduceSeedSubscriber.onComplete(MonoReduceSeed.java:156)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:144)
at reactor.netty.channel.FluxReceive.terminateReceiver(FluxReceive.java:397)
at reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:197)
at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:345)
at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:363)
at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:412)
at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:556)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:91)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1224)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1271)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:505)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:483)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
@dkroehan just to be on the same page: is ensureDatabase() called before each API call, and is that the source of the concurrent CRUD?
Collection/database operations are expensive; is it possible to limit them to the initialization path only?
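(One possible way to keep database/collection operations on the initialization path only, as suggested above: a sketch assuming the setup result can be cached in a Mono and shared. Field and method names here are illustrative, not from the reporter's code.)

// Sketch: run the expensive setup once and let every caller reuse the cached result
// instead of re-issuing createDatabaseIfNotExists/createContainerIfNotExists per request.
// CosmosContainerResponse::container is assumed to be the accessor for the created container.
private final Mono<CosmosContainer> initializedContainer =
        client.createDatabaseIfNotExists(getDatabaseId())
                .map(CosmosDatabaseResponse::database)
                .flatMap(database -> database
                        .createContainerIfNotExists(createContainer1Properties(getContainer1Id()))
                        .map(CosmosContainerResponse::container))
                .cache(); // first subscriber triggers the creation, later subscribers reuse it

private Mono<CosmosContainer> container() {
    return initializedContainer;
}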
@kirankumarkolli We continued investigating on our side, and it seems we found the cause.
It was a combination of multiple issues in the way the integration tests were executed. I also found out that it is not a good idea to drop a collection and immediately create a new collection with the same name afterwards.
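(A sketch of one way to avoid the drop-and-recreate-same-name pattern in integration tests: give each run its own container id and delete it afterwards. The helper method and the delete() call are assumptions about the reporter's setup and the v3 SDK surface, not confirmed from the thread.)

// Sketch only: per-run container name, so a freshly created container never shares
// a name with one that was just dropped.
String testContainerId = "it-" + java.util.UUID.randomUUID();

CosmosContainer testContainer = database
        .createContainerIfNotExists(createContainer1Properties(testContainerId))
        .map(CosmosContainerResponse::container)
        .block();
try {
    // ... exercise the bulk-delete stored procedure against testContainer ...
} finally {
    testContainer.delete().block(); // assumption: CosmosContainer.delete() returns a Mono in the v3 SDK
}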
I will close this issue. Thanks for your helpful input!
Describe the bug
We have integration tests that run against a real Cosmos DB account; they basically verify that our queries work as expected. In some runs we get "Resource Not Found" errors. This occurs both in our CI/CD pipeline and locally.
We have a DatabaseInitializer class, which sets up the Cosmos database, creates the containers, and creates some stored procedures during application startup.
In some cases the "Resource Not Found" error already occurs during the setup, but most of the time it occurs when running the queries or when adding/cleaning up test data.
Exception or Stack Trace
This is the stack trace from the database setup:
This is a stack trace from a run where the database was set up successfully and the test was adding some test data before executing a query:
To Reproduce
This is tricky, since it occurs somewhat randomly.
Setup (please complete the following information):
IDE: IntelliJ
Java SDK: v2.4.4 and v3.2.2
Additional context
If this is not a bug, it would be really great to understand what is wrong with our setup.