Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[QUERY] Reason For Error Message In Cosmos DB Spark Connector #37516

Open 544cl opened 11 months ago

544cl commented 11 months ago

Query/Question: Hello, I am getting some exceptions while patching Cosmos DB documents using the Spark Cosmos DB connector. The patch operation completed successfully, but it took a long time because many tasks failed. Can you tell me what this is related to? Thanks.

Getting error on this line

The main exception is:

java.lang.AssertionError: assumption failed
    at scala.Predef$.assume(Predef.scala:239)
    at com.azure.cosmos.spark.BulkWriter.flushAndClose(BulkWriter.scala:538)
    at com.azure.cosmos.spark.ItemsDataWriteFactory$CosmosWriter.commit(ItemsDataWriteFactory.scala:130)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:439)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:466)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:367)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)


Here are some other errors I see in the logs; they may be related.

azure_cosmos_spark.reactor.core.Exceptions$ErrorCallbackNotImplemented: java.lang.ClassCastException: azure_cosmos_spark.com.azure.cosmos.models.CosmosPatchOperations cannot be cast to azure_cosmos_spark.com.fasterxml.jackson.databind.node.ObjectNode
Caused by: java.lang.ClassCastException: azure_cosmos_spark.com.azure.cosmos.models.CosmosPatchOperations cannot be cast to azure_cosmos_spark.com.fasterxml.jackson.databind.node.ObjectNode
    at com.azure.cosmos.spark.BulkWriter.$anonfun$handleNonSuccessfulStatusCode$4(BulkWriter.scala:358)
    at azure_cosmos_spark.reactor.core.scala.publisher.SMono$.$anonfun$defer$1(SMono.scala:1491)
    at azure_cosmos_spark.reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
    at azure_cosmos_spark.reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:53)
    at azure_cosmos_spark.reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:34)
    at azure_cosmos_spark.reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionOtherSubscriber.onNext(FluxDelaySubscription.java:131)
    at azure_cosmos_spark.reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
    at azure_cosmos_spark.reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
    at azure_cosmos_spark.reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
    at azure_cosmos_spark.reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
2023-11-02 23:28:17,221 WARN GoneAndRetryWithRetryPolicy [transport-response-bounded-elastic-272]: Operation will NOT be retried. Write operations which failed due to transient transport errors can not be retried safely when sending the request to the service because they aren't idempotent. Current attempt 1, Exception: 
{"ClassName":"GoneException","userAgent":"azsdk-java-cosmos/4.41.0 Linux/4.15.0-1170-azure JRE/1.8.0_382","statusCode":410,"resourceAddress":"rntbd://*********","error":"{\"Errors\":[\"The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout\"]}","innerErrorMessage":"[\"The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout\"]","causeInfo":"[class: class azure_cosmos_spark.com.azure.cosmos.implementation.RequestTimeoutException, message: {\"innerErrorMessage\":\"[\\\"The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout\\\"]\"}]","responseHeaders":"{x-ms-request-duration-ms=0.0, x-ms-schemaversion=1.16, x-ms-transport-request-id=34761, x-ms-serviceversion= version=2.14.0.0, x-ms-activity-id=7b95d330-79d7-11ee-8b25-e110b72bade5}"}
    at azure_cosmos_spark.com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.messageReceived(RntbdRequestManager.java:942)
    at azure_cosmos_spark.com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.channelRead(RntbdRequestManager.java:195)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
    at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at azure_cosmos_spark.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
    at azure_cosmos_spark.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at azure_cosmos_spark.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
    at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
    at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
    at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
    at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
    at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at azure_cosmos_spark.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at azure_cosmos_spark.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at azure_cosmos_spark.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at azure_cosmos_spark.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at azure_cosmos_spark.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at azure_cosmos_spark.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
Caused by: {"ClassName":"RequestTimeoutException","userAgent":"azsdk-java-cosmos/4.41.0 Linux/4.15.0-1170-azure JRE/1.8.0_382","statusCode":408,"resourceAddress":null,"error":"{\"Errors\":[\"The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout\"]}","innerErrorMessage":"[\"The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout\"]","causeInfo":null,"responseHeaders":"{x-ms-request-duration-ms=0.0, x-ms-schemaversion=1.16, x-ms-transport-request-id=34761, x-ms-serviceversion= version=2.14.0.0, x-ms-activity-id=7b95d330-79d7-11ee-8b25-e110b72bade5}"}
    at azure_cosmos_spark.com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.messageReceived(RntbdRequestManager.java:941)
    ... 40 more


joshfree commented 10 months ago

@kushagraThapar could you please help route this?

kushagraThapar commented 10 months ago

@xinlian12 please take a look at this, thanks!