Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

Read Data after enabling TTL on container level #457

Open kiranvsk1 opened 3 years ago

kiranvsk1 commented 3 years ago

Hi Team,

I am trying to read data from a Cosmos DB (SQL API) container using the custom query option, and the reads fail with errors.

Setup of the container - TTL enabled with a default value of 1 week (7 * 24 * 60 * 60 = 604800 seconds)
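
For reference, a minimal sketch of that container setup, assuming the azure-cosmos Python SDK (the endpoint, key, and resource names below are placeholders, not my actual setup):

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("<database>")

# Enable TTL at the container level with a default of one week (604800 s):
# items are deleted 604800 seconds after their last write unless an item
# overrides the default with its own "ttl" property.
container = database.create_container_if_not_exists(
    id="<container>",
    partition_key=PartitionKey(path="/<partitionKeyField>"),
    default_ttl=7 * 24 * 60 * 60,  # 604800 seconds
)
```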

What works -

  1. Can write data to the container using the Azure Cosmos DB Spark connector
  2. Able to read stats from the container in the portal (count(1), etc.)

What does not work-

  1. Reads from the container using the Spark connector fail with a 500 (InternalServerError).

But if I remove the TTL setting on the container, I am able to read data using the Azure Cosmos DB Spark connector.
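
The failing read is roughly the following (a sketch assuming the Spark 3 OLTP connector and its spark.cosmos.read.customQuery option; the endpoint, key, and names are placeholders):

```python
# Runs in a Databricks/Spark notebook where `spark` is already defined
# and the azure-cosmos-spark connector is attached to the cluster.
cfg = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<container>",
}

# Writes with the same config succeed; this read fails with the 500
# while TTL is enabled on the container and succeeds once TTL is removed.
df = (
    spark.read.format("cosmos.oltp")
    .options(**cfg)
    .option("spark.cosmos.read.customQuery", "SELECT * FROM c")
    .load()
)
df.show()
```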

Is this expected behavior with TTL turned on?

sajins2005 commented 3 years ago

I am facing the same issue. It looks like the Cosmos OLTP connector does not work with TTL. I hit it both with TTL set to "On (no default)" and with TTL on with a value in seconds set in the edit field.
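
To be precise about the two settings: in the portal, "On (no default)" corresponds to a default TTL of -1, while "On" takes a value in seconds. A sketch of both, assuming the azure-cosmos Python SDK (names are placeholders); the read fails for me with either one:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("<database>")

# "On (no default)": TTL enabled, but items never expire unless they
# carry their own "ttl" property.
database.replace_container(
    "<container>", partition_key=PartitionKey(path="/<pk>"), default_ttl=-1
)

# "On" with a value: items expire that many seconds after their last write.
database.replace_container(
    "<container>", partition_key=PartitionKey(path="/<pk>"), default_ttl=3600
)
```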

FabianMeiswinkel commented 3 years ago

Hi, can you tell us which version of the Spark connector you are using? It would also be great to see the error details (error message with call stack) of the failure.

Thanks, Fabian

sajins2005 commented 3 years ago

@FabianMeiswinkel, I am using com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.2.0 and faced the same issue with com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.1.0 as well. The Databricks runtime is 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12).

Stack trace org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.139.64.4, executor 0): {"ClassName":"InternalServerErrorException","userAgent":"azsdk-java-cosmos/4.17.0-beta.1 Linux/5.4.0-1051-azure JRE/1.8.0_282","statusCode":500,"resourceAddress":"rntbd://cdb-ms-prod-eastus1-fd32.documents.azure.com:14054/apps/274509a2-d536-4a09-b0a3-f4fd526feb25/services/57940846-7939-4603-bc4e-e0297e4bd3b6/partitions/6c48721d-54ff-4bcb-8827-26bfca38bfe5/replicas/132707874448886293s/","error":"{\"Errors\":[\"An unknown error occurred while processing this request. If the issue persists, please contact Azure Support: http://aka.ms/azure-support\"]}","innerErrorMessage":"[\"An unknown error occurred while processing this request. If the issue persists, please contact Azure Support: http://aka.ms/azure-support\"]","causeInfo":null,"responseHeaders":"{x-ms-last-state-change-utc=Thu, 15 Jul 2021 01:51:45.529 GMT, x-ms-request-duration-ms=1.523, x-ms-session-token=0:-1#2302125, lsn=2302125, x-ms-request-charge=1.00, x-ms-schemaversion=1.12, x-ms-transport-request-id=4, x-ms-number-of-read-regions=0, x-ms-activity-id=dc18ad1d-eaef-11eb-ae6b-a915eade79fd, x-ms-xp-role=1, x-ms-global-Committed-lsn=2302124, x-ms-cosmos-llsn=2302125, x-ms-serviceversion= version=2.14.0.0}","cosmosDiagnostics":{"userAgent":"azsdk-java-cosmos/4.17.0-beta.1 Linux/5.4.0-1051-azure JRE/1.8.0_282","requestLatencyInMs":7,"requestStartTimeUTC":"2021-07-22T13:22:27.562Z","requestEndTimeUTC":"2021-07-22T13:22:27.569Z","responseStatisticsList":[{"storeResult":{"storePhysicalAddress":"rntbd://cdb-ms-prod-eastus1-fd32.documents.azure.com:14054/apps/274509a2-d536-4a09-b0a3-f4fd526feb25/services/57940846-7939-4603-bc4e-e0297e4bd3b6/partitions/6c48721d-54ff-4bcb-8827-26bfca38bfe5/replicas/132707874448886293s/","lsn":2302125,"globalCommittedLsn":2302124,"partitionKeyRangeId":"0","isValid":true,"statusCode":500,"subStatusCode":0,"isGone":false,"isNotFound":false,"isInvalidPartition":false,"isThroughputControlRequestRateTooLarge":false,"requestCharge":1.0,"itemLSN":-1,"sessionToken":"-1#2302125","backendLatencyInMs":1.523,"exception":"[\"An unknown error occurred while processing this request. 
If the issue persists, please contact Azure Support: http://aka.ms/azure-support\"]","transportRequestTimeline":[{"eventName":"created","startTimeUTC":"2021-07-22T13:22:27.563Z","durationInMicroSec":0},{"eventName":"queued","startTimeUTC":"2021-07-22T13:22:27.563Z","durationInMicroSec":0},{"eventName":"channelAcquisitionStarted","startTimeUTC":"2021-07-22T13:22:27.563Z","durationInMicroSec":1000},{"eventName":"pipelined","startTimeUTC":"2021-07-22T13:22:27.564Z","durationInMicroSec":1000},{"eventName":"transitTime","startTimeUTC":"2021-07-22T13:22:27.565Z","durationInMicroSec":4000},{"eventName":"received","startTimeUTC":"2021-07-22T13:22:27.569Z","durationInMicroSec":0},{"eventName":"completed","startTimeUTC":"2021-07-22T13:22:27.569Z","durationInMicroSec":0}],"rntbdRequestLengthInBytes":498,"rntbdResponseLengthInBytes":326,"requestPayloadLengthInBytes":55,"responsePayloadLengthInBytes":null,"channelTaskQueueSize":1,"pendingRequestsCount":1,"serviceEndpointStatistics":{"availableChannels":1,"acquiredChannels":0,"executorTaskQueueSize":0,"inflightRequests":1,"lastSuccessfulRequestTime":"2021-07-22T13:22:26.781Z","lastRequestTime":"2021-07-22T13:22:27.411Z","createdTime":"2021-07-22T13:22:26.765Z","isClosed":false}},"requestResponseTimeUTC":"2021-07-22T13:22:27.569Z","requestResourceType":"Document","requestOperationType":"Query"}],"supplementalResponseStatisticsList":[],"addressResolutionStatistics":{},"regionsContacted":["[REDACTED]"],"retryContext":{"statusAndSubStatusCodes":null,"retryCount":0,"retryLatency":0},"metadataDiagnosticsContext":{"metadataDiagnosticList":null},"serializationDiagnosticsContext":{"serializationDiagnosticsList":null},"gatewayStatistics":null,"systemInformation":{"usedMemory":"202493 KB","availableMemory":"2670339 KB","systemCpuLoad":"empty","availableProcessors":4},"clientCfgs":{"id":0,"connectionMode":"DIRECT","numberOfClients":1,"connCfg":{"rntbd":"(cto:PT5S, rto:PT5S, icto:PT0S, ieto:PT1H, mcpe:130, mrpc:30, cer:false)","gw":"(cps:1000, rto:PT5S, icto:null, p:false)","other":"(ed: true, cs: false)"},"consistencyCfg":"(consistency: Eventual, mm: true, prgns: [])"}}} at azure_cosmos_spark.com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.messageReceived(RntbdRequestManager.java:807) at azure_cosmos_spark.com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.channelRead(RntbdRequestManager.java:181) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at azure_cosmos_spark.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) 
at azure_cosmos_spark.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at azure_cosmos_spark.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1368) at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1234) at azure_cosmos_spark.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1280) at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) at azure_cosmos_spark.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at azure_cosmos_spark.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at azure_cosmos_spark.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at azure_cosmos_spark.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at azure_cosmos_spark.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) at azure_cosmos_spark.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at azure_cosmos_spark.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at azure_cosmos_spark.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at azure_cosmos_spark.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748)

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2339) at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:298) at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:308) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:82) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:88) at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508) at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:58) at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2994) at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:2985) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3709) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:845) at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3707) at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2984) at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:194) at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:57) at com.databricks.backend.daemon.driver.PythonDriverLocal.generateTableResult(PythonDriverLocal.scala:1157) at com.databricks.backend.daemon.driver.PythonDriverLocal.$anonfun$getResultBufferInternal$1(PythonDriverLocal.scala:1069) at com.databricks.backend.daemon.driver.PythonDriverLocal.withInterpLock(PythonDriverLocal.scala:856) at com.databricks.backend.daemon.driver.PythonDriverLocal.getResultBufferInternal(PythonDriverLocal.scala:938) at 
com.databricks.backend.daemon.driver.DriverLocal.getResultBuffer(DriverLocal.scala:538) at com.databricks.backend.daemon.driver.PythonDriverLocal.outputSuccess(PythonDriverLocal.scala:898) at com.databricks.backend.daemon.driver.PythonDriverLocal.$anonfun$repl$8(PythonDriverLocal.scala:383) at com.databricks.backend.daemon.driver.PythonDriverLocal.withInterpLock(PythonDriverLocal.scala:856) at com.databricks.backend.daemon.driver.PythonDriverLocal.repl(PythonDriverLocal.scala:370) at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:431) at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234) at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:48) at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276) at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:269) at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:48) at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:408) at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653) at scala.util.Try$.apply(Try.scala:213) at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645) at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486) at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598) at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391) at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337) at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219) at java.lang.Thread.run(Thread.java:748)

samuelramos commented 3 years ago

Hi, I'm facing exactly the same error here.

I am using:

Databricks Runtime Version: 8.3 (Apache Spark 3.1.1, Scala 2.12) and/or 8.4 (Apache Spark 3.1.2, Scala 2.12)
Cosmos DB Spark Connector: com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.2.0

Stacktrace attached. stacktrace.txt

Thanks, SR

sajins2005 commented 3 years ago

@FabianMeiswinkel
Is there any update on this issue?

Regards, Sajin