Limess opened this issue 1 year ago
After upgrading to 0.14.1 (EMR 7.1.0) this is still occurring. It didn't happen until we enabled the metadata table, at which point the timeline server becomes a bottleneck and a source of failures.
This only occurs during our "backfill" job, where we rewrite most of the table; our incremental loads on small clusters don't exhibit this issue.
Any recommendations to mitigate this?

- Would increasing `hoodie.embed.timeline.server.threads` help? It seems this is fixed at 200 threads regardless of anything else, despite the documentation suggesting it's configurable.
- Would `hoodie.embed.timeline.server.async` help?

From what I can tell based on OS-level metrics, the driver node (r6g.4xlarge, driver + 3 executors) is barely doing anything CPU-wise; it seems to be using < 10% CPU.

The cluster is pretty large, using this config:
```json
[
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "false"
}
},
{
"Classification": "spark-defaults",
"Properties": {
"spark.default.parallelism": "3352",
"spark.driver.cores": "4",
"spark.driver.extraJavaOptions": "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35",
"spark.driver.memory": "25g",
"spark.driver.memoryOverhead": "3g",
"spark.dynamicAllocation.enabled": "false",
"spark.executor.cores": "4",
"spark.executor.extraJavaOptions": "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35",
"spark.executor.instances": "419",
"spark.executor.maxNumFailures": "100",
"spark.executor.memory": "25g",
"spark.executor.memoryOverhead": "3g",
"spark.executor.processTreeMetrics.enabled": "true",
"spark.executorEnv.PEX_INHERIT_PATH": "fallback",
"spark.hadoop.fs.s3.connection.maximum": "1000",
"spark.hadoop.fs.s3a.connection.maximum": "1000",
"spark.kryoserializer.buffer.max": "256m",
"spark.metrics.namespace": "spark",
"spark.rdd.compress": "true",
"spark.scheduler.mode": "FAIR",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.shuffle.service.enabled": "true",
"spark.sql.adaptive.coalescePartitions.enabled": "true",
"spark.sql.shuffle.partitions": "3352",
"spark.task.maxFailures": "10",
"spark.ui.prometheus.enabled": "true",
"spark.yarn.appMasterEnv.PEX_INHERIT_PATH": "fallback",
"spark.yarn.maxAppAttempts": "1"
}
},
{
"Classification": "spark-log4j2",
"Properties": {
"logger.hudi.level": "INFO",
"logger.hudi.name": "org.apache.hudi"
}
},
{
"Classification": "spark-metrics",
"Properties": {
"*.sink.prometheusServlet.class": "org.apache.spark.metrics.sink.PrometheusServlet",
"*.sink.prometheusServlet.path": "/metrics/prometheus",
"applications.sink.prometheusServlet.path": "/metrics/applications/prometheus",
"driver.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
"executor.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
"master.sink.prometheusServlet.path": "/metrics/master/prometheus",
"master.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
"worker.source.jvm.class": "org.apache.spark.metrics.source.JvmSource"
}
},
{
"Classification": "capacity-scheduler",
"Properties": {
"yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator "
}
},
{
"Classification": "yarn-site",
"Properties": {
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage": "99.0",
"yarn.nodemanager.pmem-check-enabled": "false",
"yarn.nodemanager.vmem-check-enabled": "false"
}
},
{
"Classification": "emrfs-site",
"Properties": {
"fs.s3.maxConnections": "1000"
}
},
{
"Classification": "hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
},
{
"Classification": "hdfs-site",
"Properties": {
"dfs.replication": "2"
}
},
{
"Classification": "presto-connector-hive",
"Properties": {
"hive.metastore.glue.datacatalog.enabled": "true",
"hive.parquet.use-column-names": "true"
}
},
{
"Classification": "spark-hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
},
{
"Classification": "spark-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"PYSPARK_PYTHON": "./data_platform_spark_jobs.pex"
}
}
],
"Properties": {}
},
{
"Classification": "hadoop-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"HADOOP_DATANODE_OPTS": "-javaagent:/etc/prometheus/jmx_prometheus_javaagent.jar=7001:/etc/hadoop/conf/hdfs_jmx_config_datanode.yaml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=50103",
"HADOOP_NAMENODE_OPTS": "-javaagent:/etc/prometheus/jmx_prometheus_javaagent.jar=7001:/etc/hadoop/conf/hdfs_jmx_config_namenode.yaml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=50103"
}
}
],
"Properties": {}
},
{
"Classification": "yarn-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"YARN_NODEMANAGER_OPTS": "-javaagent:/etc/prometheus/jmx_prometheus_javaagent.jar=7005:/etc/hadoop/conf/yarn_jmx_config_node_manager.yaml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=50111",
"YARN_RESOURCEMANAGER_OPTS": "-javaagent:/etc/prometheus/jmx_prometheus_javaagent.jar=7005:/etc/hadoop/conf/yarn_jmx_config_resource_manager.yaml -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=50111"
}
}
],
"Properties": {}
},
{
"Classification": "hudi-defaults",
"Properties": {
"hoodie.archive.async": "true",
"hoodie.bulkinsert.sort.mode": "GLOBAL_SORT",
"hoodie.clean.async": "true",
"hoodie.cleaner.commits.retained": "1",
"hoodie.cleaner.policy.failed.writes": "LAZY",
"hoodie.datasource.hive_sync.support_timestamp": "true",
"hoodie.datasource.meta.sync.glue.metadata_file_listing": "true",
"hoodie.enable.data.skipping": "true",
"hoodie.filesystem.view.remote.retry.enable": "true",
"hoodie.keep.max.commits": "15",
"hoodie.keep.min.commits": "10",
"hoodie.metadata.index.bloom.filter.enable": "true",
"hoodie.metadata.index.column.stats.enable": "true",
"hoodie.metrics.on": "true",
"hoodie.metrics.reporter.type": "PROMETHEUS",
"hoodie.parquet.compression.codec": "snappy",
"hoodie.parquet.max.file.size": "536870912",
"hoodie.parquet.small.file.limit": "429496729",
"hoodie.write.concurrency.early.conflict.detection.enable": "true",
"hoodie.write.concurrency.mode": "optimistic_concurrency_control",
"hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
"hoodie.write.lock.dynamodb.endpoint_url": "dynamodb.eu-west-1.amazonaws.com",
"hoodie.write.lock.dynamodb.region": "eu-west-1",
"hoodie.write.lock.dynamodb.table": "data-platform-hudi-locks",
"hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider"
}
}
]
```
with some additional dataset-specific config:

```properties
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.precombine.field=version
hoodie.datasource.write.partitionpath.field=story_published_partition_date
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.write.hive_style_partitioning=true
hoodie.avro.schema.validate=true
hoodie.datasource.write.reconcile.schema=false
hoodie.table.name=${TABLE_NAME}
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.database=articles
hoodie.datasource.hive_sync.table=${TABLE_NAME}
hoodie.datasource.hive_sync.partition_fields=story_published_partition_date
hoodie.write.lock.dynamodb.partition_key=${TABLE_NAME}
# as the record key is random, don't try to prune by ranges
hoodie.bloom.index.prune.by.ranges=false
hoodie.index.type=RECORD_INDEX
hoodie.metadata.enable=true
hoodie.metadata.record.index.enable=true
```
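For context, here's a minimal sketch of how these options reach the writer. This is illustrative only: the app name, bucket/paths, and `table_name` value are hypothetical placeholders, and only the Hudi keys come from the config above.

```python
# Minimal sketch of the write path (assumption: a PySpark upsert job; the
# input path, output path, and table name below are placeholders, not our
# real job code).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-backfill").getOrCreate()

table_name = "articles_table"  # stands in for ${TABLE_NAME}
hudi_options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "version",
    "hoodie.datasource.write.partitionpath.field": "story_published_partition_date",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.index.type": "RECORD_INDEX",
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.record.index.enable": "true",
}

# Placeholder input; the real job reads staged data from S3.
df = spark.read.parquet("s3://example-bucket/staging/articles/")

(
    df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save(f"s3://example-bucket/lake/{table_name}/")
)
```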
Example logs on 0.14.1:

```
24/07/08 03:56:35 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:56:35 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:56:35 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:56:35 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:56:35 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:56:35 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:56:55 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:56:55 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:57:34 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:57:34 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:57:34 INFO RetryExec: I/O exception (org.apache.hudi.org.apache.http.NoHttpResponseException) caught when processing request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431: The target server failed to respond
24/07/08 03:57:34 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:57:34 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 03:57:34 INFO RetryExec: Retrying request to {}->http://ip-10-0-100-87.eu-west-1.compute.internal:39431
24/07/08 04:01:55 WARN RetryHelper: Catch Exception for Sending request, will retry after 219 ms.
java.net.SocketTimeoutException: Read timed out
at sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288) ~[?:?]
at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314) ~[?:?]
at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355) ~[?:?]
at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808) ~[?:?]
at java.net.Socket$SocketInputStream.read(Socket.java:966) ~[?:?]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.internalExecute(Request.java:173) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:177) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.lambda$executeRequest$a89da1c0$1(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:84) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:618) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:100) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFile(PriorityBasedFileSystemView.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:362) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1618) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1528) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1592) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:143) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
24/07/08 04:02:34 WARN RetryHelper: Catch Exception for Sending request, will retry after 254 ms.
java.net.SocketTimeoutException: Read timed out
at sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288) ~[?:?]
at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314) ~[?:?]
at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355) ~[?:?]
at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808) ~[?:?]
at java.net.Socket$SocketInputStream.read(Socket.java:966) ~[?:?]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.internalExecute(Request.java:173) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:177) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.lambda$executeRequest$a89da1c0$1(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:84) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:618) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:100) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFile(PriorityBasedFileSystemView.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:362) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1618) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1528) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1592) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:143) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
24/07/08 04:02:34 ERROR RetryHelper: Still failed to Sending request after retried 3 times.
java.net.SocketTimeoutException: Read timed out
at sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288) ~[?:?]
at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314) ~[?:?]
at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355) ~[?:?]
at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808) ~[?:?]
at java.net.Socket$SocketInputStream.read(Socket.java:966) ~[?:?]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.internalExecute(Request.java:173) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:177) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.lambda$executeRequest$a89da1c0$1(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:84) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:618) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:100) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFile(PriorityBasedFileSystemView.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:362) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1618) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1528) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1592) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:143) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
24/07/08 04:02:34 ERROR PriorityBasedFileSystemView: Got error running preferred function. Trying secondary
org.apache.hudi.exception.HoodieRemoteException: Read timed out
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:622) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:100) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFile(PriorityBasedFileSystemView.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:362) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1618) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1528) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1592) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:143) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) [spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.net.SocketTimeoutException: Read timed out
at sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288) ~[?:?]
at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314) ~[?:?]
at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355) ~[?:?]
at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808) ~[?:?]
at java.net.Socket$SocketInputStream.read(Socket.java:966) ~[?:?]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.internalExecute(Request.java:173) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:177) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.get(RemoteHoodieTableFileSystemView.java:629) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.lambda$executeRequest$a89da1c0$1(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:84) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:207) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:618) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
... 37 more
```
I tried halving the cluster size and still see the same errors and retries at a high volume (possibly a somewhat lower volume, but still enough to cause failures).
It seems that the retries work in some cases for earlier stages (or at least fall back to direct access after failing), but aren't applied later when writing marker files, leading to job failures:
```
24/07/08 12:52:28 ERROR Executor: Exception in task 66.0 in stage 23.0 (TID 63746)
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :66
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1618) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1528) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1592) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:143) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) ~[spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) ~[spark-common-utils_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95) ~[spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632) [spark-core_2.12-3.5.0-amzn-1.jar:3.5.0-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file story_published_partition_date=2023-09-20/58c52f71-df6b-4f33-9013-bb2f344ef19a-0_66-23-63746_20240708120437027.parquet.marker.MERGE
ip-10-0-107-246.eu-west-1.compute.internal:42607 failed to respond
at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeCreateMarkerRequest(TimelineServerBasedWriteMarkers.java:187) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.createWithEarlyConflictDetection(TimelineServerBasedWriteMarkers.java:160) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:93) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieWriteHandle.createMarkerFile(HoodieWriteHandle.java:144) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:198) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:134) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandleFactory.create(HoodieMergeHandleFactory.java:68) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:400) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:368) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
... 33 more
Caused by: org.apache.hudi.org.apache.http.NoHttpResponseException: ip-10-0-107-246.eu-west-1.compute.internal:42607 failed to respond
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.internalExecute(Request.java:173) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.org.apache.http.client.fluent.Request.execute(Request.java:177) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeRequestToTimelineServer(TimelineServerBasedWriteMarkers.java:233) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeCreateMarkerRequest(TimelineServerBasedWriteMarkers.java:184) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.createWithEarlyConflictDetection(TimelineServerBasedWriteMarkers.java:160) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:93) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieWriteHandle.createMarkerFile(HoodieWriteHandle.java:144) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:198) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:134) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:125) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.io.HoodieMergeHandleFactory.create(HoodieMergeHandleFactory.java:68) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:400) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:368) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ~[hudi-spark3.5-bundle_2.12-0.14.1-amzn-0.jar:0.14.1-amzn-0]
... 33 more
```
I tried increasing `hoodie.embed.timeline.server.threads` to 500 and setting `hoodie.embed.timeline.server.async` to `true`, but had the same issue.
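For reference, a sketch of how those overrides would be applied, assuming the same `hudi_options` dict as in the earlier sketch:

```python
# The overrides tried above (sketch; assumes the hudi_options dict from the
# earlier example is passed to the writer as before).
hudi_options.update({
    "hoodie.embed.timeline.server.threads": "500",
    "hoodie.embed.timeline.server.async": "true",
})
```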
It looks like your cluster network is a bottleneck. The default timeline server is an HTTP-based web server, with a local file-system view as a fallback. Did you try disabling the remote server entirely and just using the local view instead?
The network doesn't seem to be saturated (these are r6g.4xlarge instances).
> It looks like your cluster network is a bottleneck, the default timeline server is a Http based web-server with a local fs view server as a fallback, did you try to disable the remote server totally and just use the local server instead?

How would I go about doing that? Is that `"hoodie.embed.timeline.server": "false"`?
Setting "hoodie.embed.timeline.server": "false"
seems to have fixed the performance issues/failures.
I'd still like some recommendations on the downsides, and if there are strong downsides, what we can do to resolve the original issue.
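For anyone hitting the same thing, the fix amounts to one writer option (sketch below, same assumptions as the earlier examples). My understanding of the trade-off, which may be incomplete, is that each executor then builds its own file-system view from the metadata table or file listings instead of querying a shared, cached view served by the driver.

```python
# Disable the driver-embedded timeline server entirely; executors fall back
# to constructing the file-system view locally (sketch; same hudi_options
# dict as in the earlier examples).
hudi_options["hoodie.embed.timeline.server"] = "false"
```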
cc @yihua for visibility.
**Describe the problem you faced**
See this Slack thread; I was told to raise an issue. I don't have a lot of time to debug this, as the upgrade isn't essential right now.
After upgrading Hudi from 0.12.1 to 0.13.1 via an EMR upgrade, I'm seeing a lot of these when using the Spark writer:
I've enabled retries, but it seems to be slowing down various write tasks a lot as they retry or fall over to secondary methods. Why would this be happening? Between these and seemingly slower bloom filter lookups, jobs are taking 2x longer or more.
I'm unsure whether these correspond to the following warnings in the driver logs:
I’m also seeing similar errors on writes:
I had to roll back the upgrade, as it was causing writes to fail (in addition to the successful runs taking 2x the time).
**To Reproduce**

Unknown.
**Expected behavior**

Performance should not degrade after upgrading.
**Environment Description**

- Hudi version: 0.13.1-amzn-1 (EMR 6.13.0)
- Spark version: 3.4.1
- Hive version: 3.1.3
- Hadoop version: 3.3.3
- Storage (HDFS/S3/GCS..): S3
- Running on Docker? (yes/no): no
**Additional context**
Upgrading from emr-6.9.0 to emr-6.13.0.
This affected both tables we ingest: write times increased 2x for each cluster when jobs succeeded, and large writes failed.
EMR config:
Additional Hudi config:
The hour values in the log timestamps above seem to have become nonsense (this is from persistent Spark logs on EMR).