koochiswathiTR opened this issue 1 year ago
```
hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.precombine.field=operationTime
hoodie.table.version=4
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=false
hoodie.table.checksum=4079573748
hoodie.partition.metafile.use.base.format=false
hoodie.archivelog.folder=archived
hoodie.table.name=novusnorm
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.populate.meta.fields=true
hoodie.table.type=MERGE_ON_READ
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.metadata.partitions=
hoodie.timeline.layout.version=1
hoodie.table.recordkey.fields=guid
hoodie.table.partition.fields=collectionName
```
As you can see, hoodie.table.metadata.partitions= is empty.
@nsivabalan @ad1happy2go @soumilshah1995
@koochiswathiTR The error clearly says the metadata table was not found at the path s3://a206760-novusnorm-s3-ci-use1/novusnorm/.hoodie/metadata.
Can you let us know how you wrote this table? It looks like the metadata table was not enabled when this table was written. By any chance, was this table written with an older Hudi version?
@ad1happy2go We have not enabled the metadata table. Can't we run offline compaction without enabling it?
@koochiswathiTR We can, but when running compaction it is somehow checking the metadata table.
Can you try explicitly disabling metadata while running compaction?
```
spark-submit \
  --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 \
  --verbose --driver-memory 1g --executor-memory 1g \
  --class org.apache.hudi.utilities.HoodieCompactor \
  /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar \
  --table-name novusnorm \
  --base-path s3://a206760-novusnorm-s3-ci-use1/novusnorm \
  --mode scheduleandexecute --spark-memory 1g \
  --hoodie-conf hoodie.metadata.enable=false \
  --strategy "org.apache.hudi.table.action.compact.strategy.CompactionTriggerStrategy"
```
When I pass --strategy "org.apache.hudi.table.action.compact.strategy.CompactionTriggerStrategy", it fails with ClassNotFound.
When I remove --strategy, it fails with a different error.
I want to trigger compaction based on the number of commits. Please help. @ad1happy2go @soumilshah1995 @nsivabalan
Does org.apache.hudi.utilities.HoodieCompactor use only org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy? I want to run compaction based on the number of commits. Please help. @ad1happy2go @soumilshah1995 @nsivabalan
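(For context, and as an assumption not yet confirmed in this thread: the ClassNotFound is likely because --strategy expects a CompactionStrategy *implementation class* such as the default LogFileSizeBasedCompactionStrategy, whereas CompactionTriggerStrategy is the enum behind hoodie.compact.inline.trigger.strategy. A sketch of how the two are kept separate:)

```
# --strategy takes a CompactionStrategy class (defaults to
#   org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy),
# while the *trigger* condition is a plain Hudi config.
spark-submit \
  --class org.apache.hudi.utilities.HoodieCompactor \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --table-name novusnorm \
  --base-path s3://a206760-novusnorm-s3-ci-use1/novusnorm \
  --mode scheduleandexecute \
  --hoodie-conf hoodie.metadata.enable=false \
  --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS \
  --hoodie-conf hoodie.compact.inline.max.delta.commits=5
```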
Thanks, we shall take a look at that shortly.
@koochiswathiTR Can you share the timeline with us, please?
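(For anyone following along: since this table is on S3, the timeline can be listed straight out of the .hoodie folder; the bucket path below is reused from earlier in this thread.)

```
# Active timeline instants (commits, deltacommits, compaction requests) live
# directly under <base-path>/.hoodie/
aws s3 ls s3://a206760-novusnorm-s3-ci-use1/novusnorm/.hoodie/
```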
@ad1happy2go @soumilshah1995
Compaction was triggered with the command below:
```
spark-submit \
  --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 \
  --verbose --driver-memory 2g --executor-memory 2g \
  --class org.apache.hudi.utilities.HoodieCompactor \
  /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar \
  --table-name novusdoc \
  --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc \
  --mode scheduleandexecute --spark-memory 2g \
  --hoodie-conf hoodie.metadata.enable=false \
  --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS \
  --hoodie-conf hoodie.compact.inline.max.delta.commits=100
```
But the next time I tried to run compaction, I don't see it working. As per my understanding, compaction should pick up the earliest instant time found in the timeline, but that does not seem to be happening. Please help. My Hudi timeline is:
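(A way to double-check whether a compaction was actually scheduled is the Hudi CLI; a sketch, assuming hudi-cli is installed on the cluster, as it typically is on EMR:)

```
hudi-cli
# then, inside the CLI shell:
#   connect --path s3://a206760-novusdoc-s3-dev-use1/novusdoc
#   compactions show all     # lists requested/inflight/completed compactions
```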
@soumilshah1995 @ad1happy2go
[hadoop@ip-100-66-69-75 a206760-PowerUser2]$ spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 --verbose --driver-memory 4g --executor-memory 16g --num-executors 8 --driver-cores 10 --executor-cores 10 --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute --spark-memory 2g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS --hoodie-conf hoodie.compact.inline.max.delta.commits=5 2023-06-19T10:26:47.109+0000: [GC pause (G1 Evacuation Pause) (young), 0.0037454 secs] [Parallel Time: 1.6 ms, GC Workers: 8] [GC Worker Start (ms): Min: 418.9, Avg: 419.0, Max: 419.0, Diff: 0.1] [Ext Root Scanning (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 1.8] [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.6] [Object Copy (ms): Min: 0.9, Avg: 1.0, Max: 1.1, Diff: 0.3, Sum: 8.1] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2] [Termination Attempts: Min: 1, Avg: 6.9, Max: 12, Diff: 11, Sum: 55] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 1.3, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 10.9] [GC Worker End (ms): Min: 420.3, Avg: 420.3, Max: 420.3, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.1 ms] [Other: 2.0 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.7 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 24576.0K(24576.0K)->0.0B(34816.0K) Survivors: 
0.0B->3072.0K Heap: 24576.0K(496.0M)->4071.5K(496.0M)] [Times: user=0.01 sys=0.00, real=0.00 secs] 2023-06-19T10:26:47.455+0000: [GC pause (G1 Evacuation Pause) (young), 0.0053984 secs] [Parallel Time: 2.8 ms, GC Workers: 8] [GC Worker Start (ms): Min: 764.9, Avg: 765.1, Max: 766.4, Diff: 1.5] [Ext Root Scanning (ms): Min: 0.0, Avg: 0.3, Max: 0.9, Diff: 0.9, Sum: 2.4] [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1] [Processed Buffers: Min: 0, Avg: 0.1, Max: 1, Diff: 1, Sum: 1] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.6, Diff: 0.6, Sum: 0.7] [Object Copy (ms): Min: 0.9, Avg: 1.9, Max: 2.4, Diff: 1.5, Sum: 15.2] [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.5] [Termination Attempts: Min: 1, Avg: 15.1, Max: 28, Diff: 27, Sum: 121] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 1.2, Avg: 2.5, Max: 2.7, Diff: 1.5, Sum: 19.9] [GC Worker End (ms): Min: 767.6, Avg: 767.6, Max: 767.6, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 2.4 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.0 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 34816.0K(34816.0K)->0.0B(292.0M) Survivors: 3072.0K->5120.0K Heap: 39486.1K(496.0M)->7351.0K(496.0M)] [Times: user=0.02 sys=0.01, real=0.01 secs] Using properties file: /usr/lib/spark/conf/spark-defaults.conf Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer Adding default property: spark.yarn.appMasterEnv.bigdataEnv=bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801 Adding default property: spark.sql.warehouse.dir=hdfs:///user/spark/warehouse Adding default property: 
spark.yarn.dist.files=/etc/hudi/conf/hudi-defaults.conf Adding default property: spark.sql.parquet.fs.optimized.committer.optimization-enabled=true Adding default property: spark.executorEnv.regionShortName=use1 Adding default property: spark.executor.extraJavaOptions=-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2 Adding default property: spark.hadoop.mapreduce.output.fs.optimized.committer.enabled=true Adding default property: spark.yarn.appMasterEnv.assetId=a206760 Adding default property: spark.sql.autoBroadcastJoinThreshold=104857600 Adding default property: spark.eventLog.enabled=true Adding default property: spark.shuffle.service.enabled=false Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native Adding default property: spark.emr.default.executor.memory=18971M Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 Adding default property: spark.kryoserializer.buffer.max=1024m Adding default property: spark.yarn.historyServer.address=ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080 Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true Adding default property: spark.yarn.appMasterEnv.regionFullName=us-east-1 Adding default property: spark.yarn.appMasterEnv.regionShortName=use1 Adding default property: spark.storage.decommission.shuffleBlocks.enabled=true Adding default property: spark.executorEnv.regionFullName=us-east-1 Adding default property: spark.rpc.askTimeout=480 Adding default property: spark.sql.streaming.metricsEnabled=true Adding default property: 
spark.locality.wait=6s Adding default property: spark.driver.memory=2048M Adding default property: spark.decommission.enabled=true Adding default property: spark.files.fetchFailure.unRegisterOutputOnHost=true Adding default property: spark.executorEnv.assetId=a206760 Adding default property: spark.executor.defaultJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -Dfile.encoding=UTF-8 Adding default property: spark.resourceManager.cleanupExpiredHost=true Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f) Adding default property: spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions Adding default property: spark.emr.default.executor.cores=4 Adding default property: spark.driver.extraJavaOptions=-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M Adding default property: spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=2000 Adding default property: spark.deploy.mode=cluster Adding default property: spark.master=yarn Adding default property: spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter Adding default property: spark.rpc.message.maxSize=416 Adding default property: spark.driver.defaultJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Dfile.encoding=UTF-8 Adding default property: spark.executorEnv.correlationId=offline_compaction_schedule Adding default property: spark.blacklist.decommissioning.timeout=1h Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native Adding default property: fs.s3.maxRetries=1000000 Adding default property: 
spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2 Adding default property: spark.executor.memory=18971M Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps Adding default property: spark.executorEnv.bigdataEnv=bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801 Adding default property: spark.dynamicAllocation.enabled=false Adding default property: 
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar Adding default property: spark.executor.cores=4 Adding default property: spark.history.ui.port=18080 Adding default property: spark.blacklist.decommissioning.enabled=true Adding default property: spark.yarn.appMasterEnv.correlationId=offline_compaction_schedule Adding default property: spark.decommissioning.timeout.threshold=20 Adding default property: spark.yarn.heterogeneousExecutors.enabled=false Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true Adding default property: spark.hadoop.yarn.timeline-service.enabled=false Adding default property: spark.yarn.executor.memoryOverheadFactor=0.1875 Warning: Ignoring non-Spark config property: fs.s3.maxRetries Parsed arguments: master yarn deployMode null executorMemory 16g executorCores 10 totalExecutorCores null propertiesFile /usr/lib/spark/conf/spark-defaults.conf driverMemory 4g driverCores 10 
driverExtraClassPath /usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar driverExtraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native driverExtraJavaOptions -Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M supervise false queue null numExecutors 8 files null pyFiles null archives null mainClass org.apache.hudi.utilities.HoodieCompactor primaryResource file:/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar name org.apache.hudi.utilities.HoodieCompactor childArgs [--table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute --spark-memory 2g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS --hoodie-conf hoodie.compact.inline.max.delta.commits=5] jars null packages 
org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 packagesExclusions null repositories null verbose true
Spark properties used, including those specified through --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf: (spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions) (spark.executor.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -Dfile.encoding=UTF-8) (spark.blacklist.decommissioning.timeout,1h) (spark.yarn.appMasterEnv.correlationId,offline_compaction_schedule) (spark.yarn.executor.memoryOverheadFactor,0.1875) (spark.executorEnv.correlationId,offline_compaction_schedule) (spark.executorEnv.regionShortName,use1) (spark.blacklist.decommissioning.enabled,true) (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native) (spark.executorEnv.assetId,a206760) (spark.hadoop.yarn.timeline-service.enabled,false) (spark.driver.memory,4g) (spark.executor.memory,18971M) (spark.executorEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801) (spark.sql.parquet.fs.optimized.committer.optimization-enabled,true) (spark.sql.warehouse.dir,hdfs:///user/spark/warehouse) (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native) (spark.yarn.historyServer.address,ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080) (spark.yarn.heterogeneousExecutors.enabled,false) (spark.rpc.message.maxSize,416) (spark.eventLog.enabled,true) (spark.storage.decommission.shuffleBlocks.enabled,true) (spark.yarn.dist.files,/etc/hudi/conf/hudi-defaults.conf) (spark.files.fetchFailure.unRegisterOutputOnHost,true) (spark.history.ui.port,18080) (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true) 
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,2000) (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f)) (spark.rpc.askTimeout,480) (spark.sql.streaming.metricsEnabled,true) (spark.driver.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Dfile.encoding=UTF-8) (spark.serializer,org.apache.spark.serializer.KryoSerializer) (spark.executor.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M) (spark.resourceManager.cleanupExpiredHost,true) (spark.deploy.mode,cluster) (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps) (spark.shuffle.service.enabled,false) (spark.yarn.appMasterEnv.regionFullName,us-east-1) (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version,2) (spark.locality.wait,6s) (spark.emr.default.executor.cores,4) (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2) (spark.driver.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M) (spark.kryoserializer.buffer.max,1024m) (spark.hadoop.mapreduce.output.fs.optimized.committer.enabled,true) (spark.yarn.appMasterEnv.regionShortName,use1) 
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar) (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2) (spark.eventLog.dir,hdfs:///var/log/spark/apps) (spark.executorEnv.regionFullName,us-east-1) (spark.master,yarn) (spark.emr.default.executor.memory,18971M) (spark.decommission.enabled,true) (spark.dynamicAllocation.enabled,false) (spark.yarn.appMasterEnv.assetId,a206760) (spark.sql.autoBroadcastJoinThreshold,104857600) (spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter) (spark.yarn.appMasterEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801) (spark.executor.cores,4) (spark.decommissioning.timeout.threshold,20) (spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true) 
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar)
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 5 | 0 | 0 | 0 || 5 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1341569f-530d-4afe-a08e-cc9ee2167f5c confs: [default] 0 artifacts copied, 5 already retrieved (0kB/12ms) 2023-06-19T10:26:48.356+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Adding shutdown hook Main class: org.apache.hudi.utilities.HoodieCompactor Arguments: --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute --spark-memory 2g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS --hoodie-conf hoodie.compact.inline.max.delta.commits=5 Spark config: (spark.serializer,org.apache.spark.serializer.KryoSerializer) (spark.yarn.appMasterEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801) (spark.sql.warehouse.dir,hdfs:///user/spark/warehouse) (spark.yarn.dist.files,file:/etc/hudi/conf.dist/hudi-defaults.conf) (spark.sql.parquet.fs.optimized.committer.optimization-enabled,true) (spark.executorEnv.regionShortName,use1) (spark.executor.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M) (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps) (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2) (spark.hadoop.mapreduce.output.fs.optimized.committer.enabled,true) (spark.yarn.appMasterEnv.assetId,a206760) (spark.sql.autoBroadcastJoinThreshold,104857600) (spark.eventLog.enabled,true) (spark.shuffle.service.enabled,false) (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native) (spark.emr.default.executor.memory,18971M) 
(spark.jars,file:/usr/lib/hudi/hudi-utilities-bundle.jar,file:/usr/lib/hudi/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar) (spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version,2) (spark.kryoserializer.buffer.max,1024m) (spark.yarn.historyServer.address,ip-100-66-69-75.3175.aws-int.thomsonreuters.com:18080) (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true) (spark.yarn.appMasterEnv.regionFullName,us-east-1) (spark.yarn.appMasterEnv.regionShortName,use1) (spark.app.name,org.apache.hudi.utilities.HoodieCompactor) (spark.storage.decommission.shuffleBlocks.enabled,true) (spark.executorEnv.regionFullName,us-east-1) (spark.rpc.askTimeout,480) (spark.sql.streaming.metricsEnabled,true) (spark.locality.wait,6s) (spark.driver.memory,4g) (spark.executor.instances,8) (spark.decommission.enabled,true) (spark.files.fetchFailure.unRegisterOutputOnHost,true) (spark.submit.pyFiles,) (spark.executorEnv.assetId,a206760) (spark.executor.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -Dfile.encoding=UTF-8) (spark.resourceManager.cleanupExpiredHost,true) (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f)) (spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions) (spark.emr.default.executor.cores,4) (spark.driver.extraJavaOptions,-Dcom.amazonaws.sdk.disableCbor=true -Duser.timezone=GMT -verbose:gc -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:MetaspaceSize=300M) (spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,2000) (spark.submit.deployMode,client) (spark.deploy.mode,cluster) (spark.master,yarn) (spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter) (spark.rpc.message.maxSize,416) (spark.driver.defaultJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Dfile.encoding=UTF-8) 
(spark.executorEnv.correlationId,offline_compaction_schedule) (spark.blacklist.decommissioning.timeout,1h) (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native) (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2) (spark.executor.memory,16g) (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar) (spark.eventLog.dir,hdfs:///var/log/spark/apps) (spark.executorEnv.bigdataEnv,bigdata_environment:dev,bigdata_project:tacticalnovusingest,bigdata_environment-type:DEVELOPMENT,bigdata_region:us-east-1,bigdata_servicename:tactical-novus-ingest,bigdata_version:dev4856801) (spark.dynamicAllocation.enabled,false) 
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar) (spark.executor.cores,10) (spark.history.ui.port,18080) (spark.repl.local.jars,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar,file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar) (spark.blacklist.decommissioning.enabled,true) (spark.yarn.appMasterEnv.correlationId,offline_compaction_schedule) (spark.decommissioning.timeout.threshold,20) (spark.yarn.heterogeneousExecutors.enabled,false) (spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true) 
(spark.yarn.dist.jars,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar,file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar,file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar,file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar) (spark.hadoop.yarn.timeline-service.enabled,false) (spark.yarn.executor.memoryOverheadFactor,0.1875) Classpath elements: file:/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar file:///home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar file:///home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar file:///home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar
2023-06-19T10:26:48.653+0000 [WARN] [offline_compaction_schedule] [org.apache.spark.util.DependencyUtils] [DependencyUtils]: Local jar /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar does not exist, skipping.
2023-06-19T10:26:48.759+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkContext] [SparkContext]: Running Spark version 3.2.1-amzn-0
2023-06-19T10:26:48.783+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceUtils] [ResourceUtils]: ==============================================================
2023-06-19T10:26:48.783+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceUtils] [ResourceUtils]: No custom resources configured for spark.driver.
2023-06-19T10:26:48.784+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceUtils] [ResourceUtils]: ==============================================================
2023-06-19T10:26:48.784+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkContext] [SparkContext]: Submitted application: compactor-novusdoc
2023-06-19T10:26:48.810+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceProfile] [ResourceProfile]: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 10, script: , vendor: , memory -> name: memory, amount: 2048, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2023-06-19T10:26:48.824+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceProfile] [ResourceProfile]: Limiting resource is cpus at 10 tasks per executor
2023-06-19T10:26:48.826+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.resource.ResourceProfileManager] [ResourceProfileManager]: Added ResourceProfile id: 0
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls to: hadoop
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls to: hadoop
2023-06-19T10:26:48.884+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: Changing view acls groups to:
2023-06-19T10:26:48.885+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: Changing modify acls groups to:
2023-06-19T10:26:48.885+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
2023-06-19T10:26:48.918+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.conf.Configuration.deprecation] [deprecation]: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2023-06-19T10:26:48.918+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.conf.Configuration.deprecation] [deprecation]: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2023-06-19T10:26:48.919+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.conf.Configuration.deprecation] [deprecation]: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2023-06-19T10:26:49.159+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.network.server.TransportServer] [TransportServer]: Shuffle server started on port: 35007
2023-06-19T10:26:49.168+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.util.Utils] [Utils]: Successfully started service 'sparkDriver' on port 35007.
2023-06-19T10:26:49.177+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.SparkEnv] [SparkEnv]: Using serializer: class org.apache.spark.serializer.KryoSerializer
2023-06-19T10:26:49.196+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkEnv] [SparkEnv]: Registering MapOutputTracker
2023-06-19T10:26:49.197+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.MapOutputTrackerMasterEndpoint] [MapOutputTrackerMasterEndpoint]: init
2023-06-19T10:26:49.235+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkEnv] [SparkEnv]: Registering BlockManagerMaster
2023-06-19T10:26:49.300+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkEnv] [SparkEnv]: Registering BlockManagerMasterHeartbeat
2023-06-19T10:26:49.400+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkEnv] [SparkEnv]: Registering OutputCommitCoordinator
2023-06-19T10:26:49.404+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.subresultcache.SubResultCacheManager] [SubResultCacheManager]: Sub-result caches config to enable false.
2023-06-19T10:26:49.404+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.subresultcache.SubResultCacheManager] [SubResultCacheManager]: Sub-result caches are disabled.
2023-06-19T10:26:49.423+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: Created SSL options for ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
2023-06-19T10:26:49.504+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.util.log] [log]: Logging initialized @2813ms to org.sparkproject.jetty.util.log.Slf4jLog
2023-06-19T10:26:49.581+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.Server] [Server]: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_372-b07
2023-06-19T10:26:49.606+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.Server] [Server]: Started @2915ms
2023-06-19T10:26:49.608+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.ui.JettyUtils] [JettyUtils]: Using requestHeaderSize: 8192
2023-06-19T10:26:49.645+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Started ServerConnector@34dc85a{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2023-06-19T10:26:49.646+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.util.Utils] [Utils]: Successfully started service 'SparkUI' on port 8090.
2023-06-19T10:26:49.671+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@b8a7e43{/jobs,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.674+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@719843e5{/jobs/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.675+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@58112bc4{/jobs/job,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.676+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@2f5c1332{/jobs/job/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.677+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@7cab1508{/stages,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.678+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@258ee7de{/stages/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.679+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@6d171ce0{/stages/stage,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.680+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@6e1d4137{/stages/stage/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.681+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@29a4f594{/stages/pool,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.682+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@5327a06e{/stages/pool/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.683+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@287f7811{/storage,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.684+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@2b556bb2{/storage/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.684+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@17271176{/storage/rdd,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.685+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@2e34384c{/storage/rdd/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.686+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@1f52eb6f{/environment,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.687+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@58294867{/environment/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.688+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@6fc3e1a4{/executors,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.689+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@2d5f7182{/executors/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.690+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@29ea78b1{/executors/threadDump,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.691+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@7baf6acf{/executors/threadDump/json,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.701+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@7b3315a5{/static,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.702+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@629ae7e{/,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.703+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@de88ac6{/api,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.704+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@42fcc7e6{/jobs/job/kill,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.705+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.handler.ContextHandler] [ContextHandler]: Started o.s.j.s.ServletContextHandler@5da7cee2{/stages/stage/kill,null,AVAILABLE,@Spark}
2023-06-19T10:26:49.707+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.ui.SparkUI] [SparkUI]: Bound SparkUI to 0.0.0.0, and started at http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8090
2023-06-19T10:26:49.729+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkContext] [SparkContext]: Added JAR file:/usr/lib/hudi/hudi-utilities-bundle.jar at spark://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:35007/jars/hudi-utilities-bundle.jar with timestamp 1687170408750
2023-06-19T10:26:49.730+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkContext] [SparkContext]: Added JAR file:/usr/lib/hudi/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar at spark://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:35007/jars/hudi-spark3-bundle_2.12-0.11.0-amzn-0.jar with timestamp 1687170408750
2023-06-19T10:26:49.849+0000: [GC pause (G1 Evacuation Pause) (young), 0.0244707 secs]
[Parallel Time: 11.2 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 3159.6, Avg: 3159.7, Max: 3159.7, Diff: 0.1]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.5, Max: 4.4, Diff: 3.7, Sum: 11.7]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 1.0, Max: 2, Diff: 2, Sum: 8]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 1.3, Diff: 1.3, Sum: 4.3]
[Object Copy (ms): Min: 6.6, Avg: 8.9, Max: 9.7, Diff: 3.1, Sum: 71.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Termination Attempts: Min: 1, Avg: 128.1, Max: 158, Diff: 157, Sum: 1025]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[GC Worker Total (ms): Min: 11.0, Avg: 11.0, Max: 11.1, Diff: 0.1, Sum: 88.3]
[GC Worker End (ms): Min: 3170.7, Avg: 3170.7, Max: 3170.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 13.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 12.3 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 292.0M(292.0M)->0.0B(262.0M) Survivors: 5120.0K->35840.0K Heap: 299.2M(496.0M)->37864.7K(496.0M)]
[Times: user=0.09 sys=0.01, real=0.02 secs]
2023-06-19T10:26:49.974+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.yarn.client.RMProxy] [RMProxy]: Connecting to ResourceManager at ip-100-66-69-75.3175.aws-int.thomsonreuters.com/100.66.69.75:8032
2023-06-19T10:26:50.132+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Requesting a new application from cluster with 2 NodeManagers
2023-06-19T10:26:50.432+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.conf.Configuration] [Configuration]: resource-types.xml not found
2023-06-19T10:26:50.432+0000 [INFO] [offline_compaction_schedule] [org.apache.hadoop.yarn.util.resource.ResourceUtils] [ResourceUtils]: Unable to find 'resource-types.xml'.
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Verifying our application has not requested more than the maximum memory capability of the cluster (122880 MB per container)
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Will allocate AM container, with 896 MB memory including 384 MB overhead
2023-06-19T10:26:50.445+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Setting up container launch context for our AM
2023-06-19T10:26:50.446+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Setting up the launch environment for our AM container
2023-06-19T10:26:50.452+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Preparing resources for our AM container
2023-06-19T10:26:50.478+0000 [WARN] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2023-06-19T10:26:54.119+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/spark_libs4987513252404456461.zip -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/spark_libs4987513252404456461.zip
2023-06-19T10:26:54.546+0000: [GC pause (G1 Evacuation Pause) (young), 0.0166820 secs]
[Parallel Time: 11.6 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 7856.4, Avg: 7856.7, Max: 7857.8, Diff: 1.4]
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.1, Max: 4.5, Diff: 4.5, Sum: 8.5]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.6, Max: 3, Diff: 3, Sum: 5]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.7, Max: 1.6, Diff: 1.6, Sum: 5.3]
[Object Copy (ms): Min: 7.0, Avg: 9.3, Max: 10.5, Diff: 3.5, Sum: 74.6]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
[Termination Attempts: Min: 1, Avg: 154.9, Max: 198, Diff: 197, Sum: 1239]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[GC Worker Total (ms): Min: 10.1, Avg: 11.2, Max: 11.5, Diff: 1.4, Sum: 89.9]
[GC Worker End (ms): Min: 7867.9, Avg: 7867.9, Max: 7867.9, Diff: 0.1]
[Code Root Fixup: 0.2 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 4.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 262.0M(262.0M)->0.0B(262.0M) Survivors: 35840.0K->35840.0K Heap: 299.0M(496.0M)->37559.0K(496.0M)]
[Times: user=0.09 sys=0.01, real=0.02 secs]
2023-06-19T10:26:55.069+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-utilities-bundle_2.12-0.11.1.jar
2023-06-19T10:26:55.222+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.spark_spark-avro_2.11-2.4.4.jar
2023-06-19T10:26:55.238+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.hudi_hudi-spark3-bundle_2.12-0.11.1.jar
2023-06-19T10:26:55.239+0000: [GC pause (G1 Evacuation Pause) (young), 0.0122827 secs]
[Parallel Time: 11.0 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 8548.8, Avg: 8548.9, Max: 8548.9, Diff: 0.1]
[Ext Root Scanning (ms): Min: 0.3, Avg: 0.8, Max: 3.8, Diff: 3.5, Sum: 6.3]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Processed Buffers: Min: 0, Avg: 0.4, Max: 1, Diff: 1, Sum: 3]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.2, Diff: 1.2, Sum: 4.8]
[Object Copy (ms): Min: 7.0, Avg: 9.3, Max: 10.3, Diff: 3.3, Sum: 74.2]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Termination Attempts: Min: 1, Avg: 137.4, Max: 175, Diff: 174, Sum: 1099]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
[GC Worker Total (ms): Min: 10.8, Avg: 10.9, Max: 10.9, Diff: 0.1, Sum: 86.9]
[GC Worker End (ms): Min: 8559.7, Avg: 8559.7, Max: 8559.8, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 1.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.5 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 262.0M(262.0M)->0.0B(280.0M) Survivors: 35840.0K->17408.0K Heap: 298.7M(496.0M)->19127.0K(496.0M)]
[Times: user=0.09 sys=0.00, real=0.01 secs]
2023-06-19T10:26:55.407+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/home/hadoop/.ivy2/jars/org.apache.htrace_htrace-core-3.1.0-incubating.jar -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.apache.htrace_htrace-core-3.1.0-incubating.jar
2023-06-19T10:26:55.426+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/home/hadoop/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/org.spark-project.spark_unused-1.0.0.jar
2023-06-19T10:26:55.438+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/etc/hudi/conf.dist/hudi-defaults.conf -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/hudi-defaults.conf
2023-06-19T10:26:55.858+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Creating an archive with the config files for distribution at /mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/spark_conf7322044392243776097.zip.
2023-06-19T10:26:55.946+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: Uploading resource file:/mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435/spark_conf7322044392243776097.zip -> hdfs://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8020/user/hadoop/.sparkStaging/application_1687146322573_0047/spark_conf.zip
2023-06-19T10:26:56.009+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: ===============================================================================
2023-06-19T10:26:56.009+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: YARN AM launch context:
2023-06-19T10:26:56.010+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: user class: N/A
2023-06-19T10:26:56.010+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: env:
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: regionShortName -> use1
2023-06-19T10:26:56.011+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.deploy.yarn.Client] [Client]: CLASSPATH -> /usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/aws-sdk-v2/bundle-2.17.282.jar
[Javalin startup ASCII banner]
https://javalin.io/documentation
2023-06-19T10:27:08.819+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Starting Javalin ...
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Listening on http://localhost:42997/
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Javalin started in 142ms \o/
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Starting Timeline server on port :42997
2023-06-19T10:27:08.957+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.embedded.EmbeddedTimelineService] [EmbeddedTimelineService]: Started embedded timeline server at ip-100-66-69-75.3175.aws-int.thomsonreuters.com:42997
2023-06-19T10:27:08.970+0000 [WARN] [offline_compaction_schedule] [org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: No instant time is provided for scheduling compaction.
2023-06-19T10:27:08.973+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.BaseHoodieWriteClient] [BaseHoodieWriteClient]: Scheduling table service COMPACT
2023-06-19T10:27:08.974+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.BaseHoodieWriteClient] [BaseHoodieWriteClient]: Scheduling compaction at instant time :20230619102708972
2023-06-19T10:27:08.978+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]: Loading HoodieTableMetaClient from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:08.990+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.HoodieTableConfig] [HoodieTableConfig]: Loading table properties from s3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties
2023-06-19T10:27:08.990+0000 [INFO] [offline_compaction_schedule] [com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem] [S3NativeFileSystem]: Opening 's3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties' for reading
2023-06-19T10:27:09.067+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.068+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.HoodieTableMetaClient] [HoodieTableMetaClient]: Loading Active commit timeline for s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.070+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: user=dr.who aclsEnabled=false viewAcls=hadoop viewAclsGroups=
2023-06-19T10:27:09.113+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.timeline.HoodieActiveTimeline] [HoodieActiveTimeline]: Loaded instants upto : Option{val=[20230619102516597deltacommitCOMPLETED]}
2023-06-19T10:27:09.121+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.view.FileSystemViewManager] [FileSystemViewManager]: Creating View Manager with storage type :REMOTE_FIRST
2023-06-19T10:27:09.121+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.common.table.view.FileSystemViewManager] [FileSystemViewManager]: Creating remote first table view
2023-06-19T10:27:09.128+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor] [ScheduleCompactionActionExecutor]: Checking if compaction needs to be run on s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-19T10:27:09.137+0000 [DEBUG] [offline_compaction_schedule] [org.apache.spark.SecurityManager] [SecurityManager]: user=dr.who aclsEnabled=false viewAcls=hadoop viewAclsGroups=
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.BaseHoodieClient] [BaseHoodieClient]: Stopping Timeline service !!
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.embedded.EmbeddedTimelineService] [EmbeddedTimelineService]: Closing Timeline server
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Closing Timeline Service
2023-06-19T10:27:09.184+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Stopping Javalin ...
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Javalin has stopped
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.timeline.service.TimelineService] [TimelineService]: Closed Timeline Service
2023-06-19T10:27:09.195+0000 [INFO] [offline_compaction_schedule] [org.apache.hudi.client.embedded.EmbeddedTimelineService] [EmbeddedTimelineService]: Closed Timeline server
2023-06-19T10:27:09.196+0000 [WARN] [offline_compaction_schedule] [org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: Couldn't do schedule
2023-06-19T10:27:09.211+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped Spark@34dc85a{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
2023-06-19T10:27:09.238+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.ui.SparkUI] [SparkUI]: Stopped Spark web UI at http://ip-100-66-69-75.3175.aws-int.thomsonreuters.com:8090
2023-06-19T10:27:09.708+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.MapOutputTrackerMasterEndpoint] [MapOutputTrackerMasterEndpoint]: MapOutputTrackerMasterEndpoint stopped!
2023-06-19T10:27:09.749+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.SparkContext] [SparkContext]: Successfully stopped SparkContext
2023-06-19T10:27:09.751+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Shutdown hook called
2023-06-19T10:27:09.751+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Deleting directory /mnt/tmp/spark-94366315-0ad4-4f1a-8051-1c517b83f435
2023-06-19T10:27:09.756+0000 [INFO] [offline_compaction_schedule] [org.apache.spark.util.ShutdownHookManager] [ShutdownHookManager]: Deleting directory /mnt/tmp/spark-f72ca80c-54af-4f64-bcaa-176fe9cc27e4
Heap
 garbage-first heap   total 507904K, used 192322K [0x00000006c0000000, 0x00000006c0100f80, 0x00000007c0000000)
  region size 1024K, 183 young (187392K), 38 survivors (38912K)
 Metaspace       used 102404K, capacity 108290K, committed 108544K, reserved 1144832K
  class space    used 13406K, capacity 14036K, committed 14080K, reserved 1048576K
[hadoop@ip-100-66-69-75 a206760-PowerUser2
@ad1happy2go @soumilshah1995 I see compaction triggered only the first time; from the second run onwards it says no instant time found. But I don't want to provide an instant time, since it is optional and I don't want to supply one in my case.
Looks like the needCompact function is not considering hoodie.compact.inline.trigger.strategy=NUM_COMMITS as the compaction strategy, so it returns false and compaction never gets scheduled. Please help.
@xushiyan @soumilshah1995 @ad1happy2go @nsivabalan
Any update on this?
@koochiswathiTR For NUM_COMMITS, below is the code Hudi uses to decide whether compaction is needed. I see your hoodie.compact.inline.max.delta.commits property is 100, so it will only schedule a compaction once 100 delta commits have accumulated since the last compaction.
case NUM_COMMITS:
  compactable = inlineCompactDeltaCommitMax <= latestDeltaCommitInfo.getLeft();
  if (compactable) {
    LOG.info(String.format("The delta commits >= %s, trigger compaction scheduler.", inlineCompactDeltaCommitMax));
  }
  break;
Let me know if I have misunderstood your question.
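The check in that snippet boils down to a plain threshold comparison: compaction is scheduled only once the number of delta commits since the last compaction reaches hoodie.compact.inline.max.delta.commits. A minimal illustrative sketch of that condition (the variable names here are made up for clarity, not Hudi's):

```shell
# Illustrative threshold check mirroring the NUM_COMMITS trigger strategy.
# With the config reported in this thread, max_delta_commits=100, so 99
# delta commits since the last compaction does NOT schedule one.
max_delta_commits=100              # hoodie.compact.inline.max.delta.commits
delta_commits_since_compaction=99  # count since last completed compaction

if [ "$delta_commits_since_compaction" -ge "$max_delta_commits" ]; then
  echo "schedule compaction"
else
  echo "skip: only $delta_commits_since_compaction of $max_delta_commits delta commits"
fi
```

Lowering hoodie.compact.inline.max.delta.commits (as done later in this thread with a value of 50) makes the threshold easier to hit.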
@ad1happy2go We do have more than 200 delta commits, yet sometimes we don't see compaction being triggered.
I see the compaction went to INFLIGHT. When will it complete? How do we complete this in-progress compaction?
@ad1happy2go
Compactions fail with
java.lang.IllegalArgumentException: Earliest write inflight instant time must be later than compaction time. Earliest :[==>20230620080309158deltacommitINFLIGHT], Compaction scheduled at 20230620080355689
2023-06-20 08:03:55,711 INFO s3n.S3NativeFileSystem: Opening 's3://a206760-novusdoc-s3-dev-use1/novusdoc/.hoodie/hoodie.properties' for reading
2023-06-20 08:03:55,741 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from s3://a206760-novusdoc-s3-dev-use1/novusdoc
2023-06-20 08:03:55,741 INFO table.HoodieTableMetaClient: Loading Active commit timeline for s3://a206760-novusdoc-s3-dev-use1/novusdoc
I deleted 20230620080309158.deltacommit.inflight and 20230620080309158.deltacommit.requested and it worked. But I can't do this in production; we get upserts every second through a stream. Please help.
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 --verbose --driver-memory 2g --executor-memory 2g --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute --spark-memory 2g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=NUM_COMMITS --hoodie-conf hoodie.compact.inline.max.delta.commits=50
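The IllegalArgumentException above comes from an ordering guard: a compaction can only be scheduled at an instant time later than every inflight delta commit, and Hudi instant times are timestamp strings (yyyyMMddHHmmssSSS), so lexicographic comparison matches chronological order. A minimal sketch of that check, with hypothetical names (the real guard is inside Hudi's compaction scheduling code):

```java
// Sketch of the instant-time ordering guard behind the error
// "Earliest write inflight instant time must be later than compaction time".
public class InstantOrderingSketch {
    // Instant times are fixed-width timestamp strings, so compareTo
    // orders them chronologically.
    static boolean canSchedule(String earliestInflightDeltaCommit,
                               String compactionInstant) {
        return earliestInflightDeltaCommit.compareTo(compactionInstant) > 0;
    }

    public static void main(String[] args) {
        // The failing case from the log: the inflight delta commit
        // (20230620080309158) started before the compaction instant
        // (20230620080355689), so scheduling is rejected.
        System.out.println(canSchedule("20230620080309158", "20230620080355689")); // false
    }
}
```

This is also why deleting the inflight delta commit files "fixed" it: with no earlier inflight instant on the timeline, the guard passes — but that is unsafe with a live streaming writer, which is exactly the problem described above.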
The first time, compaction runs. From the second time on I get this error:
2023-06-20T10:31:04.313+0000 [WARN] [offline_compaction_scheduleTest] [org.apache.hudi.utilities.HoodieCompactor] [HoodieCompactor]: Couldn't do schedule
2023-06-20T10:31:04.323+0000 [INFO] [offline_compaction_scheduleTest] [org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped Spark@99a78d7{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
I scheduled and executed compaction, but the execute command failed with Out Of Memory and the compaction went to INFLIGHT in the Hudi timeline. How can I complete this INFLIGHT compaction without bringing down the ingestion job? Can we unschedule an INFLIGHT compaction with spark-submit?
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 --verbose --driver-memory 6g --executor-memory 6g --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode scheduleandexecute --spark-memory 6g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=TIME_ELAPSED --hoodie-conf hoodie.compact.inline.max.delta.seconds=3600
@ad1happy2go @soumilshah1995 @xushiyan @nsivabalan spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 --verbose --driver-memory 8g --executor-memory 8g --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar --table-name novusdoc --base-path s3://a206760-novusdoc-s3-dev-use1/novusdoc --mode execute --spark-memory 8g --hoodie-conf hoodie.metadata.enable=false --hoodie-conf hoodie.compact.inline.trigger.strategy=TIME_ELAPSED --hoodie-conf hoodie.compact.inline.max.delta.seconds=3600
@koochiswathiTR I don't think there is anything like that which unschedules the compaction.
Has your problem been resolved?
Hi, I'm trying to schedule Hudi offline compaction.
Below is the spark submit
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.11:2.4.4 --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar --base-path s3://a206760-novusnorm-s3-ci-use1/novusnorm/ --table-name novusnorm --spark-memory 5g --mode schedule
In our Hudi table, we didn't see any metadata files under the .hoodie folder. Please help here.
2023-06-15T10:40:18.976+0000 [ERROR] [offline_compaction_schedule] [org.apache.hudi.utilities.UtilHelpers] [UtilHelpers]: Compact failed
org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table
	at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:315)
	at org.apache.hudi.table.action.compact.HoodieCompactor.generateCompactionPlan(HoodieCompactor.java:279)
	at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.scheduleCompaction(ScheduleCompactionActionExecutor.java:123)
	at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.execute(ScheduleCompactionActionExecutor.java:93)
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.scheduleCompaction(HoodieSparkMergeOnReadTable.java:133)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1348)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService(BaseHoodieWriteClient.java:1325)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant(BaseHoodieWriteClient.java:1003)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompaction(BaseHoodieWriteClient.java:994)
	at org.apache.hudi.utilities.HoodieCompactor.doSchedule(HoodieCompactor.java:281)
	at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:194)
Environment Description
Hudi version : 0.11.1
Spark version : 3.1.2
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :S3
Running on Docker? (yes/no) :no
Stacktrace
[HoodieBackedTableMetadata]: Metadata table was not found at path s3://a206760-novusnorm-s3-ci-use1/novusnorm/.hoodie/metadata 2023-06-15T10:40:18.015+0000 [WARN] [offline_compaction_schedule] [org.apache.spark.scheduler.TaskSetManager] [TaskSetManager]: Lost task 0.0 in stage 0.0 (TID 0) (ip-100-66-72-199.3175.aws-int.thomsonreuters.com executor 2): java.io.IOException: unexpected exception type at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:133) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) ... 40 more Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization at org.apache.hudi.metadata.FileSystemBackedTableMetadata.$deserializeLambda$(FileSystemBackedTableMetadata.java:46) ... 50 more
2023-06-15T10:40:18.950+0000 [ERROR] [offline_compaction_schedule] [org.apache.spark.scheduler.TaskSetManager] [TaskSetManager]: Task 0 in stage 0.0 failed 4 times; aborting job 2023-06-15T10:40:18.964+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Stopping Javalin ... 2023-06-15T10:40:18.975+0000 [INFO] [offline_compaction_schedule] [io.javalin.Javalin] [Javalin]: Javalin has stopped 2023-06-15T10:40:18.976+0000 [ERROR] [offline_compaction_schedule] [org.apache.hudi.utilities.UtilHelpers] [UtilHelpers]: Compact failed org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:315) at org.apache.hudi.table.action.compact.HoodieCompactor.generateCompactionPlan(HoodieCompactor.java:279) at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.scheduleCompaction(ScheduleCompactionActionExecutor.java:123) at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.execute(ScheduleCompactionActionExecutor.java:93) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.scheduleCompaction(HoodieSparkMergeOnReadTable.java:133) at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1348) at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService(BaseHoodieWriteClient.java:1325) at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant(BaseHoodieWriteClient.java:1003) at org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompaction(BaseHoodieWriteClient.java:994) at org.apache.hudi.utilities.HoodieCompactor.doSchedule(HoodieCompactor.java:281) at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:194) at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:540) at org.apache.hudi.utilities.HoodieCompactor.compact(HoodieCompactor.java:190) at 
org.apache.hudi.utilities.HoodieCompactor.main(HoodieCompactor.java:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-100-66-72-199.3175.aws-int.thomsonreuters.com executor 1): java.io.IOException: unexpected exception type at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at 
java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:133) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) ... 40 more Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization at org.apache.hudi.metadata.FileSystemBackedTableMetadata.$deserializeLambda$(FileSystemBackedTableMetadata.java:46) ... 50 more
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2610) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2559) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2558) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2558) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1200) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1200) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1200) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2798) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2740) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2729) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:978) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2215) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) at 
org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362) at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361) at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45) at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:103) at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getAllPartitionPaths(FileSystemBackedTableMetadata.java:85) at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:117) at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313) ... 25 more Caused by: java.io.IOException: unexpected exception type at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83) at org.apache.spark.scheduler.Task.run(Task.scala:133) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) ... 40 more Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization at org.apache.hudi.metadata.FileSystemBackedTableMetadata.$deserializeLambda$(FileSystemBackedTableMetadata.java:46) ... 50 more 2023-06-15T10:40:18.989+0000 [INFO] [offline_compaction_schedule] [org.sparkproject.jetty.server.AbstractConnector] [AbstractConnector]: Stopped Spark@4f186450{HTTP/1.1, (http/1.1)}{0.0.0.0:8090} Command exiting with ret '0'