aimeralan opened this issue 1 year ago
cc @ad1happy2go, I'm not sure whether the insert dedup parameter takes effect internally.
I have checked the Hudi table data written by both the insert and bulk_insert operations and found that the lost records are not duplicates.
@aimeralan Thanks for raising this. This looks interesting, as bulk_insert should never miss data in any case. Would you mind sharing the table properties and writer configurations, or if possible the code snippet used to insert the data?
@ad1happy2go I tried writing another table's data to Hudi using bulk_insert and the data loss occurred again. I analyzed the commit files and the parquet files and counted the records in each parquet file. The record count of some parquet files was lower than the count recorded in the commit file, and the program still did not report any errors.

```json
{ "partitionToWriteStats" : { "" : [
{ "fileId" : "fc206763-a3ae-478e-af6c-c7c47377304f-0", "path" : "fc206763-a3ae-478e-af6c-c7c47377304f-0_0-7-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 113155, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 113155, "totalWriteBytes" : 16426995, "totalWriteErrors" : 5, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 16426995, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 23161 } },
{ "fileId" : "bdf31b3e-6025-4f1f-a471-0ed835926925-0", "path" : "bdf31b3e-6025-4f1f-a471-0ed835926925-0_1-8-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 113155, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 113155, "totalWriteBytes" : 18268111, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 18268111, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 23335 } },
{ "fileId" : "35a34f01-c5cc-4fe9-8d3a-46a5a808fc9d-0", "path" : "35a34f01-c5cc-4fe9-8d3a-46a5a808fc9d-0_2-9-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 113155, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 113155, "totalWriteBytes" : 17007621, "totalWriteErrors" : 4, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 17007621, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 24899 } },
{ "fileId" : "a5530189-6bdb-49c4-881f-9e4cc44c2588-0", "path" : "a5530189-6bdb-49c4-881f-9e4cc44c2588-0_3-10-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 113155, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 113155, "totalWriteBytes" : 17590080, "totalWriteErrors" : 1, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 17590080, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 22576 } },
{ "fileId" : "0402f521-03c9-4416-b2da-364fe8ca8085-0", "path" : "0402f521-03c9-4416-b2da-364fe8ca8085-0_4-11-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 113155, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 113155, "totalWriteBytes" : 16921603, "totalWriteErrors" : 6, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 16921603, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 25065 } },
{ "fileId" : "abca3619-e7ce-4e04-82ea-9276ae8b142c-0", "path" : "abca3619-e7ce-4e04-82ea-9276ae8b142c-0_5-12-0_20230920184509547.parquet", "prevCommit" : "null", "numWrites" : 71273, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 71273, "totalWriteBytes" : 12315783, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 12315783, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : { "totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 21702 } }
] }, "compacted" : false, "extraMetadata" : { "schema" : "..." }, "operationType" : "BULK_INSERT" }
```
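For anyone who wants to repeat the per-file count check, here is a minimal sketch (this is not the exact code we ran; only the table path is taken from this issue). It counts rows per base parquet file so the numbers can be compared with the `numWrites` values recorded in the `.commit` file:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, input_file_name, lit}

// Sketch: count rows per base parquet file and compare with the commit stats.
object ParquetCountCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetCountCheck")
      .master("local[*]")
      .getOrCreate()
    // Non-partitioned COW table: the base parquet files sit at the table root.
    val basePath = "s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer"

    spark.read.parquet(s"$basePath/*.parquet")
      .withColumn("file", input_file_name()) // physical file each row came from
      .groupBy("file")
      .agg(count(lit(1)).as("rows"))
      .orderBy("file")
      .show(100, truncate = false)
  }
}
```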
@ad1happy2go Adding my Hudi configuration as follows:
```
Map(
  hoodie.datasource.write.payload.class -> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload,
  hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.NonpartitionedKeyGenerator,
  hoodie.datasource.write.partitionpath.field -> ,
  hoodie.datasource.write.recordkey.field -> psr_id,
  hoodie.bulkinsert.shuffle.parallelism -> 1,
  hoodie.metadata.enable -> false,
  hoodie.clean.automatic -> true,
  hoodie.datasource.write.precombine.field -> ext_date_time,
  hoodie.table.name -> table_name,
  hoodie.insert.shuffle.parallelism -> 1,
  hoodie.datasource.write.operation -> bulk_insert
)
```
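For readers following along, this is how the same options look passed inline to the DataFrame writer; I'm assuming this is equivalent to what `HudiConfig.getHudiConfig(tableInfo)` (used in the reproduction code further down) returns for this table:

```scala
import org.apache.spark.sql.SaveMode

// The posted configuration applied inline. `df` is the MaxCompute DataFrame
// from the reproduction code later in this thread.
df.write
  .format("hudi")
  .option("hoodie.table.name", "table_name")
  .option("hoodie.datasource.write.operation", "bulk_insert")
  .option("hoodie.datasource.write.recordkey.field", "psr_id")
  .option("hoodie.datasource.write.partitionpath.field", "") // non-partitioned
  .option("hoodie.datasource.write.precombine.field", "ext_date_time")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option("hoodie.datasource.write.payload.class", "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload")
  .option("hoodie.bulkinsert.shuffle.parallelism", "1")
  .option("hoodie.insert.shuffle.parallelism", "1")
  .option("hoodie.metadata.enable", "false")
  .option("hoodie.clean.automatic", "true")
  .mode(SaveMode.Overwrite)
  .save("s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer")
```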
@aimeralan Can you try reproducing this on a small sample dataset? Also, do the lost records have the same record keys as records already in the table? Although even those should not be deduplicated by bulk_insert.
@ad1happy2go Hello, @aimeralan is my colleague; let me answer this question. We have ruled out duplicate primary keys: grouping the table by primary key shows zero keys with a count greater than 1. When I use the insert operation type, the data confusion and loss do not occur.
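Roughly, the duplicate-key check we ran looks like this (a sketch, assuming the table path and key column from this issue; `spark` is the session from the reproduction code below):

```scala
import org.apache.spark.sql.functions.{col, count, lit}

// Group the Hudi table by the record key and count keys that appear more
// than once; the result was 0 in our case.
val hudiDF = spark.read.format("hudi")
  .load("s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer")

val duplicated = hudiDF
  .groupBy("psr_id")
  .agg(count(lit(1)).as("cnt"))
  .filter(col("cnt") > 1)
  .count()
println(s"record keys appearing more than once: $duplicated")
```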
@blackcheckren Thanks. So what you are saying is: on the same dataset, operation type bulk_insert causes data loss but operation type "insert" does not?
@ad1happy2go Yes. I tried ingesting the data with the bulk_insert operation type many times, and each run was missing the same fixed number of records.
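A sketch of that comparison (same source DataFrame, only the operation type differs; the suffixed paths are illustrative, and `df`, `tableInfo`, and `HudiConfig` come from the reproduction code below):

```scala
import org.apache.spark.sql.SaveMode

// Write the same source DataFrame twice, varying only the operation type,
// then compare row counts against the source.
def writeAndCount(op: String, path: String): Long = {
  df.write
    .format("hudi")
    .options(HudiConfig.getHudiConfig(tableInfo))
    .option("hoodie.datasource.write.operation", op) // override only the operation type
    .mode(SaveMode.Overwrite)
    .save(path)
  spark.read.format("hudi").load(path).count()
}

val insertCount = writeAndCount("insert", tableInfo.targetPath + "_insert")
val bulkCount   = writeAndCount("bulk_insert", tableInfo.targetPath + "_bulk_insert")
println(s"source=${df.count()}, insert=$insertCount, bulk_insert=$bulkCount")
```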
@ad1happy2go What other information do I need to provide in order to troubleshoot the problem?
@blackcheckren I actually couldn't reproduce this issue, and I'm not sure why it would happen. The configurations look okay.
In case you still have the Spark event logs, can you check whether there were task/stage failures during the run that could have created duplicates? Also, are you getting this issue consistently when you re-ingest?
@ad1happy2go Thank you for your reply. The problem reproduces consistently. Here is the code I used along with the runtime log; I hope it helps reproduce the problem.
```scala
package js.sgcc.com.cn.demo.maxcompute

import js.sgcc.com.cn.model.MaxComputeTableInfo
import js.sgcc.com.cn.utils.{HudiConfig, SparkHelper}
import org.apache.spark.sql.SaveMode

/**
 * Tests direct extraction from MaxCompute with Spark 3.1.
 */
object MaxComputeDemo {

  val ODPS_DATA_SOURCE = "org.apache.spark.sql.odps.datasource.DefaultSource"
  val ODPS_ENDPOINT = ""

  def main(args: Array[String]): Unit = {
    val odpsProject = "js_ods_prod"
    val odpsAkId = ""
    val odpsAkKey = ""
    val odpsTable = "ods_pms25_t_psr_ds_p_transformer"

    val spark = SparkHelper.getSparkSession("dev", this.getClass.getSimpleName.stripPrefix("$"))
    spark.sparkContext.setLogLevel("info")

    // Read the source table from MaxCompute (ODPS) and count it.
    val df = spark.read.format(ODPS_DATA_SOURCE)
      .option("spark.hadoop.odps.project.name", odpsProject)
      .option("spark.hadoop.odps.access.id", odpsAkId)
      .option("spark.hadoop.odps.access.key", odpsAkKey)
      .option("spark.hadoop.odps.end.point", ODPS_ENDPOINT)
      .option("spark.hadoop.odps.table.name", odpsTable)
      .load()
    println("maxcompute table row num is: " + df.count())

    // Field comments on the key columns are inferred from the Hudi configuration posted above.
    val tableInfo = MaxComputeTableInfo(
      1,
      "js_ods_prod",
      "ods_pms25_t_psr_ds_p_transformer",
      "normal",
      "ods",
      "pms25",
      "s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer",
      "psr_id",        // record key field
      "",              // partition path field (non-partitioned)
      "ext_date_time", // precombine field
      6,
      "cow",
      "bloom"
    )

    // Write to Hudi with the configuration shown above (operation: bulk_insert).
    df.write
      .format("hudi")
      .mode(SaveMode.Overwrite)
      .options(HudiConfig.getHudiConfig(tableInfo))
      .save(tableInfo.targetPath)
    println(HudiConfig.getHudiConfig(tableInfo))

    // Read the table back from Hudi and count, to compare with the source count.
    val sourceDF = spark.read
      .format("hudi")
      .load(tableInfo.targetPath)
    println("hudi table row num: " + sourceDF.count())
  }
}
```
D:\JDK\jdk1.8.0_201\bin\java.exe "-javaagent:D:\IDEA\IntelliJ IDEA 2021.3.3\lib\idea_rt.jar=49695:D:\IDEA\IntelliJ IDEA 2021.3.3\bin" -Dfile.encoding=UTF-8 -classpath [full local classpath elided; key jars: spark-sql_2.12-3.1.3, spark-core_2.12-3.1.3, hudi-spark3.1-bundle_2.12-0.13.1, hadoop-client-3.2.0, hadoop-aws-3.2.0, spark-datasource-v3.1-maxcompute-1.0.0-release] js.sgcc.com.cn.demo.maxcompute.MaxComputeDemo
SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/H:/repository/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/H:/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.10.0/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 23/10/12 20:52:47 INFO SparkContext: Running Spark version 3.1.3 23/10/12 20:52:47 INFO ResourceUtils: ============================================================== 23/10/12 20:52:47 INFO ResourceUtils: No custom resources configured for spark.driver. 23/10/12 20:52:47 INFO ResourceUtils: ============================================================== 23/10/12 20:52:47 INFO SparkContext: Submitted application: MaxComputeDemo$ 23/10/12 20:52:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/10/12 20:52:47 INFO ResourceProfile: Limiting resource is cpu 23/10/12 20:52:47 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/10/12 20:52:47 INFO SecurityManager: Changing view acls to: Administrator 23/10/12 20:52:47 INFO SecurityManager: Changing modify acls to: Administrator 23/10/12 20:52:47 INFO SecurityManager: Changing view acls groups to: 23/10/12 20:52:47 INFO SecurityManager: Changing modify acls groups to: 23/10/12 20:52:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); groups with view permissions: Set(); users with modify permissions: Set(Administrator); groups with modify permissions: Set() 23/10/12 20:52:48 INFO Utils: Successfully started service 'sparkDriver' on port 49728. 23/10/12 20:52:48 INFO SparkEnv: Registering MapOutputTracker 23/10/12 20:52:48 INFO SparkEnv: Registering BlockManagerMaster 23/10/12 20:52:48 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/10/12 20:52:48 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/10/12 20:52:48 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/10/12 20:52:48 INFO DiskBlockManager: Created local directory at C:\Users\Administrator\AppData\Local\Temp\blockmgr-0d99c97e-1fc1-4191-90d7-6bb4c6abe91a 23/10/12 20:52:48 INFO MemoryStore: MemoryStore started with capacity 891.0 MiB 23/10/12 20:52:48 INFO SparkEnv: Registering OutputCommitCoordinator 23/10/12 20:52:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/10/12 20:52:49 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://O5Fguyingc02:4040 23/10/12 20:52:49 INFO Executor: Starting executor ID driver on host O5Fguyingc02 23/10/12 20:52:49 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49761. 23/10/12 20:52:49 INFO NettyBlockTransferService: Server created on O5Fguyingc02:49761 23/10/12 20:52:49 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/10/12 20:52:49 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, O5Fguyingc02, 49761, None) 23/10/12 20:52:49 INFO BlockManagerMasterEndpoint: Registering block manager O5Fguyingc02:49761 with 891.0 MiB RAM, BlockManagerId(driver, O5Fguyingc02, 49761, None) 23/10/12 20:52:49 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, O5Fguyingc02, 49761, None) 23/10/12 20:52:49 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, O5Fguyingc02, 49761, None) 23/10/12 20:52:50 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/H:/idea_workspace/data-export-hudi/spark-warehouse/'). 23/10/12 20:52:50 INFO SharedState: Warehouse path is 'file:/H:/idea_workspace/data-export-hudi/spark-warehouse/'. splitSize, tableSizeInMB: 81, defaultParallelism: 6, parallelism: 6, splitSize: 14 23/10/12 20:52:54 INFO V2ScanRelationPushDown: Pushing operators to ods_pms25_t_psr_ds_p_transformer Pushed Filters: Post-Scan Filters: Output: psr_id#0, ast_id#1, name#2, run_dev_name#3, full_path_name#4, city#5, maint_org#6, maint_group#7, equipment_owner#8, pole#9, feeder#10, voltage_level#11, psr_state#12, start_time#13, stop_time#14, is_rural#15, importance#16, regionalism#17, supply_area#18, use_nature#19, dispatch_jurisdiction#20, dispatch_operation#21, dispatch_permission#22, dispatch_monitor#23, branch_feeder#24, line#25, ctime#26, switch_segment#27, pub_priv_flag#28, customer_id#29, installation_address#30, join_ec#31, last_update_time#32, reliable_segment#33, cons_no#34, cms_maint_org#35, ext_date_time#36, ext_ogg_seq#37, ext_flag#38, ext_src_system#39, ext_provincial_flag#40, ext_rowid#41, ext_reserve1#42, ext_reserve2#43, ext_reserve3#44, ext_valid_flag#45, cms_state#46, administ_regions#47L, urban_rural#48L, is_pmr_scdr_integration#49, feeder_segment#50, is_economic_operation#51, useful#52, installed_capacity#53, is_has_dg#54, is_coaltoelectricity#55, elechtg_method#56, is_standardized#57, is_chargepile_cons#58, chargepile_cons_type#59, maint_org_type#60, cust_importance_level#61, usagepoint_id#62 23/10/12 20:52:55 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'. 
23/10/12 20:52:56 INFO CodeGenerator: Code generated in 182.385695 ms 23/10/12 20:52:56 INFO CodeGenerator: Code generated in 8.307713 ms 23/10/12 20:52:56 INFO SparkContext: Starting job: count at MaxComputeDemo.scala:38 23/10/12 20:52:56 INFO DAGScheduler: Registering RDD 3 (count at MaxComputeDemo.scala:38) as input to shuffle 0 23/10/12 20:52:56 INFO DAGScheduler: Got job 0 (count at MaxComputeDemo.scala:38) with 1 output partitions 23/10/12 20:52:56 INFO DAGScheduler: Final stage: ResultStage 1 (count at MaxComputeDemo.scala:38) 23/10/12 20:52:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 23/10/12 20:52:56 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 23/10/12 20:52:56 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at count at MaxComputeDemo.scala:38), which has no missing parents 23/10/12 20:52:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 17.6 KiB, free 891.0 MiB) 23/10/12 20:52:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 8.5 KiB, free 891.0 MiB) 23/10/12 20:52:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on O5Fguyingc02:49761 (size: 8.5 KiB, free: 891.0 MiB) 23/10/12 20:52:57 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1433 23/10/12 20:52:57 INFO DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at count at MaxComputeDemo.scala:38) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5)) 23/10/12 20:52:57 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks resource profile 0 23/10/12 20:52:57 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (O5Fguyingc02, executor driver, partition 1, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (O5Fguyingc02, executor driver, partition 2, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) (O5Fguyingc02, executor driver, partition 3, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) (O5Fguyingc02, executor driver, partition 4, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5) (O5Fguyingc02, executor driver, partition 5, PROCESS_LOCAL, 7608 bytes) taskResourceAssignments Map() 23/10/12 20:52:57 INFO Executor: Running task 3.0 in stage 0.0 (TID 3) 23/10/12 20:52:57 INFO Executor: Running task 5.0 in stage 0.0 (TID 5) 23/10/12 20:52:57 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 23/10/12 20:52:57 INFO Executor: Running task 2.0 in stage 0.0 (TID 2) 23/10/12 20:52:57 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 23/10/12 20:52:57 INFO Executor: Running task 4.0 in stage 0.0 (TID 4) 23/10/12 20:53:07 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped 23/10/12 20:53:08 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 1846 bytes result sent to driver 23/10/12 20:53:08 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 
1889 bytes result sent to driver 23/10/12 20:53:08 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 11494 ms on O5Fguyingc02 (executor driver) (1/6) 23/10/12 20:53:08 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 11495 ms on O5Fguyingc02 (executor driver) (2/6) 23/10/12 20:53:09 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1846 bytes result sent to driver 23/10/12 20:53:09 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 11956 ms on O5Fguyingc02 (executor driver) (3/6) 23/10/12 20:53:09 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1846 bytes result sent to driver 23/10/12 20:53:09 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 12232 ms on O5Fguyingc02 (executor driver) (4/6) 23/10/12 20:53:09 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 1846 bytes result sent to driver 23/10/12 20:53:09 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 12238 ms on O5Fguyingc02 (executor driver) (5/6) 23/10/12 20:53:09 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 1846 bytes result sent to driver 23/10/12 20:53:09 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 12560 ms on O5Fguyingc02 (executor driver) (6/6) 23/10/12 20:53:09 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 23/10/12 20:53:09 INFO DAGScheduler: ShuffleMapStage 0 (count at MaxComputeDemo.scala:38) finished in 13.304 s 23/10/12 20:53:09 INFO DAGScheduler: looking for newly runnable stages 23/10/12 20:53:09 INFO DAGScheduler: running: Set() 23/10/12 20:53:09 INFO DAGScheduler: waiting: Set(ResultStage 1) 23/10/12 20:53:09 INFO DAGScheduler: failed: Set() 23/10/12 20:53:09 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[6] at count at MaxComputeDemo.scala:38), which has no missing parents 23/10/12 20:53:09 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 10.1 KiB, free 891.0 MiB) 23/10/12 20:53:09 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.0 KiB, free 891.0 MiB) 23/10/12 20:53:09 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on O5Fguyingc02:49761 (size: 5.0 KiB, free: 891.0 MiB) 23/10/12 20:53:09 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:09 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[6] at count at MaxComputeDemo.scala:38) (first 15 tasks are for partitions Vector(0)) 23/10/12 20:53:09 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0 23/10/12 20:53:09 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 6) (O5Fguyingc02, executor driver, partition 0, NODE_LOCAL, 4453 bytes) taskResourceAssignments Map() 23/10/12 20:53:09 INFO Executor: Running task 0.0 in stage 1.0 (TID 6) 23/10/12 20:53:10 INFO ShuffleBlockFetcherIterator: Getting 6 (360.0 B) non-empty blocks including 6 (360.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) remote blocks 23/10/12 20:53:10 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms 23/10/12 20:53:10 INFO Executor: Finished task 0.0 in stage 1.0 (TID 6). 
2450 bytes result sent to driver 23/10/12 20:53:10 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 6) in 84 ms on O5Fguyingc02 (executor driver) (1/1) 23/10/12 20:53:10 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 23/10/12 20:53:10 INFO DAGScheduler: ResultStage 1 (count at MaxComputeDemo.scala:38) finished in 0.091 s 23/10/12 20:53:10 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:10 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished 23/10/12 20:53:10 INFO DAGScheduler: Job 0 finished: count at MaxComputeDemo.scala:38, took 13.466323 s maxcompute table row num is: 637738 23/10/12 20:53:10 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties 23/10/12 20:53:10 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 23/10/12 20:53:10 INFO MetricsSystemImpl: s3a-file-system metrics system started 23/10/12 20:53:11 INFO BlockManagerInfo: Removed broadcast_1_piece0 on O5Fguyingc02:49761 in memory (size: 5.0 KiB, free: 891.0 MiB) 23/10/12 20:53:12 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf 23/10/12 20:53:12 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file 23/10/12 20:53:12 WARN HoodieSparkSqlWriter$: hoodie table at s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer already exists. Deleting existing data & overwriting with new data. 23/10/12 20:53:13 INFO HoodieTableMetaClient: Initializing s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer as hoodie table s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:15 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:15 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:15 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:15 INFO HoodieTableMetaClient: Finished initializing Table of type COPY_ON_WRITE from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:16 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty splitSize, tableSizeInMB: 81, defaultParallelism: 6, parallelism: 6, splitSize: 14 23/10/12 20:53:16 INFO V2ScanRelationPushDown: Pushing operators to ods_pms25_t_psr_ds_p_transformer Pushed Filters: Post-Scan Filters: Output: psr_id#0, ast_id#1, name#2, run_dev_name#3, full_path_name#4, city#5, maint_org#6, maint_group#7, equipment_owner#8, pole#9, feeder#10, voltage_level#11, psr_state#12, start_time#13, stop_time#14, is_rural#15, importance#16, regionalism#17, supply_area#18, use_nature#19, dispatch_jurisdiction#20, dispatch_operation#21, dispatch_permission#22, dispatch_monitor#23, branch_feeder#24, line#25, ctime#26, switch_segment#27, pub_priv_flag#28, customer_id#29, installation_address#30, join_ec#31, last_update_time#32, reliable_segment#33, cons_no#34, cms_maint_org#35, ext_date_time#36, ext_ogg_seq#37, ext_flag#38, ext_src_system#39, ext_provincial_flag#40, ext_rowid#41, ext_reserve1#42, ext_reserve2#43, ext_reserve3#44, ext_valid_flag#45, cms_state#46, administ_regions#47L, 
urban_rural#48L, is_pmr_scdr_integration#49, feeder_segment#50, is_economic_operation#51, useful#52, installed_capacity#53, is_has_dg#54, is_coaltoelectricity#55, elechtg_method#56, is_standardized#57, is_chargepile_cons#58, chargepile_cons_type#59, maint_org_type#60, cust_importance_level#61, usagepoint_id#62 23/10/12 20:53:17 INFO CodeGenerator: Code generated in 51.351902 ms 23/10/12 20:53:19 INFO EmbeddedTimelineService: Starting Timeline service !! 23/10/12 20:53:19 INFO EmbeddedTimelineService: Overriding hostIp to (O5Fguyingc02) found in spark-conf. It was null 23/10/12 20:53:19 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY 23/10/12 20:53:19 INFO FileSystemViewManager: Creating in-memory based Table View 23/10/12 20:53:19 INFO log: Logging initialized @33942ms to org.apache.hudi.org.apache.jetty.util.log.Slf4jLog 23/10/12 20:53:19 INFO Javalin: __ __ _ __ __ / /____ _ _ __ ____ _ / /(_)____ / // / __ / // __ `/| | / // __ `// // // __ \ / // /_ / /_/ // /_/ / | |/ // /_/ // // // / / / /__ __/ \____/ \__,_/ |___/ \__,_//_//_//_/ /_/ /_/ https://javalin.io/documentation 23/10/12 20:53:19 INFO Javalin: Starting Javalin ... 23/10/12 20:53:19 INFO Javalin: You are running Javalin 4.6.7 (released October 24, 2022. Your Javalin version is 353 days old. Consider checking for a newer version.). 23/10/12 20:53:19 INFO Server: jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 1.8.0_201-b09 23/10/12 20:53:19 INFO Server: Started @34421ms 23/10/12 20:53:19 INFO Javalin: Listening on http://localhost:49803/ 23/10/12 20:53:19 INFO Javalin: Javalin started in 249ms \o/ 23/10/12 20:53:19 INFO TimelineService: Starting Timeline server on port :49803 23/10/12 20:53:19 INFO EmbeddedTimelineService: Started embedded timeline server at O5Fguyingc02:49803 23/10/12 20:53:19 INFO BaseHoodieClient: Timeline Server already running. 
Not restarting the service 23/10/12 20:53:19 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:19 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:19 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:19 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty 23/10/12 20:53:20 INFO CleanerUtils: Cleaned failed attempts if any 23/10/12 20:53:20 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:20 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty 23/10/12 20:53:20 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:20 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:20 INFO BaseHoodieWriteClient: Generate a new instant time: 20231012205312331 action: commit 23/10/12 20:53:20 INFO HoodieActiveTimeline: Creating a new instant [==>20231012205312331__commit__REQUESTED] 23/10/12 20:53:20 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:20 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:20 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:20 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:21 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20231012205312331__commit__REQUESTED]} 23/10/12 20:53:21 INFO AsyncCleanerService: The HoodieWriteClient is not configured to auto & async clean. Async clean service will not start. 23/10/12 20:53:21 INFO AsyncArchiveService: The HoodieWriteClient is not configured to auto & async archive. Async archive service will not start. 23/10/12 20:53:21 INFO CodeGenerator: Code generated in 37.105329 ms 23/10/12 20:53:21 INFO HoodieActiveTimeline: Checking for file exists ?s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/20231012205312331.commit.requested 23/10/12 20:53:21 INFO HoodieActiveTimeline: Create new file for toInstant ?s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/20231012205312331.inflight 23/10/12 20:53:21 INFO AppendDataExec: Start processing data source write support: org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@28b16193. 
The input RDD has 6 partitions.
23/10/12 20:53:21 INFO SparkContext: Starting job: save at HoodieSparkSqlWriter.scala:823
23/10/12 20:53:21 INFO DAGScheduler: Got job 1 (save at HoodieSparkSqlWriter.scala:823) with 6 output partitions
23/10/12 20:53:21 INFO DAGScheduler: Final stage: ResultStage 2 (save at HoodieSparkSqlWriter.scala:823)
23/10/12 20:53:21 INFO DAGScheduler: Parents of final stage: List()
23/10/12 20:53:21 INFO DAGScheduler: Missing parents: List()
23/10/12 20:53:21 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[12] at save at HoodieSparkSqlWriter.scala:823), which has no missing parents
23/10/12 20:53:21 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 437.6 KiB, free 890.5 MiB)
23/10/12 20:53:21 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 144.6 KiB, free 890.4 MiB)
23/10/12 20:53:21 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on O5Fguyingc02:49761 (size: 144.6 KiB, free: 890.9 MiB)
23/10/12 20:53:21 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1433
23/10/12 20:53:21 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 2 (MapPartitionsRDD[12] at save at HoodieSparkSqlWriter.scala:823) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5))
23/10/12 20:53:21 INFO TaskSchedulerImpl: Adding task set 2.0 with 6 tasks resource profile 0
23/10/12 20:53:21 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 7) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 8) (O5Fguyingc02, executor driver, partition 1, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 9) (O5Fguyingc02, executor driver, partition 2, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 10) (O5Fguyingc02, executor driver, partition 3, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO TaskSetManager: Starting task 4.0 in stage 2.0 (TID 11) (O5Fguyingc02, executor driver, partition 4, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO TaskSetManager: Starting task 5.0 in stage 2.0 (TID 12) (O5Fguyingc02, executor driver, partition 5, PROCESS_LOCAL, 7619 bytes) taskResourceAssignments Map()
23/10/12 20:53:21 INFO Executor: Running task 0.0 in stage 2.0 (TID 7)
23/10/12 20:53:21 INFO Executor: Running task 2.0 in stage 2.0 (TID 9)
23/10/12 20:53:21 INFO Executor: Running task 3.0 in stage 2.0 (TID 10)
23/10/12 20:53:21 INFO Executor: Running task 1.0 in stage 2.0 (TID 8)
23/10/12 20:53:21 INFO Executor: Running task 5.0 in stage 2.0 (TID 12)
23/10/12 20:53:21 INFO Executor: Running task 4.0 in stage 2.0 (TID 11)
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:22 INFO BulkInsertDataInternalWriterHelper: Creating new file for partition path
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=063b5bf5-19e0-4cef-a32f-2072973ed863-0_4-11-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0_2-9-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0_3-10-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=9b487a1a-e587-4b10-8aa4-5936782fd77d-0_5-12-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=99900379-9000-4b45-a887-47e6e3783f4c-0_1-8-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:24 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create?markername=5abcae8c-f16f-42af-bc28-7c3d66f03673-0_0-7-0_20231012205312331.parquet.marker.CREATE&markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331)
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0_2-9-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 063b5bf5-19e0-4cef-a32f-2072973ed863-0_4-11-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 9b487a1a-e587-4b10-8aa4-5936782fd77d-0_5-12-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0_3-10-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 5abcae8c-f16f-42af-bc28-7c3d66f03673-0_0-7-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:27 INFO MarkerHandler: Request: create marker s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 99900379-9000-4b45-a887-47e6e3783f4c-0_1-8-0_20231012205312331.parquet.marker.CREATE
23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file
/99900379-9000-4b45-a887-47e6e3783f4c-0_1-8-0_20231012205312331.parquet.marker.CREATE in 4837 ms 23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file /5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0_2-9-0_20231012205312331.parquet.marker.CREATE in 5030 ms 23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file /063b5bf5-19e0-4cef-a32f-2072973ed863-0_4-11-0_20231012205312331.parquet.marker.CREATE in 5030 ms 23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file /9b487a1a-e587-4b10-8aa4-5936782fd77d-0_5-12-0_20231012205312331.parquet.marker.CREATE in 4841 ms 23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file /5abcae8c-f16f-42af-bc28-7c3d66f03673-0_0-7-0_20231012205312331.parquet.marker.CREATE in 4914 ms 23/10/12 20:53:28 INFO TimelineServerBasedWriteMarkers: [timeline-server-based] Created marker file /5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0_3-10-0_20231012205312331.parquet.marker.CREATE in 5030 ms 23/10/12 20:53:28 INFO HoodieRowParquetWriteSupport: Initialized Parquet WriteSupport with Catalyst schema: { "type" : "struct", "fields" : [ { "name" : "_hoodie_commit_time", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "_hoodie_commit_seqno", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "_hoodie_record_key", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "_hoodie_partition_path", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "_hoodie_file_name", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "psr_id", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ast_id", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "name", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "run_dev_name", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "full_path_name", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "city", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "maint_org", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "maint_group", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "equipment_owner", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "pole", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "feeder", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "voltage_level", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "psr_state", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "start_time", "type" : "timestamp", "nullable" : true, "metadata" : { } }, { "name" : "stop_time", "type" : "timestamp", "nullable" : true, "metadata" : { } }, { "name" : "is_rural", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "importance", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "regionalism", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "supply_area", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "use_nature", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "dispatch_jurisdiction", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "dispatch_operation", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "dispatch_permission", 
"type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "dispatch_monitor", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "branch_feeder", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "line", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ctime", "type" : "timestamp", "nullable" : true, "metadata" : { } }, { "name" : "switch_segment", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "pub_priv_flag", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "customer_id", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "installation_address", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "join_ec", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "last_update_time", "type" : "timestamp", "nullable" : true, "metadata" : { } }, { "name" : "reliable_segment", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "cons_no", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "cms_maint_org", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_date_time", "type" : "timestamp", "nullable" : true, "metadata" : { } }, { "name" : "ext_ogg_seq", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_flag", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_src_system", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_provincial_flag", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_rowid", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_reserve1", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_reserve2", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_reserve3", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "ext_valid_flag", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "cms_state", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "administ_regions", "type" : "long", "nullable" : true, "metadata" : { } }, { "name" : "urban_rural", "type" : "long", "nullable" : true, "metadata" : { } }, { "name" : "is_pmr_scdr_integration", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "feeder_segment", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "is_economic_operation", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "useful", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "installed_capacity", "type" : "decimal(38,18)", "nullable" : true, "metadata" : { } }, { "name" : "is_has_dg", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "is_coaltoelectricity", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "elechtg_method", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "is_standardized", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "is_chargepile_cons", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "chargepile_cons_type", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "maint_org_type", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "cust_importance_level", "type" : "string", "nullable" : true, "metadata" : { } }, { "name" : "usagepoint_id", "type" : "string", "nullable" : true, "metadata" : { } } ] } and corresponding 
Parquet message type: message spark_schema { optional binary _hoodie_commit_time (UTF8); optional binary _hoodie_commit_seqno (UTF8); optional binary _hoodie_record_key (UTF8); optional binary _hoodie_partition_path (UTF8); optional binary _hoodie_file_name (UTF8); optional binary psr_id (UTF8); optional binary ast_id (UTF8); optional binary name (UTF8); optional binary run_dev_name (UTF8); optional binary full_path_name (UTF8); optional binary city (UTF8); optional binary maint_org (UTF8); optional binary maint_group (UTF8); optional binary equipment_owner (UTF8); optional binary pole (UTF8); optional binary feeder (UTF8); optional binary voltage_level (UTF8); optional binary psr_state (UTF8); optional int64 start_time (TIMESTAMP_MICROS); optional int64 stop_time (TIMESTAMP_MICROS); optional binary is_rural (UTF8); optional binary importance (UTF8); optional binary regionalism (UTF8); optional binary supply_area (UTF8); optional binary use_nature (UTF8); optional binary dispatch_jurisdiction (UTF8); optional binary dispatch_operation (UTF8); optional binary dispatch_permission (UTF8); optional binary dispatch_monitor (UTF8); optional binary branch_feeder (UTF8); optional binary line (UTF8); optional int64 ctime (TIMESTAMP_MICROS); optional binary switch_segment (UTF8); optional binary pub_priv_flag (UTF8); optional binary customer_id (UTF8); optional binary installation_address (UTF8); optional binary join_ec (UTF8); optional int64 last_update_time (TIMESTAMP_MICROS); optional binary reliable_segment (UTF8); optional binary cons_no (UTF8); optional binary cms_maint_org (UTF8); optional int64 ext_date_time (TIMESTAMP_MICROS); optional binary ext_ogg_seq (UTF8); optional binary ext_flag (UTF8); optional binary ext_src_system (UTF8); optional binary ext_provincial_flag (UTF8); optional binary ext_rowid (UTF8); optional binary ext_reserve1 (UTF8); optional binary ext_reserve2 (UTF8); optional binary ext_reserve3 (UTF8); optional binary ext_valid_flag (UTF8); optional binary cms_state (UTF8); optional int64 administ_regions; optional int64 urban_rural; optional binary is_pmr_scdr_integration (UTF8); optional binary feeder_segment (UTF8); optional binary is_economic_operation (UTF8); optional binary useful (UTF8); optional fixed_len_byte_array(16) installed_capacity (DECIMAL(38,18)); optional binary is_has_dg (UTF8); optional binary is_coaltoelectricity (UTF8); optional binary elechtg_method (UTF8); optional binary is_standardized (UTF8); optional binary is_chargepile_cons (UTF8); optional binary chargepile_cons_type (UTF8); optional binary maint_org_type (UTF8); optional binary cust_importance_level (UTF8); optional binary usagepoint_id (UTF8); }
[... the same Catalyst schema and corresponding Parquet message type are logged five more times at 20:53:28, once per write task; the duplicate dumps are omitted ...]
23/10/12 20:53:28 WARN ZlibFactory: Failed to load/initialize native-zlib library
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:28 INFO CodecPool: Got brand-new compressor [.gz]
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 99900379-9000-4b45-a887-47e6e3783f4c-0
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 5abcae8c-f16f-42af-bc28-7c3d66f03673-0
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 9b487a1a-e587-4b10-8aa4-5936782fd77d-0
23/10/12 20:53:29 INFO HoodieRowCreateHandle: New handle created for partition: with fileId 063b5bf5-19e0-4cef-a32f-2072973ed863-0
23/10/12 20:53:32 INFO BlockManagerInfo: Removed broadcast_0_piece0 on O5Fguyingc02:49761 in memory (size: 8.5 KiB, free: 890.9 MiB)
23/10/12 20:53:40 INFO DataWritingSparkTask: Commit authorized for partition 5 (task 12, attempt 0, stage 2.0)
23/10/12 20:53:40 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 24190738
23/10/12 20:53:41 INFO DataWritingSparkTask: Commit authorized for partition 2 (task 9, attempt 0, stage 2.0)
23/10/12 20:53:41 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 27767065
23/10/12 20:53:42 INFO DataWritingSparkTask: Commit authorized for partition 4 (task 11, attempt 0, stage 2.0)
23/10/12 20:53:42 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 29091438
23/10/12 20:53:43 INFO DataWritingSparkTask: Commit authorized for partition 3 (task 10, attempt 0, stage 2.0)
23/10/12 20:53:43 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 30353996
23/10/12 20:53:43 INFO DataWritingSparkTask: Commit authorized for partition 0 (task 7, attempt 0, stage 2.0)
23/10/12 20:53:43 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 27307746
23/10/12 20:53:43 INFO DataWritingSparkTask: Commit authorized for partition 1 (task 8, attempt 0, stage 2.0)
23/10/12 20:53:43 INFO InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 32766167
23/10/12 20:53:44 INFO DataWritingSparkTask: Committed partition 5 (task 12, attempt 0, stage 2.0)
23/10/12 20:53:44 INFO Executor: Finished task 5.0 in stage 2.0 (TID 12).
1885 bytes result sent to driver 23/10/12 20:53:46 INFO TaskSetManager: Finished task 5.0 in stage 2.0 (TID 12) in 24542 ms on O5Fguyingc02 (executor driver) (1/6) 23/10/12 20:53:46 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 9b487a1a-e587-4b10-8aa4-5936782fd77d-0, Success records 85858, errored Rows 0, global error false]} 23/10/12 20:53:49 INFO DataWritingSparkTask: Committed partition 2 (task 9, attempt 0, stage 2.0) 23/10/12 20:53:49 INFO Executor: Finished task 2.0 in stage 2.0 (TID 9). 6153 bytes result sent to driver 23/10/12 20:53:49 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 9) in 27649 ms on O5Fguyingc02 (executor driver) (2/6) 23/10/12 20:53:49 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0, Success records 110376, errored Rows 0, global error false]} 23/10/12 20:53:49 INFO DataWritingSparkTask: Committed partition 1 (task 8, attempt 0, stage 2.0) 23/10/12 20:53:49 INFO Executor: Finished task 1.0 in stage 2.0 (TID 8). 1841 bytes result sent to driver 23/10/12 20:53:49 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 8) in 28113 ms on O5Fguyingc02 (executor driver) (3/6) 23/10/12 20:53:49 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 99900379-9000-4b45-a887-47e6e3783f4c-0, Success records 110376, errored Rows 0, global error false]} 23/10/12 20:53:50 INFO DataWritingSparkTask: Committed partition 4 (task 11, attempt 0, stage 2.0) 23/10/12 20:53:50 INFO Executor: Finished task 4.0 in stage 2.0 (TID 11). 7842 bytes result sent to driver 23/10/12 20:53:50 INFO TaskSetManager: Finished task 4.0 in stage 2.0 (TID 11) in 28934 ms on O5Fguyingc02 (executor driver) (4/6) 23/10/12 20:53:50 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 063b5bf5-19e0-4cef-a32f-2072973ed863-0, Success records 110376, errored Rows 0, global error false]} 23/10/12 20:53:50 INFO DataWritingSparkTask: Committed partition 0 (task 7, attempt 0, stage 2.0) 23/10/12 20:53:50 INFO Executor: Finished task 0.0 in stage 2.0 (TID 7). 5307 bytes result sent to driver 23/10/12 20:53:50 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 7) in 28955 ms on O5Fguyingc02 (executor driver) (5/6) 23/10/12 20:53:50 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 5abcae8c-f16f-42af-bc28-7c3d66f03673-0, Success records 110376, errored Rows 0, global error false]} 23/10/12 20:53:51 INFO DataWritingSparkTask: Committed partition 3 (task 10, attempt 0, stage 2.0) 23/10/12 20:53:51 INFO Executor: Finished task 3.0 in stage 2.0 (TID 10). 1842 bytes result sent to driver 23/10/12 20:53:51 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 10) in 29568 ms on O5Fguyingc02 (executor driver) (6/6) 23/10/12 20:53:51 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 23/10/12 20:53:51 INFO DAGScheduler: ResultStage 2 (save at HoodieSparkSqlWriter.scala:823) finished in 29.603 s 23/10/12 20:53:51 INFO DAGScheduler: Job 1 is finished. 
Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:51 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished 23/10/12 20:53:51 INFO DataSourceInternalWriterHelper: Received commit of a data writer = HoodieWriterCommitMessage{writeStatuses=[PartitionPath , FileID 5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0, Success records 110376, errored Rows 0, global error false]} 23/10/12 20:53:51 INFO DAGScheduler: Job 1 finished: save at HoodieSparkSqlWriter.scala:823, took 29.608389 s 23/10/12 20:53:51 INFO AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@28b16193 is committing. 23/10/12 20:53:51 INFO BaseHoodieWriteClient: Committing 20231012205312331 action commit 23/10/12 20:53:51 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:51 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:51 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:51 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:51 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20231012205312331__commit__INFLIGHT]} 23/10/12 20:53:51 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:51 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:51 INFO CommitUtils: Creating metadata for BULK_INSERT numWriteStats:6 numReplaceFileIds:0 23/10/12 20:53:51 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:52 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:52 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:52 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:52 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20231012205312331__commit__INFLIGHT]} 23/10/12 20:53:52 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:52 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:52 INFO BaseHoodieWriteClient: Committing 20231012205312331 action commit 23/10/12 20:53:52 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/dir/exists?markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331) 23/10/12 20:53:52 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/create-and-merge?markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331) 23/10/12 20:53:52 INFO HoodieActiveTimeline: Marking instant complete [==>20231012205312331__commit__INFLIGHT] 23/10/12 20:53:52 INFO HoodieActiveTimeline: Checking for file exists 
?s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/20231012205312331.inflight 23/10/12 20:53:52 INFO HoodieActiveTimeline: Create new file for toInstant ?s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/20231012205312331.commit 23/10/12 20:53:52 INFO HoodieActiveTimeline: Completed [==>20231012205312331__commit__INFLIGHT] 23/10/12 20:53:52 INFO TimelineServerBasedWriteMarkers: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/marker/dir/delete?markerdirpath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer%2F.hoodie%2F.temp%2F20231012205312331) 23/10/12 20:53:53 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:151 23/10/12 20:53:53 INFO DAGScheduler: Got job 2 (collectAsMap at HoodieSparkEngineContext.java:151) with 2 output partitions 23/10/12 20:53:53 INFO DAGScheduler: Final stage: ResultStage 3 (collectAsMap at HoodieSparkEngineContext.java:151) 23/10/12 20:53:53 INFO DAGScheduler: Parents of final stage: List() 23/10/12 20:53:53 INFO DAGScheduler: Missing parents: List() 23/10/12 20:53:53 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[14] at mapToPair at HoodieSparkEngineContext.java:148), which has no missing parents 23/10/12 20:53:53 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 100.0 KiB, free 890.3 MiB) 23/10/12 20:53:53 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 36.1 KiB, free 890.3 MiB) 23/10/12 20:53:53 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on O5Fguyingc02:49761 (size: 36.1 KiB, free: 890.8 MiB) 23/10/12 20:53:53 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:53 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[14] at mapToPair at HoodieSparkEngineContext.java:148) (first 15 tasks are for partitions Vector(0, 1)) 23/10/12 20:53:53 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks resource profile 0 23/10/12 20:53:53 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 13) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 4435 bytes) taskResourceAssignments Map() 23/10/12 20:53:53 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 14) (O5Fguyingc02, executor driver, partition 1, PROCESS_LOCAL, 4431 bytes) taskResourceAssignments Map() 23/10/12 20:53:53 INFO Executor: Running task 0.0 in stage 3.0 (TID 13) 23/10/12 20:53:53 INFO Executor: Running task 1.0 in stage 3.0 (TID 14) 23/10/12 20:53:53 INFO Executor: Finished task 0.0 in stage 3.0 (TID 13). 863 bytes result sent to driver 23/10/12 20:53:53 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 13) in 185 ms on O5Fguyingc02 (executor driver) (1/2) 23/10/12 20:53:53 INFO Executor: Finished task 1.0 in stage 3.0 (TID 14). 902 bytes result sent to driver 23/10/12 20:53:53 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 14) in 247 ms on O5Fguyingc02 (executor driver) (2/2) 23/10/12 20:53:53 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 23/10/12 20:53:53 INFO DAGScheduler: ResultStage 3 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.260 s 23/10/12 20:53:53 INFO DAGScheduler: Job 2 is finished. 
Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished 23/10/12 20:53:53 INFO DAGScheduler: Job 2 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.263366 s 23/10/12 20:53:53 INFO FSUtils: Removed directory at s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/.temp/20231012205312331 23/10/12 20:53:53 INFO BaseHoodieWriteClient: Committed 20231012205312331 23/10/12 20:53:53 INFO BaseHoodieWriteClient: Start to clean synchronously. 23/10/12 20:53:53 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:53 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:53 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:53 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:54 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:54 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:54 INFO BaseHoodieWriteClient: Cleaner started 23/10/12 20:53:54 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:54 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:54 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:54 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:54 INFO BaseHoodieWriteClient: Scheduling cleaning at instant time :20231012205353770 23/10/12 20:53:54 INFO FileSystemViewManager: Creating remote view for basePath s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer. 
Server=O5Fguyingc02:49803, Timeout=300 23/10/12 20:53:54 INFO FileSystemViewManager: Creating InMemory based view for basePath s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 23/10/12 20:53:54 INFO ClusteringUtils: Found 0 files in pending clustering operations 23/10/12 20:53:54 INFO RemoteHoodieTableFileSystemView: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/view/compactions/pending/?basepath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer&lastinstantts=20231012205312331&timelinehash=dc1569f2d0a95069f69cdfdf5c8f4af7bd16faced19958dffd614411ef65996a) 23/10/12 20:53:54 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:54 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:54 INFO FileSystemViewManager: Creating InMemory based view for basePath s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:55 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:55 INFO AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 23/10/12 20:53:55 INFO ClusteringUtils: Found 0 files in pending clustering operations 23/10/12 20:53:55 INFO RemoteHoodieTableFileSystemView: Sending request : (http://O5Fguyingc02:49803/v1/hoodie/view/logcompactions/pending/?basepath=s3a%3A%2F%2Fliangce%2Ftmp%2Fpms25%2Fods_pms25_t_psr_ds_p_transformer&lastinstantts=20231012205312331&timelinehash=dc1569f2d0a95069f69cdfdf5c8f4af7bd16faced19958dffd614411ef65996a) 23/10/12 20:53:55 INFO CleanPlanner: No earliest commit to retain. No need to scan partitions !! 23/10/12 20:53:55 INFO CleanPlanner: Nothing to clean here. It is already clean 23/10/12 20:53:55 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:55 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:55 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:55 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:55 INFO HoodieTableMetaClient: Loading Active commit timeline for s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:55 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:55 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 23/10/12 20:53:55 INFO FileSystemViewManager: Creating remote first table view 23/10/12 20:53:55 INFO BaseHoodieWriteClient: Start to archive synchronously. 
23/10/12 20:53:55 INFO BlockManagerInfo: Removed broadcast_3_piece0 on O5Fguyingc02:49761 in memory (size: 36.1 KiB, free: 890.9 MiB) 23/10/12 20:53:56 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:56 INFO HoodieTimelineArchiver: No Instants to archive 23/10/12 20:53:56 INFO BaseHoodieClient: Stopping Timeline service !! 23/10/12 20:53:56 INFO EmbeddedTimelineService: Closing Timeline server 23/10/12 20:53:56 INFO TimelineService: Closing Timeline Service 23/10/12 20:53:56 INFO Javalin: Stopping Javalin ... 23/10/12 20:53:56 INFO Javalin: Javalin has stopped 23/10/12 20:53:56 INFO TimelineService: Closed Timeline Service 23/10/12 20:53:56 INFO EmbeddedTimelineService: Closed Timeline server 23/10/12 20:53:56 INFO AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@28b16193 committed. Map(hoodie.datasource.write.payload.class -> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload, hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.NonpartitionedKeyGenerator, hoodie.datasource.write.partitionpath.field -> , hoodie.datasource.write.recordkey.field -> psr_id, hoodie.bulkinsert.shuffle.parallelism -> 1, hoodie.metadata.enable -> false, hoodie.clean.automatic -> true, hoodie.datasource.write.precombine.field -> ext_date_time, hoodie.table.name -> ods_pms25_t_psr_ds_p_transformer, hoodie.insert.shuffle.parallelism -> 1, hoodie.datasource.write.operation -> bulk_insert) 23/10/12 20:53:56 INFO DataSourceUtils: Getting table path.. 23/10/12 20:53:56 INFO TablePathUtils: Getting table path from path : s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:56 INFO DefaultSource: Obtained hudi table path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:56 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:56 INFO HoodieTableConfig: Loading table properties from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/.hoodie/hoodie.properties 23/10/12 20:53:56 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer 23/10/12 20:53:56 INFO DefaultSource: Is bootstrapped table => false, tableType is: COPY_ON_WRITE, queryType is: snapshot 23/10/12 20:53:56 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20231012205312331__commit__COMPLETED]} 23/10/12 20:53:57 INFO TableSchemaResolver: Reading schema from s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0_3-10-0_20231012205312331.parquet 23/10/12 20:53:57 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:53:57 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:137 23/10/12 20:53:57 INFO DAGScheduler: Got job 3 (collect at HoodieSparkEngineContext.java:137) with 1 output partitions 23/10/12 20:53:57 INFO DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:137) 23/10/12 20:53:57 INFO DAGScheduler: Parents of final stage: List() 23/10/12 20:53:57 INFO DAGScheduler: Missing parents: List() 23/10/12 20:53:57 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[20] at flatMap at HoodieSparkEngineContext.java:137), which has no missing parents 23/10/12 20:53:57 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 
99.8 KiB, free 890.3 MiB) 23/10/12 20:53:57 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 36.1 KiB, free 890.3 MiB) 23/10/12 20:53:57 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on O5Fguyingc02:49761 (size: 36.1 KiB, free: 890.8 MiB) 23/10/12 20:53:57 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:57 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[20] at flatMap at HoodieSparkEngineContext.java:137) (first 15 tasks are for partitions Vector(0)) 23/10/12 20:53:57 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks resource profile 0 23/10/12 20:53:57 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 15) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 4415 bytes) taskResourceAssignments Map() 23/10/12 20:53:57 INFO Executor: Running task 0.0 in stage 4.0 (TID 15) 23/10/12 20:53:57 INFO Executor: Finished task 0.0 in stage 4.0 (TID 15). 2200 bytes result sent to driver 23/10/12 20:53:57 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 15) in 196 ms on O5Fguyingc02 (executor driver) (1/1) 23/10/12 20:53:57 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 23/10/12 20:53:57 INFO DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:137) finished in 0.208 s 23/10/12 20:53:57 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:57 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage finished 23/10/12 20:53:57 INFO DAGScheduler: Job 3 finished: collect at HoodieSparkEngineContext.java:137, took 0.209424 s 23/10/12 20:53:57 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 23/10/12 20:53:57 INFO DAGScheduler: Got job 4 (collect at HoodieSparkEngineContext.java:103) with 8 output partitions 23/10/12 20:53:57 INFO DAGScheduler: Final stage: ResultStage 5 (collect at HoodieSparkEngineContext.java:103) 23/10/12 20:53:57 INFO DAGScheduler: Parents of final stage: List() 23/10/12 20:53:57 INFO DAGScheduler: Missing parents: List() 23/10/12 20:53:57 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[22] at map at HoodieSparkEngineContext.java:103), which has no missing parents 23/10/12 20:53:58 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 99.6 KiB, free 890.2 MiB) 23/10/12 20:53:58 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 36.0 KiB, free 890.2 MiB) 23/10/12 20:53:58 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on O5Fguyingc02:49761 (size: 36.0 KiB, free: 890.8 MiB) 23/10/12 20:53:58 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:58 INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 5 (MapPartitionsRDD[22] at map at HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7)) 23/10/12 20:53:58 INFO TaskSchedulerImpl: Adding task set 5.0 with 8 tasks resource profile 0 23/10/12 20:53:58 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 16) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 4609 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 17) (O5Fguyingc02, executor driver, partition 1, PROCESS_LOCAL, 4656 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Starting task 2.0 in 
stage 5.0 (TID 18) (O5Fguyingc02, executor driver, partition 2, PROCESS_LOCAL, 4656 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID 19) (O5Fguyingc02, executor driver, partition 3, PROCESS_LOCAL, 4655 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID 20) (O5Fguyingc02, executor driver, partition 4, PROCESS_LOCAL, 4655 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Starting task 5.0 in stage 5.0 (TID 21) (O5Fguyingc02, executor driver, partition 5, PROCESS_LOCAL, 4655 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO Executor: Running task 0.0 in stage 5.0 (TID 16) 23/10/12 20:53:58 INFO Executor: Running task 2.0 in stage 5.0 (TID 18) 23/10/12 20:53:58 INFO Executor: Running task 1.0 in stage 5.0 (TID 17) 23/10/12 20:53:58 INFO Executor: Running task 3.0 in stage 5.0 (TID 19) 23/10/12 20:53:58 INFO Executor: Running task 4.0 in stage 5.0 (TID 20) 23/10/12 20:53:58 INFO Executor: Running task 5.0 in stage 5.0 (TID 21) 23/10/12 20:53:58 INFO Executor: Finished task 0.0 in stage 5.0 (TID 16). 766 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Starting task 6.0 in stage 5.0 (TID 22) (O5Fguyingc02, executor driver, partition 6, PROCESS_LOCAL, 4656 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO Executor: Finished task 1.0 in stage 5.0 (TID 17). 806 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Starting task 7.0 in stage 5.0 (TID 23) (O5Fguyingc02, executor driver, partition 7, PROCESS_LOCAL, 4581 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 16) in 22 ms on O5Fguyingc02 (executor driver) (1/8) 23/10/12 20:53:58 INFO Executor: Finished task 5.0 in stage 5.0 (TID 21). 806 bytes result sent to driver 23/10/12 20:53:58 INFO Executor: Finished task 2.0 in stage 5.0 (TID 18). 849 bytes result sent to driver 23/10/12 20:53:58 INFO Executor: Finished task 3.0 in stage 5.0 (TID 19). 849 bytes result sent to driver 23/10/12 20:53:58 INFO Executor: Running task 7.0 in stage 5.0 (TID 23) 23/10/12 20:53:58 INFO Executor: Running task 6.0 in stage 5.0 (TID 22) 23/10/12 20:53:58 INFO TaskSetManager: Finished task 5.0 in stage 5.0 (TID 21) in 25 ms on O5Fguyingc02 (executor driver) (2/8) 23/10/12 20:53:58 INFO Executor: Finished task 4.0 in stage 5.0 (TID 20). 806 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 18) in 27 ms on O5Fguyingc02 (executor driver) (3/8) 23/10/12 20:53:58 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID 19) in 27 ms on O5Fguyingc02 (executor driver) (4/8) 23/10/12 20:53:58 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID 20) in 27 ms on O5Fguyingc02 (executor driver) (5/8) 23/10/12 20:53:58 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 17) in 29 ms on O5Fguyingc02 (executor driver) (6/8) 23/10/12 20:53:58 INFO Executor: Finished task 6.0 in stage 5.0 (TID 22). 806 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Finished task 6.0 in stage 5.0 (TID 22) in 16 ms on O5Fguyingc02 (executor driver) (7/8) 23/10/12 20:53:58 INFO Executor: Finished task 7.0 in stage 5.0 (TID 23). 
849 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Finished task 7.0 in stage 5.0 (TID 23) in 480 ms on O5Fguyingc02 (executor driver) (8/8) 23/10/12 20:53:58 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 23/10/12 20:53:58 INFO DAGScheduler: ResultStage 5 (collect at HoodieSparkEngineContext.java:103) finished in 0.513 s 23/10/12 20:53:58 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 5: Stage finished 23/10/12 20:53:58 INFO DAGScheduler: Job 4 finished: collect at HoodieSparkEngineContext.java:103, took 0.514647 s 23/10/12 20:53:58 INFO HoodieFileIndex: No partition predicates provided, listing full table (1 partitions) 23/10/12 20:53:58 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 23/10/12 20:53:58 INFO DAGScheduler: Got job 5 (collect at HoodieSparkEngineContext.java:103) with 1 output partitions 23/10/12 20:53:58 INFO DAGScheduler: Final stage: ResultStage 6 (collect at HoodieSparkEngineContext.java:103) 23/10/12 20:53:58 INFO DAGScheduler: Parents of final stage: List() 23/10/12 20:53:58 INFO DAGScheduler: Missing parents: List() 23/10/12 20:53:58 INFO DAGScheduler: Submitting ResultStage 6 (MapPartitionsRDD[24] at map at HoodieSparkEngineContext.java:103), which has no missing parents 23/10/12 20:53:58 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 99.5 KiB, free 890.1 MiB) 23/10/12 20:53:58 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 36.0 KiB, free 890.0 MiB) 23/10/12 20:53:58 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on O5Fguyingc02:49761 (size: 36.0 KiB, free: 890.8 MiB) 23/10/12 20:53:58 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 6 (MapPartitionsRDD[24] at map at HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0)) 23/10/12 20:53:58 INFO TaskSchedulerImpl: Adding task set 6.0 with 1 tasks resource profile 0 23/10/12 20:53:58 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 24) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 4388 bytes) taskResourceAssignments Map() 23/10/12 20:53:58 INFO Executor: Running task 0.0 in stage 6.0 (TID 24) 23/10/12 20:53:58 INFO Executor: Finished task 0.0 in stage 6.0 (TID 24). 2087 bytes result sent to driver 23/10/12 20:53:58 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 24) in 334 ms on O5Fguyingc02 (executor driver) (1/1) 23/10/12 20:53:58 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 23/10/12 20:53:58 INFO DAGScheduler: ResultStage 6 (collect at HoodieSparkEngineContext.java:103) finished in 0.347 s 23/10/12 20:53:58 INFO DAGScheduler: Job 5 is finished. 
Cancelling potential speculative or zombie tasks for this job 23/10/12 20:53:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 6: Stage finished 23/10/12 20:53:58 INFO DAGScheduler: Job 5 finished: collect at HoodieSparkEngineContext.java:103, took 0.349461 s 23/10/12 20:53:58 INFO AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 23/10/12 20:53:59 INFO ClusteringUtils: Found 0 files in pending clustering operations 23/10/12 20:53:59 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=6, NumFileGroups=6, FileGroupsCreationTime=5, StoreTimeTaken=0 23/10/12 20:53:59 INFO HoodieFileIndex: Total base files: 6; candidate files after data skipping: 6; skipping percentage 0.0 23/10/12 20:53:59 INFO FileSourceStrategy: Pushed Filters: 23/10/12 20:53:59 INFO FileSourceStrategy: Post-Scan Filters: 23/10/12 20:53:59 INFO FileSourceStrategy: Output Data Schema: struct<> 23/10/12 20:53:59 INFO CodeGenerator: Code generated in 21.016921 ms 23/10/12 20:53:59 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 377.8 KiB, free 889.7 MiB) 23/10/12 20:53:59 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 34.9 KiB, free 889.6 MiB) 23/10/12 20:53:59 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on O5Fguyingc02:49761 (size: 34.9 KiB, free: 890.7 MiB) 23/10/12 20:53:59 INFO SparkContext: Created broadcast 7 from count at MaxComputeDemo.scala:69 23/10/12 20:53:59 INFO HoodieFileIndex: No partition predicates provided, listing full table (1 partitions) 23/10/12 20:53:59 INFO HoodieFileIndex: Total base files: 6; candidate files after data skipping: 6; skipping percentage 0.0 23/10/12 20:53:59 INFO FileSourceScanExec: Planning scan with bin packing, max size: 20126262 bytes, open cost is considered as scanning 4194304 bytes. 
23/10/12 20:53:59 INFO SparkContext: Starting job: count at MaxComputeDemo.scala:69 23/10/12 20:53:59 INFO DAGScheduler: Registering RDD 28 (count at MaxComputeDemo.scala:69) as input to shuffle 1 23/10/12 20:53:59 INFO DAGScheduler: Got job 6 (count at MaxComputeDemo.scala:69) with 1 output partitions 23/10/12 20:53:59 INFO DAGScheduler: Final stage: ResultStage 8 (count at MaxComputeDemo.scala:69) 23/10/12 20:53:59 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 7) 23/10/12 20:53:59 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 7) 23/10/12 20:53:59 INFO DAGScheduler: Submitting ShuffleMapStage 7 (MapPartitionsRDD[28] at count at MaxComputeDemo.scala:69), which has no missing parents 23/10/12 20:53:59 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 14.9 KiB, free 889.6 MiB) 23/10/12 20:53:59 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 7.0 KiB, free 889.6 MiB) 23/10/12 20:53:59 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on O5Fguyingc02:49761 (size: 7.0 KiB, free: 890.7 MiB) 23/10/12 20:53:59 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1433 23/10/12 20:53:59 INFO DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 7 (MapPartitionsRDD[28] at count at MaxComputeDemo.scala:69) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5)) 23/10/12 20:53:59 INFO TaskSchedulerImpl: Adding task set 7.0 with 6 tasks resource profile 0 23/10/12 20:53:59 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 25) (O5Fguyingc02, executor driver, partition 0, PROCESS_LOCAL, 4941 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID 26) (O5Fguyingc02, executor driver, partition 1, PROCESS_LOCAL, 4942 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO TaskSetManager: Starting task 2.0 in stage 7.0 (TID 27) (O5Fguyingc02, executor driver, partition 2, PROCESS_LOCAL, 4942 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO TaskSetManager: Starting task 3.0 in stage 7.0 (TID 28) (O5Fguyingc02, executor driver, partition 3, PROCESS_LOCAL, 4941 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO TaskSetManager: Starting task 4.0 in stage 7.0 (TID 29) (O5Fguyingc02, executor driver, partition 4, PROCESS_LOCAL, 4941 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO TaskSetManager: Starting task 5.0 in stage 7.0 (TID 30) (O5Fguyingc02, executor driver, partition 5, PROCESS_LOCAL, 4942 bytes) taskResourceAssignments Map() 23/10/12 20:53:59 INFO Executor: Running task 0.0 in stage 7.0 (TID 25) 23/10/12 20:53:59 INFO Executor: Running task 3.0 in stage 7.0 (TID 28) 23/10/12 20:53:59 INFO Executor: Running task 5.0 in stage 7.0 (TID 30) 23/10/12 20:53:59 INFO Executor: Running task 4.0 in stage 7.0 (TID 29) 23/10/12 20:53:59 INFO Executor: Running task 1.0 in stage 7.0 (TID 26) 23/10/12 20:53:59 INFO Executor: Running task 2.0 in stage 7.0 (TID 27) 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/5a636f9c-46ff-4fc7-9d9b-23d891f630e5-0_2-9-0_20231012205312331.parquet, range: 0-15582898, partition values: [empty row] 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/5abcae8c-f16f-42af-bc28-7c3d66f03673-0_0-7-0_20231012205312331.parquet, range: 0-15496706, partition values: [empty row] 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: 
s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/9b487a1a-e587-4b10-8aa4-5936782fd77d-0_5-12-0_20231012205312331.parquet, range: 0-13680204, partition values: [empty row] 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/063b5bf5-19e0-4cef-a32f-2072973ed863-0_4-11-0_20231012205312331.parquet, range: 0-16123491, partition values: [empty row] 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/5374ebf7-4120-4efc-a2fb-2a6bf8809cf2-0_3-10-0_20231012205312331.parquet, range: 0-17243008, partition values: [empty row] 23/10/12 20:53:59 INFO FileScanRDD: Reading File path: s3a://liangce/tmp/pms25/ods_pms25_t_psr_ds_p_transformer/99900379-9000-4b45-a887-47e6e3783f4c-0_1-8-0_20231012205312331.parquet, range: 0-17465444, partition values: [empty row] 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO BlockManagerInfo: Removed broadcast_4_piece0 on O5Fguyingc02:49761 in memory (size: 36.1 KiB, free: 890.7 MiB) 23/10/12 20:54:00 INFO BlockManagerInfo: Removed broadcast_6_piece0 on O5Fguyingc02:49761 in memory (size: 36.0 KiB, free: 890.8 MiB) 23/10/12 20:54:00 INFO BlockManagerInfo: Removed broadcast_5_piece0 on O5Fguyingc02:49761 in memory (size: 36.0 KiB, free: 890.8 MiB) 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:00 INFO Executor: Finished task 1.0 in stage 7.0 (TID 26). 2060 bytes result sent to driver 23/10/12 20:54:00 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 26) in 844 ms on O5Fguyingc02 (executor driver) (1/6) 23/10/12 20:54:00 INFO Executor: Finished task 0.0 in stage 7.0 (TID 25). 2103 bytes result sent to driver 23/10/12 20:54:00 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 25) in 900 ms on O5Fguyingc02 (executor driver) (2/6) 23/10/12 20:54:00 INFO Executor: Finished task 2.0 in stage 7.0 (TID 27). 2060 bytes result sent to driver 23/10/12 20:54:00 INFO TaskSetManager: Finished task 2.0 in stage 7.0 (TID 27) in 955 ms on O5Fguyingc02 (executor driver) (3/6) 23/10/12 20:54:00 INFO Executor: Finished task 3.0 in stage 7.0 (TID 28). 2103 bytes result sent to driver 23/10/12 20:54:00 INFO TaskSetManager: Finished task 3.0 in stage 7.0 (TID 28) in 999 ms on O5Fguyingc02 (executor driver) (4/6) 23/10/12 20:54:01 INFO Executor: Finished task 5.0 in stage 7.0 (TID 30). 2103 bytes result sent to driver 23/10/12 20:54:01 INFO TaskSetManager: Finished task 5.0 in stage 7.0 (TID 30) in 1200 ms on O5Fguyingc02 (executor driver) (5/6) 23/10/12 20:54:01 INFO S3AInputStream: Switching to Random IO seek policy 23/10/12 20:54:01 INFO Executor: Finished task 4.0 in stage 7.0 (TID 29). 
2060 bytes result sent to driver 23/10/12 20:54:01 INFO TaskSetManager: Finished task 4.0 in stage 7.0 (TID 29) in 1619 ms on O5Fguyingc02 (executor driver) (6/6) 23/10/12 20:54:01 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 23/10/12 20:54:01 INFO DAGScheduler: ShuffleMapStage 7 (count at MaxComputeDemo.scala:69) finished in 1.660 s 23/10/12 20:54:01 INFO DAGScheduler: looking for newly runnable stages 23/10/12 20:54:01 INFO DAGScheduler: running: Set() 23/10/12 20:54:01 INFO DAGScheduler: waiting: Set(ResultStage 8) 23/10/12 20:54:01 INFO DAGScheduler: failed: Set() 23/10/12 20:54:01 INFO DAGScheduler: Submitting ResultStage 8 (MapPartitionsRDD[31] at count at MaxComputeDemo.scala:69), which has no missing parents 23/10/12 20:54:01 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 10.1 KiB, free 890.0 MiB) 23/10/12 20:54:01 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 5.0 KiB, free 890.0 MiB) 23/10/12 20:54:01 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on O5Fguyingc02:49761 (size: 5.0 KiB, free: 890.8 MiB) 23/10/12 20:54:01 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1433 23/10/12 20:54:01 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 8 (MapPartitionsRDD[31] at count at MaxComputeDemo.scala:69) (first 15 tasks are for partitions Vector(0)) 23/10/12 20:54:01 INFO TaskSchedulerImpl: Adding task set 8.0 with 1 tasks resource profile 0 23/10/12 20:54:01 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 31) (O5Fguyingc02, executor driver, partition 0, NODE_LOCAL, 4453 bytes) taskResourceAssignments Map() 23/10/12 20:54:01 INFO Executor: Running task 0.0 in stage 8.0 (TID 31) 23/10/12 20:54:01 INFO ShuffleBlockFetcherIterator: Getting 6 (360.0 B) non-empty blocks including 6 (360.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) remote blocks 23/10/12 20:54:01 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/10/12 20:54:01 INFO Executor: Finished task 0.0 in stage 8.0 (TID 31). 2407 bytes result sent to driver 23/10/12 20:54:01 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 31) in 7 ms on O5Fguyingc02 (executor driver) (1/1) 23/10/12 20:54:01 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool 23/10/12 20:54:01 INFO DAGScheduler: ResultStage 8 (count at MaxComputeDemo.scala:69) finished in 0.012 s 23/10/12 20:54:01 INFO DAGScheduler: Job 6 is finished. Cancelling potential speculative or zombie tasks for this job 23/10/12 20:54:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 8: Stage finished 23/10/12 20:54:01 INFO DAGScheduler: Job 6 finished: count at MaxComputeDemo.scala:69, took 1.677341 s hudi table row num: 637678 23/10/12 20:54:01 INFO SparkContext: Invoking stop() from shutdown hook 23/10/12 20:54:01 INFO SparkUI: Stopped Spark web UI at http://O5Fguyingc02:4040 23/10/12 20:54:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 23/10/12 20:54:02 INFO MemoryStore: MemoryStore cleared 23/10/12 20:54:02 INFO BlockManager: BlockManager stopped 23/10/12 20:54:02 INFO BlockManagerMaster: BlockManagerMaster stopped 23/10/12 20:54:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
23/10/12 20:54:02 INFO SparkContext: Successfully stopped SparkContext 23/10/12 20:54:02 INFO ShutdownHookManager: Shutdown hook called 23/10/12 20:54:02 INFO ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-2a3ea347-ae14-4040-acac-b9a3c23cb4b8 23/10/12 20:54:02 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system... 23/10/12 20:54:02 INFO MetricsSystemImpl: s3a-file-system metrics system stopped. 23/10/12 20:54:02 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete. Process finished with exit code 0
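For reference, a minimal sketch of a writer call matching the options printed in the commit summary of the log above; `sourceDf` and `basePath` are hypothetical stand-ins, not values taken from the issue.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch only: reproduces the writer options surfaced in the log above.
// `sourceDf` and `basePath` are hypothetical stand-ins.
def bulkInsert(sourceDf: DataFrame, basePath: String): Unit = {
  sourceDf.write.format("hudi")
    .option("hoodie.table.name", "ods_pms25_t_psr_ds_p_transformer")
    .option("hoodie.datasource.write.operation", "bulk_insert")
    .option("hoodie.datasource.write.recordkey.field", "psr_id")
    .option("hoodie.datasource.write.precombine.field", "ext_date_time")
    .option("hoodie.datasource.write.partitionpath.field", "")
    .option("hoodie.datasource.write.keygenerator.class",
      "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
    .option("hoodie.bulkinsert.shuffle.parallelism", "1")
    .option("hoodie.metadata.enable", "false")
    .mode(SaveMode.Overwrite) // fresh table in the log; Append for later writes
    .save(basePath)
}
```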
Thanks a lot for the details, @blackcheckren. I will work on this.
@ad1happy2go With tips from members of the Hudi technical communication group, the problem has been located: when Spark TimestampType data is written into the Hudi table's parquet files, it causes data errors and data loss. Converting those columns to string before writing avoids the problem, which two other users have since verified. But I still have a question: the problem mainly occurs with the bulk_insert operation, not in insert mode. Do these two operation types handle writing data to files differently? I am not familiar with the source code, so I hope to get your reply. Thank you.
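A minimal sketch of the workaround described above, under the assumption that the fix is simply casting every TimestampType column to string before the write; the helper name is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.TimestampType

// Hypothetical helper: cast all TimestampType columns to string so the
// bulk_insert row writer never sees Spark timestamps.
def castTimestampsToString(df: DataFrame): DataFrame =
  df.schema.fields
    .filter(_.dataType == TimestampType)
    .foldLeft(df)((acc, f) => acc.withColumn(f.name, col(f.name).cast("string")))
```

Applied just before the write (e.g. `bulkInsert(castTimestampsToString(sourceDf), basePath)`), this mirrors the string-conversion workaround the poster verified.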
@blackcheckren Yes, they follow different core writer paths, so they handle records differently. Out of curiosity, I am still wondering why the Spark timestamp type would cause data loss. Can you explain a bit more about your findings? It may be a potential bug in the code which we would want to fix. Thanks.
@ad1happy2go I found that this problem also showed up in a case from a friend in the official group: he was ingesting data from SQL Server, and whenever a datetime column was ingested, rows would come out out-of-order or corrupted. The users of our platform kept reporting data errors, and we also found data loss during our investigation. After noticing that every table showing the problem had a datetime-type field, I used SQL to convert the related fields to string type before writing, and the corrupted rows and data loss disappeared. Tomorrow I will sort the data, write it into the table, check the data around the corrupted rows and the lost records, and post the relevant information below.
Great. Thanks, @blackcheckren. Let us know your findings.
@ad1happy2go Sorry for the late reply. I read the tables in MaxCompute into memory, sorted them by primary key, and wrote them into the Hudi table. I then read the table back from the file system and compared it against the original table, but I did not find any abnormality at the data level. I printed out the records that exist in the original table but not in the Hudi table, and for each of them also read the record with primary key minus 1 from the original table; that data looked normal. It is confusing to me. I wonder whether the number of null values in the timestamp field is the cause, because I observed that the record in question has only one null value, while the records above and below it have an even number of nulls.
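A minimal sketch of the comparison described above, assuming a hypothetical `sourceDf` (the MaxCompute source) and `basePath` (the Hudi table); a left-anti join on the record key surfaces rows present in the source but absent from Hudi.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch only: `spark`, `sourceDf`, and `basePath` are stand-ins for the
// poster's session, source table, and Hudi table path.
def missingFromHudi(spark: SparkSession, sourceDf: DataFrame, basePath: String): DataFrame = {
  val hudiDf = spark.read.format("hudi").load(basePath)
  // keep only rows whose primary key never made it into the Hudi table
  sourceDf.join(hudiDf.select("psr_id"), Seq("psr_id"), "left_anti")
}
```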
@blackcheckren Ideally these nulls should not cause data loss, though I have not completely understood your explanation.
Is this issue reproducible? It looks to be data-related only, though.
Hey @ad1happy2go: any follow-up on this?
Describe the problem you faced

Hello, when I use Hudi's bulk_insert write operation, data loss occurs, regardless of MOR table, COW table, or any index type. The source table contains 30 million records, but when I write to the Hudi table, about 1,000 records go missing. Multiple tests show that the amount of lost data is constant; it does not change with the table type or index type. Most importantly, the execution logs of the Spark tasks do not show any error messages. This is very confusing to me; does anyone else have this problem?
Environment Description
Hudi version: 0.12.2
Spark version: 3.1.3
Storage (HDFS/S3/GCS..): S3
Running on Docker? (yes/no): K8s
Stacktrace
The execution logs of the Spark tasks do not show any error messages!
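Since the logs are clean, the loss only shows up by counting; a minimal sketch of that check follows, with hypothetical `sourceDf` and `basePath`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch only: compare the source row count with what the Hudi snapshot returns.
def lostRecords(spark: SparkSession, sourceDf: DataFrame, basePath: String): Long = {
  val hudiCount = spark.read.format("hudi").load(basePath).count()
  sourceDf.count() - hudiCount // ~1,000 in the poster's tests, despite a clean log
}
```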