Closed: liujiawen closed this 1 year ago
Spline Agent for Spark History Server
No, the Spark Agent is meant to be used as a listener. The execution plan representation available from the Spark history server isn't enough. Such functionality, however, could be implemented as a separate project, something like a Spline Agent for Spark History Server.
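For reference, the listener-based setup that answer refers to looks roughly like this in spark-shell. This is a sketch based on the Spline agent docs; verify the import and property names against the agent version you use, and note the agent bundle jar must already be on the classpath:

```scala
// Sketch: programmatic initialization inside spark-shell, assuming the Spline
// agent bundle jar is on the driver classpath and the producer URL was passed
// at launch, e.g.:
//   --conf spark.spline.producer.url=http://localhost:8080/producer
import za.co.absa.spline.harvester.SparkLineageInitializer._

// Registers Spline's QueryExecution listener on the active session.
// The codeless alternative registers
// za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
// via the spark.sql.queryExecution.listeners conf at submit time.
spark.enableLineageTracking()
```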
Thank you, wajda, for your instructive answer!
Background
I found that in my production environment it is hard to add a listener to the Spark shell. I then thought it would be fine to get the execution plans of Spark jobs from the history server instead. Is it possible to use an execution plan that I downloaded, rather than the Spark job listener? Thank you!
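For feasibility's sake, the plan text itself is reachable without a listener: Spark 3.1+ history servers expose SQL execution details over REST. A minimal sketch, assuming a history server at history-server:18080 and a placeholder application id (the endpoint and its parameters should be double-checked against your Spark version):

```scala
// Pull SQL execution metadata (including the textual plan) from a Spark 3.1+
// history server. Host, port, and appId are placeholders for illustration.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val appId = "application_1681891424000_0001" // hypothetical application id
val uri = URI.create(
  s"http://history-server:18080/api/v1/applications/$appId/sql" +
    "?details=true&planDescription=true")

val response = HttpClient.newHttpClient().send(
  HttpRequest.newBuilder().uri(uri).GET().build(),
  HttpResponse.BodyHandlers.ofString())

// A JSON array of SQL executions; each entry carries the plan description text.
println(response.body())
```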
Feature
Add a feature that reads a Spark execution plan file, extracts the lineage, and sends the data to the Spline producer.
Example [Optional]
Example logical plan:

```
InsertIntoHadoopFsRelationCommand s3://example/dw/ods/ods_example_table, [dt=2023-04-18], false, [dt#62], Parquet, [field.delim=, line.delim= , serialization.format=, partitionOverwriteMode=DYNAMIC, parquet.compression=SNAPPY, mergeSchema=false], Overwrite, CatalogTable(
Database: default
Table: ods_example_table
Owner: hdfs
Created Time: Wed Apr 19 16:03:44 CST 2023
Last Access: UNKNOWN
Created By: Spark 2.2 or prior
Type: EXTERNAL
Provider: hive
Table Properties: [bucketing_version=2, parquet.compression=SNAPPY, serialization.null.format=, transient_lastDdlTime=1681891424]
Location: s3://example/dw/ods/ods_example_table
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=, line.delim= , field.delim=]
Partition Provider: Catalog
Partition Columns: [`dt`]
Schema: root
 |-- seller_id: string (nullable = true)
 |-- code: string (nullable = true)
 |-- country_code: string (nullable = true)
 |-- org_code: string (nullable = true)
 |-- name_cn: string (nullable = true)
 |-- name_en: string (nullable = true)
 |-- api: string (nullable = true)
 |-- create_time: string (nullable = true)
 |-- creater: string (nullable = true)
 |-- update_time: string (nullable = true)
 |-- operator: string (nullable = true)
 |-- status: string (nullable = true)
 |-- org_code_add: string (nullable = true)
 |-- user_name: string (nullable = true)
 |-- is_online: string (nullable = true)
 |-- dt: string (nullable = true)
), org.apache.spark.sql.execution.datasources.CatalogFileIndex@a03e7be2, [seller_id, code, country_code, org_code, name_cn, name_en, api, create_time, creater, update_time, operator, status, org_code_add, user_name, is_online, dt]
+- Project [seller_id#47, code#48, country_code#49, org_code#50, name_cn#51, name_en#52, api#53, create_time#54, creater#55, update_time#56, operator#57, status#58, org_code_add#59, user_name#60, is_online#61, cast(2023-04-18 as string) AS dt#62]
   +- Project [cast(seller_id#0 as string) AS seller_id#47, cast(code#1 as string) AS code#48, cast(country_code#2 as string) AS country_code#49, cast(org_code#3 as string) AS org_code#50, cast(name_cn#4 as string) AS name_cn#51, cast(name_en#5 as string) AS name_en#52, cast(api#6 as string) AS api#53, cast(create_time#7 as string) AS create_time#54, cast(creater#8 as string) AS creater#55, cast(update_time#9 as string) AS update_time#56, cast(operator#10 as string) AS operator#57, cast(status#11 as string) AS status#58, cast(org_code_add#12 as string) AS org_code_add#59, cast(user_name#13 as string) AS user_name#60, cast(is_online#14 as string) AS is_online#61]
      +- Project [seller_id#0, code#1, country_code#2, org_code#3, name_cn#4, name_en#5, api#6, create_time#7, creater#8, update_time#9, operator#10, status#11, org_code_add#12, user_name#13, is_online#14]
         +- SubqueryAlias spark_catalog.dbinit.ods_example_table
            +- HiveTableRelation [`dbinit`.`ods_example_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [seller_id#0, code#1, country_code#2, org_code#3, name_cn#4, name_en#5, api#6, create_time#7, cre..., Partition Cols: []]
```

Proposed Solution [Optional]
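One possible shape for this, as a rough sketch only: assume the plan has already been converted into the Producer API's JSON model (that conversion is the real work, and exactly the detail gap wajda points out above), and push it to the Spline gateway. The endpoint path and media type below are assumptions to verify against your Spline server version; recent servers use versioned media types such as application/vnd.absa.spline.producer.v1.1+json.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Files, Paths}

object PlanFileUploader {
  // Hypothetical tool: args(0) points at a file that already contains an
  // execution plan in the Producer API's JSON model. Deriving that JSON from
  // the history server's plan text is the hard, unsolved part.
  def main(args: Array[String]): Unit = {
    val planJson = new String(Files.readAllBytes(Paths.get(args(0))), "UTF-8")

    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8080/producer/execution-plans")) // assumed gateway URL
      .header("Content-Type", "application/json") // may need Spline's versioned media type
      .POST(HttpRequest.BodyPublishers.ofString(planJson))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"Producer responded with HTTP ${response.statusCode()}")
  }
}
```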