Hello everyone,
Using the Spline agent JAR (spark-3.3-spline-agent-bundle_2.12, version 2.0.0), I'm trying to extract lineage from Glue jobs. However, this only works with Spark DataFrames, not with Glue DynamicFrames.
Is there any functionality in the Spline JAR, or any other tool, that can capture lineage for a Glue DynamicFrame?
For our use case, we are currently using Glue 4.0.
Code:
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("CSV to DynamicFrame").getOrCreate()
# Read the CSV file into a DataFrame
df = spark.read.format("csv").option("header", "true").load("s3://test/Employee/london_emp.csv")
# Perform transformations on the DataFrame if needed
df_transformed = df.withColumn("salary", df["salary"] * 1.10)
# Create a GlueContext
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
# Convert the transformed Spark DataFrame to a DynamicFrame
dynamic_frame = DynamicFrame.fromDF(df_transformed, glueContext, "dynamic_frame")
# Write the DynamicFrame to S3
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": "s3://test/netflix"},
    format="parquet"
)