catworlddomination opened this issue 10 months ago.
I haven't tried it specifically, but judging from the AWS documentation on DynamicFrame, there is a chance it would not work. The crucial thing for the Spline agent is the existence of an internal Spark write event that the agent can intercept and take the execution plan from. That event only exists in the Spark SQL API, meaning the DataFrame API. For instance, RDD lineage isn't supported for that very reason: Spark doesn't provide a logical plan for RDD operations that is usable for lineage purposes. I don't know exactly how DynamicFrame is implemented (it's closed source), so it's unclear whether DynamicFrame operations eventually translate to DataFrame ones. If they don't, Spline has no way to track them.
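Following that reasoning, one possible workaround is to convert the DynamicFrame to a regular Spark DataFrame with its toDF() method and do the final write through the DataFrame writer, so the write goes through the Spark SQL API that the agent listens to. This is an untested assumption, not a confirmed fix: the helper name write_via_dataframe, the Parquet default and the output path are hypothetical, and because the DataFrame is derived from a DynamicFrame, the input side of the lineage may not be identified even if the write is captured.

```python
def write_via_dataframe(dynamic_frame, output_path, fmt="parquet", mode="overwrite"):
    """Write a Glue DynamicFrame through the Spark DataFrame API (hypothetical helper).

    DynamicFrame.toDF() returns a pyspark.sql.DataFrame; writing it with
    DataFrameWriter triggers a Spark SQL write, which is the kind of event
    the Spline agent intercepts. Whether the input side is traced is untested.
    """
    df = dynamic_frame.toDF()  # DynamicFrame -> Spark DataFrame
    # Standard DataFrameWriter chain instead of glueContext.write_dynamic_frame
    df.write.format(fmt).mode(mode).save(output_path)
    return df
```

In a Glue script this would be called with a DynamicFrame obtained, for example, from glueContext.create_dynamic_frame.from_catalog(...), in place of a glueContext.write_dynamic_frame call. If the job needs a DynamicFrame again afterwards, DynamicFrame.fromDF can convert back.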
Look carefully at the Spark driver's debug logs. If the Spline agent is notified of Glue write events, there should be messages about it there.
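To follow that advice, the driver log level can be raised with SparkContext.setLogLevel and the captured driver log then filtered for Spline-related lines. A minimal sketch: the helper names are hypothetical, and matching on the substring "spline" is only a heuristic, since the exact logger names and messages depend on the agent version and the log4j layout. DEBUG output is very verbose, and in Glue the driver logs typically end up in CloudWatch rather than a local file.

```python
from typing import Iterable, List


def enable_debug_logging(spark) -> None:
    """Raise the driver log level so DEBUG messages (including the agent's) are emitted."""
    spark.sparkContext.setLogLevel("DEBUG")


def spline_log_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention Spline (case-insensitive heuristic match)."""
    return [line.rstrip("\n") for line in lines if "spline" in line.lower()]


# Hypothetical usage:
# enable_debug_logging(spark)        # early in the script, before any writes
# with open("driver.log") as f:      # hypothetical local copy of the driver log
#     for line in spline_log_lines(f):
#         print(line)
```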
When adding the Spline agent bundle to an AWS Glue Python script (Spark 3.3, Python 3), lineage is produced as expected when using standard patterns such as
df = spark.read.csv(file_path, header=True, inferSchema=True) and df.write...
However, AWS Glue also has the concept of DynamicFrames, whose usage looks something like:
Can Spline support this DynamicFrame pattern in AWS Glue? I used the spark-3.3-spline-agent-bundle_2.12-2.0.0.jar bundle; the Spline agent initialized successfully, but no lineage was produced.