Performance improvement

This is not a bug report but rather a suggested script adjustment.

Would the script perform better if we read the Parquet files with the Glue API:

    from pyspark.sql.functions import lit

    input = glueContext.create_dynamic_frame_from_options("s3", connection_options={"paths": [path]}, format="parquet", transformation_ctx="input").toDF().withColumn("Op", lit("I"))

instead of reading them with Spark directly, as the script does now:

    input = spark.read.parquet(path).withColumn("Op", lit("I"))
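Whether one reader is faster than the other is easiest to settle by measuring both on the same data in the same job. Below is a minimal sketch of a timing harness, assuming each reader is wrapped as a zero-argument callable; `time_readers` is a hypothetical helper, not part of Glue or Spark. Because Spark transformations are lazy, each wrapped reader should end with an action such as `count()` so the timing covers the actual scan. The stand-in readers in the `__main__` block only let the sketch run without Spark or Glue.

```python
import time
from typing import Callable, Dict


def time_readers(readers: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Run each reader `repeats` times and return its best wall-clock time in seconds.

    Hypothetical helper for comparing read strategies; keeping the minimum
    reduces noise from warm-up and other jobs on the cluster.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best: Dict[str, float] = {}
    for name, read in readers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            # The reader should end with an action (e.g. count()) so Spark actually reads the data.
            read()
            timings.append(time.perf_counter() - start)
        best[name] = min(timings)
    return best


if __name__ == "__main__":
    # Stand-in readers so the sketch runs without Spark or Glue.
    results = time_readers({
        "fast": lambda: sum(range(1000)),
        "slow": lambda: time.sleep(0.05),
    })
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f} s")
```

In a Glue job, the two entries would wrap the reads from the question, for example `lambda: spark.read.parquet(path).withColumn("Op", lit("I")).count()` and the equivalent `create_dynamic_frame_from_options(...).toDF()` chain ending in `.count()`. Results will depend on the data volume, file layout, and job configuration, so they should be gathered on representative input.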

Thank you.

aws-samples / aws-big-data-blog-dmscdc-walkthrough, issue #2