awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

GlueContext.write_dynamic_frame.from_options #108

Closed by theonlyway 2 years ago

theonlyway commented 2 years ago

When writing a DynamicFrame out to S3, is it possible to specify which S3 storage class the objects should be written to?

glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="s3",
    connection_options={"path": s3PathLatest, "StorageClass": "STANDARD_IA"},
    format="csv",
    format_options={"separator": ",", "writeHeader": True, "optimizePerformance": True},
    transformation_ctx=f"{table['Name']}_dataSink",
)

There doesn't seem to be any documentation on which keys the connection_options dict supports, and from looking over the library code, it doesn't appear to validate whatever you pass in.

moomindani commented 2 years ago

We do not have that capability for now. We have filed this as a feature request. We appreciate your feedback.

Alternatively, you can use the Transition transform for a similar purpose: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-glue-context.html#aws-glue-api-crawler-pyspark-extensions-glue-context-transition_s3_path
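
For anyone landing here, a rough sketch of that two-step workaround: write to S3 as usual, then call transition_s3_path on the same path. The helper name write_csv_then_transition and its parameters are made up for illustration. The option keys (retentionPeriod, accountId, roleArn) are taken from the linked documentation page as I understand it, so check them against the current docs before relying on this. In particular, retention_hours=0 is an assumption meant to make freshly written files eligible for the transition.

```python
def write_csv_then_transition(glue_context, frame, s3_path, storage_class,
                              account_id, role_arn, retention_hours=0):
    """Write a DynamicFrame as CSV, then transition the objects under s3_path.

    Hypothetical helper: glue_context is expected to behave like an
    awsglue.context.GlueContext (it is only duck-typed here).
    """
    # Step 1: ordinary S3 write. No storage-class key is passed, since
    # the thread says the writer does not support one.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="csv",
        format_options={"separator": ",", "writeHeader": True},
    )

    # Step 2: ask Glue to transition the files under the same path.
    # Option keys are assumed from the linked transition_s3_path docs.
    transition_options = {
        "retentionPeriod": retention_hours,  # hours; assumed 0 = do not hold back new files
        "accountId": account_id,
        "roleArn": role_arn,
    }
    glue_context.transition_s3_path(s3_path, storage_class, transition_options)
    return transition_options
```

Inside a Glue job this would be called with the real glueContext and dynamicFrame from the question, for example write_csv_then_transition(glueContext, dynamicFrame, s3PathLatest, "STANDARD_IA", account_id, role_arn), where account_id and role_arn are placeholders you supply. Note that the objects are first written in the default storage class and moved afterwards, so this is not equivalent to writing directly into STANDARD_IA.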