awslabs / amazon-s3-tagging-spark-util

Apache License 2.0

Files over 100mb are not getting tagged #8

Open eldar-elne opened 4 months ago

eldar-elne commented 4 months ago

Hi, I'm trying to tag the files written from my DataFrame using the s3.parquet format. What happens is:

- output files smaller than 100 MB are getting tagged
- output files larger than 100 MB are NOT getting tagged

Infra: EMR 6.14.0, Spark 3.4.1

  df \
  .repartition(600) \
  .write \
  .partitionBy(YEAR_COLUMN, MONTH_COLUMN, DAY_COLUMN) \
  .mode("overwrite") \
  .format("s3.parquet") \
  .option("tags", tags) \
  .save(path)
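
To make the tagged / not tagged split easy to reproduce, here is a minimal sketch of how each output object could be checked together with its size. It assumes a boto3-style S3 client is passed in. The helper names `list_tag_status` and `untagged_keys` are hypothetical and not part of this library, and the bucket and prefix in the usage comment are placeholders.

```python
from collections import namedtuple

# One row per S3 object: its key, its size in bytes, and whether it has any tags.
ObjectTagStatus = namedtuple("ObjectTagStatus", ["key", "size_bytes", "tagged"])


def list_tag_status(s3_client, bucket, prefix):
    """Return an ObjectTagStatus for every object under the given prefix.

    s3_client is assumed to behave like boto3.client("s3"): it must provide
    get_paginator("list_objects_v2") and get_object_tagging(Bucket=..., Key=...).
    """
    results = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            tagging = s3_client.get_object_tagging(Bucket=bucket, Key=obj["Key"])
            has_tags = bool(tagging["TagSet"])
            results.append(ObjectTagStatus(obj["Key"], obj["Size"], has_tags))
    return results


def untagged_keys(statuses):
    """Return the keys of all objects that carry no tags."""
    return [s.key for s in statuses if not s.tagged]


# Example usage (placeholder bucket and prefix, requires boto3 and credentials):
#   import boto3
#   statuses = list_tag_status(boto3.client("s3"), "my-bucket", "my/output/prefix/")
#   for s in sorted(statuses, key=lambda s: s.size_bytes):
#       print(s.size_bytes, s.tagged, s.key)
```

Sorting the result by size should show whether the untagged objects really start at roughly 100 MB, or whether something other than size separates the two groups.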

Let me know if any extra info is needed.