Open aabid0193 opened 1 year ago
If it isn't possible already, it would be nice if we could use Spark DataFrames to write to Glue tables using something similar to wrangler's to_parquet method. It works great for pandas and has the ability to set the mode to overwrite partitions; I was wondering if we can do this with Spark DataFrames.
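For context, here is a minimal, stdlib-only sketch of what overwrite-partitions semantics mean conceptually: only the partition directories that appear in the incoming data are deleted and rewritten, while every other partition is left untouched. Everything here (the paths, the `write_partitioned` helper, CSV standing in for Parquet) is a hypothetical illustration, not awswrangler's actual implementation.

```python
import csv
import shutil
from pathlib import Path


def write_partitioned(rows, root, partition_key, mode="overwrite_partitions"):
    """Write rows as one CSV file per partition directory.

    With mode='overwrite_partitions', only the partitions present in `rows`
    are deleted and rewritten; other partition directories under `root`
    are preserved. (Hypothetical helper, not part of awswrangler.)
    """
    root = Path(root)
    # Group incoming rows by their partition value.
    groups = {}
    for row in rows:
        groups.setdefault(row[partition_key], []).append(row)
    for value, group in groups.items():
        part_dir = root / f"{partition_key}={value}"
        if mode == "overwrite_partitions" and part_dir.exists():
            shutil.rmtree(part_dir)  # drop only this partition, not the table
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0]))
            writer.writeheader()
            writer.writerows(group)


# Start from a clean directory so the demo is repeatable.
shutil.rmtree("/tmp/demo_table", ignore_errors=True)

# First load writes dt=2023-01-01 and dt=2023-01-02; the second load
# rewrites only dt=2023-01-02, leaving dt=2023-01-01 intact.
write_partitioned(
    [{"dt": "2023-01-01", "v": 1}, {"dt": "2023-01-02", "v": 2}],
    "/tmp/demo_table", "dt")
write_partitioned([{"dt": "2023-01-02", "v": 99}], "/tmp/demo_table", "dt")
```

This is the behavior the feature request asks for on the Spark side: a write that replaces affected partitions in place instead of truncating the whole table or appending duplicates.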
If you are using Spark, I would imagine that simply converting your Spark DataFrame to a pandas one would get you there if you want to use the wrangler:
sparkDF.toPandas()
Yeah, that is a possibility you can use right now; however, for large datasets that required Spark in the first place, converting to pandas wouldn't be ideal.
Essentially, what I'm wishing for is the ability to register Athena tables based on the PySpark DataFrame metadata. I see that this was implemented here: https://github.com/aws/aws-sdk-pandas/issues/29. However, it seems that this method is no longer supported in the newer versions of wrangler. Additionally, I would like to overwrite partitions.
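For the table-registration part, the core of the task is deriving Athena/Glue column types from the Spark schema. Below is a hedged, dependency-free sketch of that mapping, keyed on Spark SQL's simpleString() type names; the `athena_columns` helper and the literal schema list are hypothetical (real code would read the schema from `sparkDF.schema.fields` and pass the result to a Glue create-table call):

```python
# Hypothetical mapping from Spark SQL simpleString() type names to
# Athena/Glue DDL types; extend as needed (decimal, array, struct, ...).
SPARK_TO_ATHENA = {
    "string": "string",
    "int": "int",
    "bigint": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "timestamp": "timestamp",
    "date": "date",
}


def athena_columns(spark_schema, partition_keys=()):
    """Split a (name, spark_type) schema into regular columns and
    partition columns, translated to Athena types. (Hypothetical helper.)"""
    cols, parts = [], []
    for name, spark_type in spark_schema:
        athena_type = SPARK_TO_ATHENA[spark_type]
        (parts if name in partition_keys else cols).append((name, athena_type))
    return cols, parts


# In real code the schema tuples would come from
# [(f.name, f.dataType.simpleString()) for f in sparkDF.schema.fields].
cols, parts = athena_columns(
    [("id", "bigint"), ("price", "double"), ("dt", "date")],
    partition_keys=("dt",))
```

The two lists map naturally onto the Columns and PartitionKeys fields of a Glue table definition, which is roughly what the approach in #29 automated.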
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.