Open vfrank66 opened 3 years ago
I have the same issue when I apply a Filter to a DynamicFrame. Have you found a workaround?
I just stopped using DynamicFrames for this step. I noticed several things did not work as expected, and writing with DynamicFrames is also much slower than with Spark DataFrames.

For my workaround I used a left-anti join to delete records:
```python
from pyspark.sql import functions as F

# remember the original column order; the join can reorder columns,
# which matters for the partition update comparison check later
current_column_order = existing_df.columns

# broadcast the smaller side when it is comfortably small (< 10 MiB here)
if incr_syndicated_data.prefix_size_bytes < 10_485_760:
    existing_df = existing_df.join(F.broadcast(df), "hash_value", "leftanti")
else:
    existing_df = existing_df.join(df, "hash_value", "leftanti")

# restore the column order, which is important for the partition update comparison check
existing_df = existing_df.select(current_column_order)
```
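For anyone unfamiliar with left-anti join semantics, this pure-Python sketch (hypothetical sample data, no Spark) shows what the join above does: it keeps only the rows from the existing set whose join key does not appear in the incoming set.

```python
# Hypothetical records; "hash_value" plays the role of the join key.
existing = [
    {"hash_value": "a", "val": 1},
    {"hash_value": "b", "val": 2},
    {"hash_value": "c", "val": 3},
]
incoming = [{"hash_value": "b"}]  # records to delete

# left-anti join: keep rows of `existing` with no match in `incoming`
incoming_keys = {row["hash_value"] for row in incoming}
kept = [row for row in existing if row["hash_value"] not in incoming_keys]

print([row["hash_value"] for row in kept])  # ['a', 'c']
```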
I am not sure where to file bugs for the AWS Glue libraries, so let me know if I am in the wrong place.
Filter.apply() does not retain the order of the columns, which is a problem for me since a few of my columns are partition columns. This seems like unexpected behavior.

The write then errors with a missing partition column. The column does exist, but all the fields have been reordered. This seems like incorrect behavior, since the output of Filter.apply() is not compatible with the sink without additional work.
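One way to work around the reordering is to capture the column order before the filter and re-select it afterwards (e.g. on the DataFrame obtained via `toDF()` before handing it to the sink). The restoration step itself can be sketched without Spark; the data below is hypothetical:

```python
# Column order saved before the filter, e.g. df.columns
original_order = ["id", "event_date", "region"]

# Records whose field order was shuffled by some transformation
shuffled = [
    {"region": "us-east-1", "id": 1, "event_date": "2021-01-01"},
    {"region": "eu-west-1", "id": 2, "event_date": "2021-01-02"},
]

# Rebuild each record with fields in the original order --
# the analogue of df.select(original_order) in Spark.
restored = [{col: row[col] for col in original_order} for row in shuffled]

print(list(restored[0]))  # ['id', 'event_date', 'region']
```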