More generally, this script should be the FIRST step in the pipeline that transforms the data in a meaningful way: re-projecting, generating new columns such as "timestamp", and so on. The user would then probably write the processed data into a new folder.
Apache Sedona? Spark (SQL) for subsetting dates.
What I would expect is to use daphmeio to handle everything related to column names, folder structure, S3, data types, and WRITING to file. Functions should therefore receive an optional dict parameter that maps canonical column names to the alternate names found in the raw data.
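A minimal sketch of that optional-dict pattern, independent of daphmeio's actual API (all names below are hypothetical, chosen for illustration only): each transformation function merges the caller's mapping over a set of canonical defaults and resolves columns through it.

```python
from datetime import datetime, timezone

# Canonical column names the pipeline expects (hypothetical).
DEFAULT_COLS = {"time": "time"}

def add_timestamp(rows, col_names=None):
    """Add a 'timestamp' column derived from an epoch-seconds column.

    col_names lets callers map canonical names to whatever the raw
    data actually uses, e.g. {"time": "event_time"}.
    """
    cols = {**DEFAULT_COLS, **(col_names or {})}
    out = []
    for row in rows:
        new = dict(row)
        # Convert epoch seconds to an ISO-8601 UTC timestamp string.
        new["timestamp"] = datetime.fromtimestamp(
            new[cols["time"]], tz=timezone.utc
        ).isoformat()
        out.append(new)
    return out

# Usage: the raw data calls the column "event_time" instead of "time".
rows = [{"event_time": 0}]
result = add_timestamp(rows, col_names={"time": "event_time"})
```

The same `col_names=None` signature would be repeated across the transformation functions, so callers with well-named data pay nothing and callers with legacy schemas pass one dict.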