Naissant / dendri

Common Healthcare feature engineering algorithms implemented in PySpark.
MIT License

condense_segments should have a retain_shape option #9

Closed. rileyschack closed this issue 3 years ago.

rileyschack commented 3 years ago

Currently, condense_segments returns the original DataFrame that was passed in, but overwrites its start_dt_col and end_dt_col values. It seems like the function should offer two options, similar to extend_segments:

  1. Return the original DataFrame with two additional columns for the new start/end dates
  2. Return a new DataFrame containing only the group_col columns and the new start/end date columns

Right now, condense_segments sits somewhere in between. If a user wants option 2, they have to re-select the grouping and date columns and then call distinct(). If a user wants option 1, they have to join the result back to the original DataFrame.
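For concreteness, the two workarounds can be sketched in plain Python, with lists of dicts standing in for DataFrame rows so the example runs without Spark. The column names (member_id, claim_id, start_dt, end_dt), the helper names, the assumed "current" output, and the containment rule used to join rows back to their segment are all illustrative assumptions, not part of dendri. In PySpark, the first helper corresponds roughly to `df.select(...).distinct()` and the second to a join on the grouping columns plus a date-range condition.

```python
from datetime import date

# Hypothetical input rows (plain dicts standing in for DataFrame rows).
original = [
    {"member_id": 1, "claim_id": "a", "start_dt": date(2020, 1, 1), "end_dt": date(2020, 1, 31)},
    {"member_id": 1, "claim_id": "b", "start_dt": date(2020, 1, 20), "end_dt": date(2020, 2, 15)},
    {"member_id": 2, "claim_id": "c", "start_dt": date(2020, 3, 1), "end_dt": date(2020, 3, 10)},
]
# Assumed current behavior: same rows, but the date columns are overwritten
# with the condensed segment each row belongs to.
condensed = [
    {**original[0], "start_dt": date(2020, 1, 1), "end_dt": date(2020, 2, 15)},
    {**original[1], "start_dt": date(2020, 1, 1), "end_dt": date(2020, 2, 15)},
    {**original[2]},
]


def distinct_segments(rows, group_cols, start_col, end_col):
    """Option 2 workaround: keep only grouping + date columns, then de-duplicate."""
    keep = [*group_cols, start_col, end_col]
    result = []
    for row in rows:
        projected = {c: row[c] for c in keep}
        if projected not in result:  # order-preserving stand-in for distinct()
            result.append(projected)
    return result


def join_back(original_rows, segments, group_cols, start_col, end_col):
    """Option 1 workaround: attach each original row's segment as two new columns."""
    out = []
    for row in original_rows:
        for seg in segments:
            same_group = all(row[c] == seg[c] for c in group_cols)
            contained = seg[start_col] <= row[start_col] and row[end_col] <= seg[end_col]
            if same_group and contained:
                out.append({**row, "condensed_start": seg[start_col],
                            "condensed_end": seg[end_col]})
                break
    return out


segments = distinct_segments(condensed, ["member_id"], "start_dt", "end_dt")
joined = join_back(original, segments, ["member_id"], "start_dt", "end_dt")
```

Both workarounds require the caller to keep a reference to the untouched input and to repeat the grouping and date column names, which is the extra user code the issue wants to remove.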

Adding a retain_shape parameter would simplify user code and enable both options.
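A minimal sketch of what a retain_shape flag could look like, again in plain Python so it runs without Spark. It assumes condense_segments merges date ranges within a group whenever the next range starts on or before the current one ends; the real function may use a different rule (for example, a gap tolerance). The signature, the False default, and the condensed_start/condensed_end column names are hypothetical.

```python
from datetime import date
from itertools import groupby


def condense_segments(rows, group_cols, start_col, end_col, retain_shape=False,
                      new_start_col="condensed_start", new_end_col="condensed_end"):
    """Hypothetical sketch: merge overlapping/touching [start, end] ranges per group.

    retain_shape=True  -> every input row, unchanged, plus two new date columns (option 1)
    retain_shape=False -> one row per condensed segment: group columns + new dates (option 2)
    """
    def key(r):
        return tuple(r[c] for c in group_cols)

    # Assumed merge rule: sort by group then start date, extend while ranges overlap.
    segments = []  # (group key, segment start, segment end)
    ordered = sorted(rows, key=lambda r: (key(r), r[start_col]))
    for grp, members in groupby(ordered, key=key):
        cur_start = cur_end = None
        for r in members:
            if cur_end is not None and r[start_col] <= cur_end:
                cur_end = max(cur_end, r[end_col])  # overlap: extend current segment
            else:
                if cur_end is not None:
                    segments.append((grp, cur_start, cur_end))
                cur_start, cur_end = r[start_col], r[end_col]
        segments.append((grp, cur_start, cur_end))  # close the last segment

    if not retain_shape:
        return [{**dict(zip(group_cols, grp)), new_start_col: s, new_end_col: e}
                for grp, s, e in segments]

    out = []
    for r in rows:  # original row order and columns are preserved
        s, e = next((s, e) for grp, s, e in segments
                    if grp == key(r) and s <= r[start_col] <= e)
        out.append({**r, new_start_col: s, new_end_col: e})
    return out


rows = [
    {"member_id": 1, "claim_id": "a", "start_dt": date(2020, 1, 1), "end_dt": date(2020, 1, 31)},
    {"member_id": 1, "claim_id": "b", "start_dt": date(2020, 1, 20), "end_dt": date(2020, 2, 15)},
    {"member_id": 1, "claim_id": "c", "start_dt": date(2020, 4, 1), "end_dt": date(2020, 4, 30)},
    {"member_id": 2, "claim_id": "d", "start_dt": date(2020, 3, 1), "end_dt": date(2020, 3, 10)},
]
slim = condense_segments(rows, ["member_id"], "start_dt", "end_dt")
wide = condense_segments(rows, ["member_id"], "start_dt", "end_dt", retain_shape=True)
```

Computing the segments once and choosing the output shape only at the end keeps the two options consistent: option 1 is option 2 joined back to the input rows, done inside the function rather than in user code.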