Watts-Lab / team-process-map

MIT License

Allow Users to Customize Aggregation #206

Open xehu opened 2 months ago

xehu commented 2 months ago

Currently, the system automatically "aggregates" features generated for a single chat/message up to the conversation and user levels, calculating several summary statistics for each feature (mean, median, max, min, std):

https://github.com/Watts-Lab/team-process-map/blob/main/feature_engine/utils/calculate_conversation_level_features.py

However, aggregating every feature with every statistic yields thousands of features, which is way too many! Instead, we should make it possible for the user to specify what they want: for example, maybe they are only interested in the mean (not mean, median, max, min, AND std).

There are some design decisions here, but they are relatively simple: we need to decide how the user should specify which aggregations they want. Specifically, we want to think about:

  1. Which levels of aggregation does the user want? (Conversation and User are the options)
  2. Which columns (at the chat level) do they want aggregated?
  3. Which functions do they want to aggregate with (e.g., mean, std...)?

Accordingly, we need to think through how the user should specify these choices. Here is an example:

  aggregation:
    methods: ["mean", "std"]
    columns: ["column1", "column2"]

There should also be an option to say they want no aggregations at all.
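One way to hold the three choices above in code is a small validated settings object. The sketch below is only an illustration under stated assumptions: `AggregationSpec`, its `from_config` method, and the `levels` key are hypothetical names that do not exist in the repository, and the convention that `aggregation: null` (or `false`) means "no aggregations at all" is likewise an assumption. Leaving a key out falls back to the current behavior of aggregating everything.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The two levels and five statistics named in this issue.
VALID_LEVELS = ("conversation", "user")
VALID_METHODS = ("mean", "median", "max", "min", "std")


@dataclass(frozen=True)
class AggregationSpec:
    """Hypothetical container for the user's aggregation choices."""
    levels: Tuple[str, ...] = VALID_LEVELS
    methods: Tuple[str, ...] = VALID_METHODS
    columns: Optional[Tuple[str, ...]] = None  # None means "every chat-level column"

    def __post_init__(self):
        # Reject typos early instead of failing deep inside the utilities.
        bad_levels = set(self.levels) - set(VALID_LEVELS)
        bad_methods = set(self.methods) - set(VALID_METHODS)
        if bad_levels:
            raise ValueError(f"Unknown aggregation levels: {sorted(bad_levels)}")
        if bad_methods:
            raise ValueError(f"Unknown aggregation methods: {sorted(bad_methods)}")

    @property
    def enabled(self) -> bool:
        # Nothing is aggregated if any of the three choices is empty.
        return bool(self.levels) and bool(self.methods) and self.columns != ()

    @classmethod
    def from_config(cls, config: dict) -> "AggregationSpec":
        section = config.get("aggregation", {})
        # Assumed convention: null/false switches aggregation off entirely.
        if section is None or section is False:
            return cls(levels=(), methods=())
        columns = section.get("columns")
        return cls(
            levels=tuple(section.get("levels", VALID_LEVELS)),
            methods=tuple(section.get("methods", VALID_METHODS)),
            columns=None if columns is None else tuple(columns),
        )


# The example config from this issue, expressed as a Python dict.
spec = AggregationSpec.from_config(
    {"aggregation": {"methods": ["mean", "std"], "columns": ["column1", "column2"]}}
)
no_agg = AggregationSpec.from_config({"aggregation": None})
```

Here `spec` keeps both levels (the default) but restricts the methods and columns, while `no_agg.enabled` is `False`, so downstream code could skip aggregation with a single check.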

Getting Started

  1. Modify the FeatureBuilder constructor so the user can pass in parameters for whether they want conversation- and user-level aggregations at all and, if so, which aggregations they want (which columns, which methods).
  2. Follow that logic through to the utilities where the aggregations take place.
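The two steps above could look roughly like the following. This is a sketch, not the repository's code: `FeatureBuilderSketch` is a stand-in class, and the parameter names (`convo_aggregation`, `user_aggregation`, `agg_methods`, `agg_columns`), the ID column defaults (`conversation_num`, `speaker_nickname`), and the `<method>_<column>` output naming are all assumptions made for illustration. It uses pandas `groupby(...).agg(...)` with a list of method names.

```python
import pandas as pd


class FeatureBuilderSketch:
    """Illustrative stand-in, not the repository's FeatureBuilder."""

    def __init__(self, chat_df, convo_aggregation=True, user_aggregation=True,
                 agg_methods=("mean", "median", "max", "min", "std"),
                 agg_columns=None,
                 conversation_id_col="conversation_num",   # assumed column name
                 speaker_id_col="speaker_nickname"):       # assumed column name
        self.chat_df = chat_df
        self.convo_aggregation = convo_aggregation
        self.user_aggregation = user_aggregation
        self.agg_methods = list(agg_methods)
        self.conversation_id_col = conversation_id_col
        self.speaker_id_col = speaker_id_col
        if agg_columns is None:
            # Default: every numeric chat-level column except the ID columns.
            ids = {conversation_id_col, speaker_id_col}
            numeric = chat_df.select_dtypes(include="number").columns
            agg_columns = [c for c in numeric if c not in ids]
        self.agg_columns = list(agg_columns)

    def _aggregate(self, group_cols):
        # No methods or no columns means there is nothing to compute.
        if not self.agg_methods or not self.agg_columns:
            return None
        grouped = self.chat_df.groupby(group_cols)[self.agg_columns].agg(self.agg_methods)
        # Flatten the (column, method) MultiIndex into names like "mean_column1".
        grouped.columns = [f"{method}_{col}" for col, method in grouped.columns]
        return grouped.reset_index()

    def conversation_features(self):
        if not self.convo_aggregation:
            return None
        return self._aggregate([self.conversation_id_col])

    def user_features(self):
        if not self.user_aggregation:
            return None
        return self._aggregate([self.conversation_id_col, self.speaker_id_col])


# Toy chat-level data: two conversations, three messages.
chats = pd.DataFrame({
    "conversation_num": [1, 1, 2],
    "speaker_nickname": ["a", "b", "a"],
    "column1": [1.0, 3.0, 5.0],
    "column2": [2.0, 4.0, 6.0],
})

fb = FeatureBuilderSketch(chats, user_aggregation=False,
                          agg_methods=["mean"], agg_columns=["column1"])
conv = fb.conversation_features()
users = fb.user_features()
```

With these settings `conv` has one aggregated column per conversation instead of five per feature, and `users` is `None` because user-level aggregation was switched off.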