Aarhus-Psychiatry-Research / timeseriesflattener

Converting irregularly spaced time series, such as eletronic health records, into dataframes for tabular classification.
https://Aarhus-Psychiatry-Research.github.io/timeseriesflattener
MIT License
18 stars 2 forks source link

Idea: Make `ValueFrame` a sequence #548

Closed HLasse closed 3 months ago

HLasse commented 4 months ago

Back in the day, we could use the PredictorGroupSpec to make combinations of all variables, both input dataframes, lookbehinds etc. Now, we can do the same without the GroupSpec, except making the same combination of lookbehinds, fallbacks.. on a number of input dataframes.

This is useful e.g. for making features for diagnoses which could previously be done like this:

psychiatric_diagnoses = list(
    PredictorGroupSpec(
        named_dataframes=(
            NamedDataframe(df=f0_disorders(), name=f"f0_disorders"),
            NamedDataframe(df=f1_disorders(), name=f"f1_disorders"),
            NamedDataframe(df=f2_disorders(), name=f"f2_disorders"),
            NamedDataframe(df=f3_disorders(), name=f"f3_disorders"),
            NamedDataframe(df=f4_disorders(), name=f"f4_disorders"),
            NamedDataframe(df=f5_disorders(), name=f"f5_disorders"),
            NamedDataframe(df=f6_disorders(), name=f"f6_disorders"),
            NamedDataframe(df=f7_disorders(), name=f"f7_disorders"),
            NamedDataframe(df=f8_disorders(), name=f"f8_disorders"),
            NamedDataframe(df=f9_disorders(), name=f"f9_disorders"),
        ),
        lookbehind_days=[730],
        aggregation_fns=[boolean],
        fallback=[0],
    ).create_combinations()

Now, we have to make a separate PredictorSpec for each (or use the legacy group spec) which is a bit cumbersome.

Thoughts, @sarakolding, @MartinBernstorff?

sarakolding commented 4 months ago

initial thought: can only imagine that it would require an extra loop around the entire processing, which might make everything slower? but maybe my imagination is just limited

MartinBernstorff commented 4 months ago

My intuition is that this is a relatively rare use-case, and very easy to wrap.

With that said, because it is so easy to wrap, we could easily support it in the outer function – create all the different combinations, and then split them out to separate processes in the pool.

I'm up for reviewing a PR, if you think we need it 👍

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 14 days with no activity. Feel free to either 1) remove the stale label or 2) comment. If nothing happens, this will be closed in 7 days.

HLasse commented 3 months ago

Given that it's so easy to wrap I don't think it's worth the refactoring and code changes - let's bring it up again if someone besides me wants it ;)