functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0

[FEAT] [evaluation] Add rank_by to evaluation #84

Open baggiponte opened 10 months ago

baggiponte commented 10 months ago

We mention and use the coefficient of variation more than once, such as here. It would be interesting to have an evaluation.rank_cv function to see which entities in a panel display the greatest variation.

The way I see it, we should have a public method (perhaps even in feature_extraction?) to compute the CV across all entities. This would be used by rank_cv and possibly in plot_entities (see #83) to display additional information about all entities in the panel.

topher-lo commented 10 months ago

Agreed. This is so commonly used in industry, especially supply chain. We should make one standalone.

baggiponte commented 8 months ago

Update: since we have an amazing set of feature extractors, we can add a rank_by(y, extractor, order, n_series) function that does this:

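from typing import Literal

import polars as pl

def rank_by(y: pl.LazyFrame | pl.DataFrame, extractor: str, order: Literal["worst", "best"], n_series: int):
    if isinstance(y, pl.DataFrame):
        y = y.lazy()

    # Assumes the panel columns are ordered (entity, time, target).
    entity, _, target = y.columns[:3]
    # Look up the extractor by name; assumes the `ts` expression namespace is registered.
    function = getattr(pl.col(target).ts, extractor)

    results = (
        y.group_by(entity)
        .agg(function().alias(extractor))
    )

    if order == "best":
        return results.top_k(k=n_series, by=extractor)
    return results.bottom_k(k=n_series, by=extractor)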

This can be used with plotting.plot_panel to generate great EDA.