baggiponte opened this issue 10 months ago
Agreed. This kind of ranking is very common in industry, especially in supply chain. We should make it a standalone function.
Update: since we have an amazing set of feature extractors, we can add a `rank_by(y, extractor, order, n_series)` function that does this:
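```python
from typing import Literal

import polars as pl
import functime.feature_extractors  # noqa: F401 (assumed to register the `ts` expression namespace)


def rank_by(y: pl.LazyFrame | pl.DataFrame, extractor: str, order: Literal["worst", "best"], n_series: int) -> pl.DataFrame:
    if isinstance(y, pl.DataFrame):
        y = y.lazy()
    # Assumes functime's panel layout: entity in the first column, target in the last.
    entity, target = y.columns[0], y.columns[-1]
    # Look up the extractor by name in the `ts` namespace (extractors needing arguments are not handled).
    function = getattr(pl.col(target).ts, extractor)
    results = y.group_by(entity).agg(function().alias(extractor))
    if order == "best":
        return results.top_k(k=n_series, by=extractor).collect()
    return results.bottom_k(k=n_series, by=extractor).collect()
```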
This can be combined with `plotting.plot_panel` to plot the best or worst series and make EDA much easier.
We mention and use the coefficient of variation (CV) more than once, such as here. It would be useful to have an `evaluation.rank_cv` function to see which entities in a panel display the greatest variation. The way I see it, we should have a public method (perhaps even in `feature_extraction`?) to compute the CV across all entities. This would be used by `rank_cv` and possibly in `plot_entities` (see #83) to display additional information about all entities in the panel.