Nixtla / hierarchicalforecast

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.
https://nixtlaverse.nixtla.io/hierarchicalforecast
Apache License 2.0
590 stars · 76 forks

[FEAT] Add support for exogenous variables in utils.aggregate #297

Closed · KuriaMaingi closed this 1 month ago

KuriaMaingi commented 1 month ago

This change to the utility function helps in instances where you need to generate your summation matrix and Y_df but also want to retain any exogenous variables required for your forecast.

You pass in a dictionary mapping your exogenous variables to the Pandas agg functions you want applied to them. Each value can be either a single string or a list of strings if you want to apply multiple aggfuncs to the same column.

Output columns have the format column_aggfunc.

Examples:

  • aggregate(df, spec, exog_vars={'avg_price': 'mean'}) -> returns a new column called "avg_price_mean"
  • aggregate(df, spec, exog_vars={'avg_price': ['mean', 'sum']}) -> returns two new columns called "avg_price_mean" and "avg_price_sum"
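The described behavior can be sketched with plain pandas named aggregation; this is not the library's implementation, just an illustration of the `column_aggfunc` naming convention (the helper name `aggregate_exog` and the sample data are made up):

```python
import pandas as pd

def aggregate_exog(df, group_cols, exog_vars):
    """Group df by group_cols and aggregate each exogenous column.

    exog_vars maps a column name to a single aggfunc string or a list
    of aggfunc strings; output columns are named "<column>_<aggfunc>".
    """
    named = {}
    for col, funcs in exog_vars.items():
        if isinstance(funcs, str):
            funcs = [funcs]  # accept a bare string as a one-element list
        for func in funcs:
            named[f"{col}_{func}"] = (col, func)
    return df.groupby(group_cols, as_index=False).agg(**named)

df = pd.DataFrame({
    "region": ["A", "A", "B", "B"],
    "avg_price": [1.0, 3.0, 2.0, 4.0],
})
out = aggregate_exog(df, ["region"], {"avg_price": ["mean", "sum"]})
print(list(out.columns))  # ['region', 'avg_price_mean', 'avg_price_sum']
```

Named aggregation keeps the output columns flat (no MultiIndex), which matches the `column_aggfunc` format described above.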


elephaint commented 1 month ago

> This change to the utility function helps in instances where you need to generate your summation matrix and Y_df but also want to retain any exogenous variables required for your forecast.
>
> You pass in a dictionary mapping your exogenous variables to the Pandas agg functions you want applied to them. Each value can be either a single string or a list of strings if you want to apply multiple aggfuncs to the same column.
>
> Output columns have the format column_aggfunc.
>
> Examples: aggregate(df, spec, exog_vars={'avg_price': 'mean'}) -> returns a new column called "avg_price_mean"; aggregate(df, spec, exog_vars={'avg_price': ['mean', 'sum']}) -> returns two new columns called "avg_price_mean" and "avg_price_sum"

This is a great and valuable contribution. I'd like to first merge #296 so that we can have all the correct tests running. In the meantime, I'll have a look at the PR.

KuriaMaingi commented 1 month ago

Thanks @elephaint and @christophertitchen, I'll have some time later this week and will wrap this up.

On Polars/DuckDB/PySpark, this would have to be a repository-level change, right? Or would we want to consider modular, self-contained conversions for tasks that are likely to require a fair amount of compute, e.g. aggregations?

elephaint commented 1 month ago

> Thanks @elephaint and @christophertitchen, I'll have some time later this week and will wrap this up.
>
> On Polars/DuckDB/PySpark, this would have to be a repository-level change, right? Or would we want to consider modular, self-contained conversions for tasks that are likely to require a fair amount of compute, e.g. aggregations?

I'd first like to move towards Polars support at the repo level; that should be relatively doable. Distributed support seems harder because of all the computation currently done with NumPy arrays.

KuriaMaingi commented 1 month ago

Apologies, the commit somehow ended up being very messy, merging & syncing all the changes that had happened since. All I was resolving were:

  • Adding the unit test
  • A small change to the function signature to handle instances where the exog_var aggfunc is a single string instead of a list, e.g.

aggregate(df=df, spec=spec, exog_vars={'exog1': 'mean', 'exog2': 'sum'})

Let me know if this inadvertently caused other issues.

elephaint commented 1 month ago

> Apologies, the commit somehow ended up being very messy, merging & syncing all the changes that had happened since. All I was resolving were:
>
>   • Adding the unit test
>   • A small change to the function signature to handle instances where the exog_var aggfunc is a single string instead of a list, e.g.
>
> aggregate(df=df, spec=spec, exog_vars={'exog1': 'mean', 'exog2': 'sum'})
>
> Let me know if this inadvertently caused other issues.

Thanks, I think the PR is fine, but I observed some merge-conflict artifacts in utils.ipynb that need to be removed; I'll remove those myself.