Closed KuriaMaingi closed 1 month ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
This change to the utility function will assist in instances where you need to generate your summation and Y_df but also want to retain any exogenous vars required for your forecast.
You will need to pass in a dictionary containing your exogenous vars and the Pandas agg functions you want applied against them. The latter can either be a single string or a list of strings if you want to apply multiple aggfuncs to the same column.
Output column will have the format column_aggfunc.
Examples: aggregate(df, spec, exog_vars = {'avg_price':'mean'}) -> Will return a new column called "avg_price_mean" aggregate(df, spec, exog_vars = {'avg_price':['mean','sum']}) -> Will return 2 new columns called "avg_price_mean" & "avg_price_sum
This is a great and valuable contribution. I'd like to first merge #296 so that we can have all the correct tests running. In the mean time, I'll have a look at the PR
Thanks @elephaint and @christophertitchen , I'll have some time later this week and wrap this up.
On Polars/DuckDB/PySpark, this would have to be a repository level change right? Or would we want to consider modular self-contained conversions for tasks that are likely to require a fair amount of compute e.g. aggregations?
Thanks @elephaint and @christophertitchen , I'll have some time later this week and wrap this up.
On Polars/DuckDB/PySpark, this would have to be a repository level change right? Or would we want to consider modular self-contained conversions for tasks that are likely to require a fair amount of compute e.g. aggregations?
I'd first like to move towards Polars support on a repo level, that should be relatively doable. Distributed support seems harder because of all the computations involved with Numpy arrays at the moment.
Apologies, the commit somehow ended up being very messy merging & syncing all the changes that had happened since. All I was resolving were:
aggregate( df = df, spec = spec, exog_vars = {'exog1':'mean','exog2':'sum'} )
Let me know if this inadvertently caused other issues
Apologies, the commit somehow ended up being very messy merging & syncing all the changes that had happened since. All I was resolving were:
- Adding the unit test
- Small change to enable the function signature to handle instances where the exog_var aggfunc was a single string instead of a list e.g.
aggregate( df = df, spec = spec, exog_vars = {'exog1':'mean','exog2':'sum'} )
Let me know if this inadvertently caused other issues
Thanks, I think the PR is fine but I observed some merge conflict issues in utils.ipynb
that must be removed; I'll remove those myself.
This change to the utility function will assist in instances where you need to generate your summation and Y_df but also want to retain any exogenous vars required for your forecast.
You will need to pass in a dictionary containing your exogenous vars and the Pandas agg functions you want applied against them. The latter can either be a single string or a list of strings if you want to apply multiple aggfuncs to the same column.
Output column will have the format column_aggfunc.
Examples: aggregate(df, spec, exog_vars = {'avg_price':'mean'}) -> Will return a new column called "avg_price_mean" aggregate(df, spec, exog_vars = {'avg_price':['mean','sum']}) -> Will return 2 new columns called "avg_price_mean" & "avg_price_sum