Closed huwf closed 7 months ago
Hey @huwf, thanks for using mlforecast.
About the first point, you specify them in the constructor so that they're computed when predicting, but you don't want to compute them in the feature engineering step, is that it?
For the second point, would providing a function with the same signature as _build_transform_name https://github.com/Nixtla/mlforecast/blob/af2173537420545570a94f95570de5c779be5614/mlforecast/core.py#L83 to the MLForecast constructor work for you?
Hi, thanks for the quick response
For the first, yes that's correct.
For the second, yes, that's exactly what I had in mind.
Possibly also the same for the straight lags too, instead of L190. Although I just thought, I guess I could do a noop function in lag_transforms to get the same effect here I guess rather than needing two naming functions?
First point makes sense, I think we can implement it.
For the second one as you can see a lag is actually just the mlforecast.core._identity
function, so you could definitely do something like: lag_transforms={1: [_identity]}
and if you've provided a function to rename them it'd work. I think we can implement that as well.
Description
Hi, two things which would be useful for me if at all possible and don't seem to be in the code that I can tell.
.fit
or.preprocess
, I'd like it so that if the column already exists for one of the lags/dates/lag_transforms it does not try to compute it. Then I can store those features at the same time as any others, and pass this data frame to .fit or .fit_models later.f"lag{lag}" or the output from
_build_transform_name, maybe adding a
Callable` somewhere perhaps? For example, the convention I have at my job is to call a column something like fieldname_lag_24, which is different to mlforecast. This is relatively easy for me to work around, but would be nice to have if it's easy to doUse case
I'm working within an existing internal pipeline which has its own conventions.
We have different methods for collecting data from different sources, and this returns a dataframe and adds/cuts off data for lags and windows at a different place than where the model is trained. This makes pickling and using a preprocessed model difficult