JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License
992 stars, 113 forks

Weighted time series #709

Closed yuye188 closed 2 weeks ago

yuye188 commented 2 weeks ago

Hi, can the function passed to the weight_func parameter only take the index as its input? If I have a dataframe with several columns, is there any way to iterate through the rows, check whether any value is null, and assign weight=0 to that row?

Regards, Yu

JoaquinAmatRodrigo commented 2 weeks ago

Hi @yuye188, the current implementation only allows you to create weights based on the index. For the use case you mention, you could identify the dates with missing values using the raw series and hardcode them in the weight function.
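
A minimal sketch of this index-based approach, assuming a pandas DataFrame named `data` with a DatetimeIndex. The column names, values and dates are made up for illustration. The dates with missing values are collected once from the raw data, so the weight function only needs the index to look them up:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: daily index, two columns, some missing values.
data = pd.DataFrame(
    {
        "y": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
        "exog_1": [10.0, 20.0, 30.0, np.nan, 50.0, 60.0],
    },
    index=pd.date_range("2012-06-01", periods=6, freq="D"),
)

# Dates on which at least one column is null, taken from the raw data.
dates_with_nulls = data.index[data.isna().any(axis=1)]


def custom_weights(index):
    """Return 0 for dates that had a null in the raw data, 1 otherwise."""
    return np.where(index.isin(dates_with_nulls), 0, 1)


weights = custom_weights(data.index)
print(weights)  # [1 0 1 0 1 1]
```

A function like this could then be passed as `weight_func=custom_weights` when creating the forecaster. The missing values themselves would still need to be filled before fitting.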

JavierEscobarOrtiz commented 2 weeks ago

Hi @yuye188,

NaNs are not allowed when working with single-series forecasters and, when using multi-series ones, they cannot appear in the target variable when creating the y training matrix (the rows where the y_matrix has NaNs are dropped).

My suggestion is to work with the index (with multi-series forecasters you can use a different custom function for each series). If you prefer to use the values of the row, I would suggest filling the NaNs with a custom value that is easy to target (e.g. 100.1234567) and then checking for that value inside the weight function.

Assuming that data is your dataframe, you can create something like this:

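import numpy as np

NAN_SENTINEL = 100.1234567  # same value used to fill the NaNs in `data`

def custom_weights(index):
    """
    Return 0 for rows where 'your_col' holds the sentinel value
    (originally NaN) and 1 otherwise.
    """
    weights = np.where(
        data.loc[index, 'your_col'] == NAN_SENTINEL,
        0,
        1
    )

    return weights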
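
The function above can be checked on toy data before using it with a forecaster. The following is a self-contained sketch: the dataframe contents and the sentinel are illustrative assumptions, and the only requirement is that the same sentinel is used both for filling the NaNs and for the comparison.

```python
import numpy as np
import pandas as pd

NAN_SENTINEL = 100.1234567  # arbitrary value, assumed not to occur in real data

# Hypothetical series with two missing observations.
data = pd.DataFrame(
    {"your_col": [5.0, np.nan, 7.0, 8.0, np.nan]},
    index=pd.date_range("2012-06-01", periods=5, freq="D"),
)

# Replace NaNs with the sentinel so that they can be recognised later.
data = data.fillna(NAN_SENTINEL)


def custom_weights(index):
    """Return 0 where 'your_col' equals the sentinel, 1 otherwise."""
    return np.where(data.loc[index, "your_col"] == NAN_SENTINEL, 0, 1)


weights = custom_weights(data.index)
print(weights)  # [1 0 1 1 0]
```

Because the lookup uses `data.loc[index, ...]`, the function also works when it receives only a subset of the original index.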

yuye188 commented 2 weeks ago

Okay, that's a good approach. Thank you both very much.