Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Ability to select some specific windows to train / valid / predict by mask (BaseWindows) #904

Open ISPritchin opened 4 months ago

ISPritchin commented 4 months ago

Description

Thanks for a wonderful product. The more I read the source code, the more impressed I am by the quality of this development.

In our problem, the model must predict only those points in the time series that satisfy certain conditions. We would like a filtering method that determines whether a given window should be used for training / validation or not.

Use case

I saw that one of the selection mechanisms is available_mask, but its use does not completely solve our problem. Masks can be more specific. Maybe the library already has a solution for my problem, but I couldn't find it.

Here is an example of a task that requires window filtering. In the simplest case, we are given a vector y. We would like to forecast not for all ('unique_id', 'ds') points, but only for those that satisfy some criterion. That is, we need to select certain time periods for prediction, and these are specific to each individual client. The criterion can filter out a very large number of windows from training. It can be computed by the user before training and provided to the model.

Suggested solution:

Example:

input_size = 3
h = 2
y       = [3, 4, 5, 6, 7, 8, 9, 10]
is_used = [0, 0, 1, 0, 1, 1, 0, 0]

Currently the following windows would be obtained:

[
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
    [6, 7, 8, 9, 10]
]

But using the new column we would like to get: [[3, 4, 5, 6, 7], [5, 6, 7, 8, 9], [6, 7, 8, 9, 10]]

These windows were selected because, within each window, is_used[input_size - 1] (the mask value at the last input position) is 1:

y = [3, 4, 5, 6, 7],  is_used = [0, 0, 1, 0, 1]
y = [5, 6, 7, 8, 9],  is_used = [1, 0, 1, 1, 0] 
y = [6, 7, 8, 9, 10], is_used = [0, 1, 1, 0, 0] 
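
To make the selection rule concrete, here is a minimal NumPy sketch of the filtering described above. The helper name `select_windows` is hypothetical and is not part of neuralforecast. The sketch only reproduces the toy example and ignores whatever padding, step size, or batching the library applies when it builds windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def select_windows(y, is_used, input_size, h):
    """Hypothetical helper: keep only windows whose last input position is flagged."""
    y = np.asarray(y)
    is_used = np.asarray(is_used, dtype=bool)
    window_size = input_size + h

    # All contiguous windows of length input_size + h, one per row.
    windows = sliding_window_view(y, window_size)
    mask_windows = sliding_window_view(is_used, window_size)

    # A window is kept when the mask at its last input step (index input_size - 1) is set.
    keep = mask_windows[:, input_size - 1]
    return windows[keep]


y = [3, 4, 5, 6, 7, 8, 9, 10]
is_used = [0, 0, 1, 0, 1, 1, 0, 0]
selected = select_windows(y, is_used, input_size=3, h=2)
print(selected.tolist())
```

With the values from the example, the second window ([4, 5, 6, 7, 8]) is dropped because its mask slice is [0, 1, 0, 1, 1], whose value at index input_size - 1 is 0.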

I am convinced that the implementation of this functionality will greatly increase the capabilities of the library. I will provide you with any information you need to resolve this issue.

Thanks for your hard work. I do not rule out that someday my team and I will be able to join the contributors to your project.

ISPritchin commented 4 months ago

Were you able to understand the idea described above? I have in fact completed the implementation locally, and I think it is ready for a pull request.

elephaint commented 3 months ago

Thanks for the suggestion, I think I understand the request. Feel free to file a PR with the request so that we can review.