Closed: jsadler2 closed this issue 3 years ago
As part of this, it might be nice to be able to exclude the first n predictions to give the model states a chance to "warm up".
Relatedly, the code currently has no mechanism to predict past the last full 365-day sequence. In other words, there is currently no way to predict a partial year.
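One possible way to handle both points, sketched in plain Python. This is not the current river-dl code: `window_starts`, `predict_series`, and `predict_window` are hypothetical names, and shifting the final window back so that it ends on the last time step is just one assumed way to cover a partial year.

```python
def window_starts(n_steps, seq_len):
    """Start indices of the windows needed to cover n_steps.

    Full windows are laid end to end. If a partial window is left over,
    one extra window is shifted back so that it ends on the last step.
    """
    if n_steps < seq_len:
        raise ValueError("need at least one full sequence")
    starts = list(range(0, n_steps - seq_len + 1, seq_len))
    if n_steps % seq_len:
        starts.append(n_steps - seq_len)
    return starts


def predict_series(predict_window, x, seq_len, n_warmup=0):
    """Predict every step of x, then drop the first n_warmup predictions.

    predict_window is a stand-in for the model: it takes one window of
    inputs and returns one prediction per step.
    """
    preds = []
    for start in window_starts(len(x), seq_len):
        out = predict_window(x[start:start + seq_len])
        # Only keep the steps that earlier windows have not covered yet.
        preds.extend(out[len(preds) - start:])
    return preds[n_warmup:]


# Toy "model" that doubles its inputs: 10 steps, 4-step sequences.
doubled = predict_series(lambda w: [2 * v for v in w], list(range(10)), 4, n_warmup=3)
```

Where the shifted final window overlaps the previous one, the sketch keeps the earlier window's predictions, since those come from later in a sequence and have had more steps to warm up.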
I made a diagram to try to show how I currently have it set up to make predictions for an entire time period:
Nice diagram, Jeff. I think it might be a bit easier to read if all the prediction time periods you want to keep are green rather than a mixture of blue and green. And maybe labeling the first half of the year with a 1 and second half with a 2 (e.g. `A1` and `A2` as opposed to `A` and `A1`).
And for the beginning predictions, if earlier data is not available, could you paste on the second half of the year before the first half of the year? It wouldn't be as good as having earlier data but it might be better than predicting from `A` alone w/o spinup.
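A rough sketch of that spin-up idea, assuming the first year is available as a simple list and the model can be called on one window at a time. `predict_with_pasted_spinup` and `predict_window` are hypothetical names, not functions in river-dl.

```python
def predict_with_pasted_spinup(predict_window, first_year):
    """Predict the first half of a year using the second half as spin-up.

    The second half of the year is pasted in front of the first half, the
    model is run over that window, and only the predictions for the first
    half (the end of the window) are kept.
    """
    half = len(first_year) // 2
    spinup_window = first_year[half:] + first_year[:half]
    out = predict_window(spinup_window)
    # Discard the predictions made on the pasted-on second half.
    return out[len(first_year) - half:]


def running_sum(window):
    """Toy stateful "model": the running sum of its inputs."""
    total, out = 0, []
    for value in window:
        total += value
        out.append(total)
    return out


first_half_preds = predict_with_pasted_spinup(running_sum, [1, 2, 3, 4])
```

In the toy example the kept predictions depend on the pasted-on values, which is the point: the model state is no longer cold when it reaches the start of the year.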
Yes, this is helpful. I agree with Jake's suggestions. To elaborate on one of them, I think E) is ambiguous: the text is what I'd expect, but a blue b shows up as both the first half and the second half of sequences in D), so the diagram doesn't currently communicate that these are all second halves. Jake's suggestion to always make the 2nd half of each sequence blue would help disambiguate.
Cool idea for the beginning predictions, Jake.
Updated ☝️
Looks good! Could you slightly move the second row under C) so that the time periods match up? E.g., have A2 in both rows line up.
Updated diagram:
Right now the code to make predictions is a little too rigid. To make predictions I need all of the following: `[model, seq_len, start_date, end_date, x_data, x_vars, x_means, x_stds, y_means, y_stds, y_vars]`.
Currently, all of these `seq_len, start_date, end_date, x_data, x_vars, x_means, x_stds, y_means, y_stds, y_vars` are bundled together and passed as `io_data` to the `predict` function: https://github.com/jsadler2/river-dl/blob/242c1031b576b6f14a8d075e81f24b24593c80c2/river_dl/postproc_utils.py#L151
That's okay when all those things are from the same dataset.
One use case where this doesn't work is one that I'm currently facing. What if the model was trained on one dataset and I want to predict on a different one?
In that case, I need the `seq_len, x_vars, x_means, x_stds, y_means, y_stds, y_vars` from the data that was used to train the model, but my `start_date, end_date, x_data` are all from the data that I want to make predictions for and are independent of the training data. So I think it makes sense to make this function more flexible.
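To make the split concrete, here is a minimal sketch of what a more flexible interface could look like. It is an assumption, not the existing `predict` in `postproc_utils.py`: `TrainingMeta` and `predict_on_new_data` are hypothetical names, the data is held in plain dicts and lists for illustration, and the model is any callable that maps a sequence of scaled input rows to scaled output rows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class TrainingMeta:
    """Values fixed by the training data (hypothetical container)."""
    seq_len: int
    x_vars: List[str]
    x_means: Dict[str, float]
    x_stds: Dict[str, float]
    y_vars: List[str]
    y_means: Dict[str, float]
    y_stds: Dict[str, float]


def predict_on_new_data(model, meta, x_data, dates, start_date, end_date):
    """Predict for [start_date, end_date] on data the model was not trained on.

    x_data maps each variable name to a list of values aligned with dates.
    Inputs are scaled with the training means/stds, and the outputs are
    un-scaled the same way.
    """
    keep = [i for i, d in enumerate(dates) if start_date <= d <= end_date]
    rows = [
        [(x_data[v][i] - meta.x_means[v]) / meta.x_stds[v] for v in meta.x_vars]
        for i in keep
    ]
    scaled = []
    # Feed the model seq_len rows at a time; the last chunk may be shorter.
    for start in range(0, len(rows), meta.seq_len):
        scaled.extend(model(rows[start:start + meta.seq_len]))
    return {
        v: [row[j] * meta.y_stds[v] + meta.y_means[v] for row in scaled]
        for j, v in enumerate(meta.y_vars)
    }


# Toy example: the "model" just sums the scaled inputs of each row.
meta = TrainingMeta(
    seq_len=365,
    x_vars=["temp", "flow"],
    x_means={"temp": 10.0, "flow": 2.0},
    x_stds={"temp": 2.0, "flow": 1.0},
    y_vars=["out"],
    y_means={"out": 5.0},
    y_stds={"out": 3.0},
)
new_x = {"temp": [10.0, 12.0, 14.0, 16.0], "flow": [2.0, 3.0, 4.0, 5.0]}
new_dates = [date(2020, 1, d) for d in range(1, 5)]
result = predict_on_new_data(
    lambda seq: [[sum(row)] for row in seq],
    meta, new_x, new_dates, date(2020, 1, 2), date(2020, 1, 3),
)
```

The idea is that everything in `TrainingMeta` travels with the trained model, while `x_data` and the date range are supplied separately for whichever dataset is being predicted. A real model would likely need the final short chunk padded or shifted to a full `seq_len`, which this sketch leaves out.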