more flexible code to make predictions

jsadler2 commented 3 years ago

Right now the code to make predictions is a little too rigid. To make predictions I need all of the following: [model, seq_len, start_date, end_date, x_data, x_vars, x_means, x_stds, y_means, y_stds, y_vars].

Currently, seq_len, start_date, end_date, x_data, x_vars, x_means, x_stds, y_means, y_stds, and y_vars are all bundled together and passed as io_data to the predict function: https://github.com/jsadler2/river-dl/blob/242c1031b576b6f14a8d075e81f24b24593c80c2/river_dl/postproc_utils.py#L151

That's okay when all those things are from the same dataset.

One use case where this doesn't work is one that I'm currently facing: what if the model was trained on one dataset and I want to predict on a different one?

In that case, I need the seq_len, x_vars, x_means, x_stds, y_means, y_stds, and y_vars from the data that was used to train the model, but my start_date, end_date, and x_data all come from the data that I want to make predictions for and are independent of the training data. So I think it makes sense to make this function more flexible.
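A minimal sketch of what that separation could look like, assuming a hypothetical helper `predict_with_train_stats` (not the current river-dl API). It assumes `x_data` is a `[time, n_features]` array for a single segment, `model` is any callable mapping `[n_seqs, seq_len, n_features]` to `[n_seqs, seq_len, n_outputs]`, and `train_stats` is a plain dict holding the training means and standard deviations:

```python
import numpy as np


def predict_with_train_stats(model, x_data, seq_len, train_stats):
    """Hypothetical sketch: predict on new data using training-time scaling.

    x_data      : [time, n_features] array from the dataset to predict on
    seq_len     : sequence length the model was trained with
    train_stats : dict with x_means, x_stds, y_means, y_stds from training
    """
    x_means = np.asarray(train_stats["x_means"])
    x_stds = np.asarray(train_stats["x_stds"])
    y_means = np.asarray(train_stats["y_means"])
    y_stds = np.asarray(train_stats["y_stds"])

    # Scale the new inputs with the statistics of the training data.
    x_scaled = (x_data - x_means) / x_stds

    # Keep only whole sequences; partial sequences are dropped in this sketch.
    n_seqs = x_scaled.shape[0] // seq_len
    x_batched = x_scaled[: n_seqs * seq_len].reshape(n_seqs, seq_len, -1)

    # Predict in scaled space, then map back to the original units.
    y_scaled = np.asarray(model(x_batched))
    return y_scaled * y_stds + y_means
```

The point of the split is that `seq_len` and `train_stats` travel with the trained model, while `x_data` (and its start and end dates) can come from anywhere.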

jsadler2 commented 3 years ago

As part of this, it might be nice to be able to exclude the first n predictions to give the model states a chance to "warm up".
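One way this could work, as a rough sketch: a hypothetical `drop_warmup` helper (the name and signature are assumptions, not existing river-dl code) that trims the first `n_warmup` time steps from a `[time, n_outputs]` prediction array and its matching dates:

```python
import numpy as np


def drop_warmup(dates, preds, n_warmup):
    """Hypothetical sketch: discard the first n_warmup predictions.

    dates : [time] array of dates matching the rows of preds
    preds : [time, n_outputs] array of predictions
    """
    if n_warmup < 0 or n_warmup > preds.shape[0]:
        raise ValueError("n_warmup must be between 0 and the number of time steps")
    # Everything before n_warmup is treated as warm-up and dropped.
    return dates[n_warmup:], preds[n_warmup:]
```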

jsadler2 commented 3 years ago

Relatedly, the code currently has no mechanism to predict past the last full 365-day sequence. In other words, there is currently no way to predict a partial year.
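One possible approach to the partial-year problem, sketched under assumptions: anchor a final full-length window to the end of the record so it overlaps the previous window, then keep only the part that has not been predicted yet. The helper name `sequence_windows` is hypothetical, and this is not necessarily how the diagram below handles it:

```python
def sequence_windows(n_steps, seq_len):
    """Hypothetical sketch: windows that cover every time step, including a partial tail.

    Returns a list of (start, keep_from) pairs. Each window spans
    [start, start + seq_len); predictions before offset keep_from within
    the window overlap an earlier window and can be discarded.
    """
    if n_steps < seq_len:
        raise ValueError("need at least one full sequence")

    # Back-to-back full sequences, all of whose predictions are kept.
    windows = [(start, 0) for start in range(0, n_steps - seq_len + 1, seq_len)]

    # If a partial sequence is left over, add one more full-length window
    # that ends exactly at the last time step and keep only its new tail.
    remainder = n_steps % seq_len
    if remainder:
        windows.append((n_steps - seq_len, seq_len - remainder))
    return windows
```

For example, with `n_steps=10` and `seq_len=4` the last window starts at step 6 and only its final two predictions (steps 8 and 9) are kept.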

jsadler2 commented 3 years ago

I made a diagram to try to show how I currently have it set up to make predictions over an entire time period: prediction_diagram

jzwart commented 3 years ago

Nice diagram, Jeff. I think it might be a bit easier to read if all the prediction time periods you want to keep are green rather than a mixture of blue and green. And maybe labeling the first half of the year with a 1 and second half with a 2 (e.g. A1 and A2 as opposed to A and A1).

And for the beginning predictions, if earlier data is not available, could you paste on the second half of the year before the first half of the year? It wouldn't be as good as having earlier data but it might be better than predicting from A alone w/o spinup.
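A small sketch of that spin-up idea, assuming the first sequence's inputs are a `[seq_len, n_features]` array; the helper name `spinup_sequence` is hypothetical:

```python
import numpy as np


def spinup_sequence(x_first_seq):
    """Hypothetical sketch: build a spin-up sequence for the first half-year.

    x_first_seq : [seq_len, n_features] inputs for the first sequence.
    Returns (x_spinup, n_keep): the second half pasted before the first
    half, and the number of trailing predictions (the first half) to keep.
    """
    half = x_first_seq.shape[0] // 2
    # Second half goes first as a stand-in for the missing earlier data.
    x_spinup = np.concatenate([x_first_seq[half:], x_first_seq[:half]], axis=0)
    return x_spinup, half
```

The model would then be run on `x_spinup`, and only the last `n_keep` predictions, which correspond to the real first half of the year, would be retained.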

aappling-usgs commented 3 years ago

Yes, this is helpful. I agree with Jake's suggestions. To elaborate on one of them, I think E) is ambiguous: the text is what I'd expect, but a blue b shows up as both the first half and the second half of sequences in D), so the diagram doesn't currently communicate that these are all second halves. Jake's suggestion to make the 2nd half of each sequence always blue would help disambiguate.

Cool idea for the beginning predictions, Jake.

jsadler2 commented 3 years ago

Updated ☝️: prediction_diagram (2)

jzwart commented 3 years ago

Looks good! Could you slightly move the second row under C) so that the time periods match up? E.g., have A2 in both rows line up.

jsadler2 commented 2 years ago

Updated diagram:

- Made more generic:
  - "RGCN model" -> "DL model"
  - "Sequence Length" instead of "365-days"
- Shifted bottom row of C) to line up sequences, as Jake suggested

USGS-R / river-dl

more flexible code to make predictions #101