Linear regression baseline

gabrieltseng commented 5 years ago

Adds a linear regression baseline model

tommylees112 commented 5 years ago

Oooh @gabrieltseng had a thought about flexibility of our approach.

One thing we will want to test is not only whether we can predict one month ahead, but also whether we can learn the relationship between precip NOW and VHI now.

Therefore, the test is:

we know precip (the output of the seasonal models for example)
we don't know vhi
can we estimate the vhi NOW from precip now and vhi/precip from previous months?
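To make the setup concrete, here is a minimal sketch of how the x/y pairs for this "nowcast" test could be built and fed to a linear baseline. Everything in it is an assumption for illustration: the helper name `make_nowcast_xy`, the `n_lags` parameter, the synthetic precip/vhi series, and the use of plain least squares via NumPy rather than whatever model class the repo ends up with.

```python
import numpy as np


def make_nowcast_xy(precip, vhi, n_lags=3):
    """Hypothetical helper: y is VHI at month t; x holds precip for months
    t-n_lags..t (current month included) and VHI for months t-n_lags..t-1.
    VHI at month t is never part of x."""
    precip = np.asarray(precip, dtype=float)
    vhi = np.asarray(vhi, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(precip)):
        precip_window = precip[t - n_lags : t + 1]  # includes the current month
        vhi_window = vhi[t - n_lags : t]            # previous months only
        rows.append(np.concatenate([precip_window, vhi_window]))
        targets.append(vhi[t])
    return np.array(rows), np.array(targets)


# Synthetic monthly series, purely for illustration (not real data).
rng = np.random.default_rng(0)
precip = rng.random(48)
vhi = 0.6 * precip + 0.1 * rng.random(48)

x, y = make_nowcast_xy(precip, vhi, n_lags=3)

# Ordinary least squares with an intercept column, as a stand-in linear baseline.
design = np.column_stack([x, np.ones(len(x))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef
```

With 48 months and `n_lags=3`, each row of `x` has 4 precip values and 3 vhi values, and the model only ever sees arrays, not time labels.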

This way we get the MOST out of the seasonal forecast, and it fits nicely with our planned experiments for the analysis.

This changes the training data slightly (and we need to think about how to approach this). It means that we will have 12 months of precip and only 11 months of vhi, so they won't fit into the same Dataset object as-is because they have different time coordinates, unless we add a vhi array of all np.nans for the missing timestep.
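As a rough sketch of the NaN-padding idea, assuming the data are held in xarray objects with a single `time` dimension (the variable names, dates, and values below are made up for illustration), reindexing vhi onto the precip time axis lets both variables share one Dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

# 12 monthly timesteps of precip, but only the first 11 of vhi (illustrative values).
times = pd.date_range("2018-01-01", periods=12, freq="MS")
precip = xr.DataArray(
    np.linspace(10.0, 120.0, 12), coords={"time": times}, dims="time"
)
vhi = xr.DataArray(
    np.linspace(30.0, 50.0, 11), coords={"time": times[:-1]}, dims="time"
)

# Reindex vhi onto the full time axis; the missing final month is filled with NaN.
vhi_padded = vhi.reindex(time=times)

# Both variables now share the same time coordinate and fit in one Dataset.
ds = xr.Dataset({"precip": precip, "vhi": vhi_padded})
```

The trade-off is that anything consuming `ds` then has to treat the NaN in the final vhi month as "unknown" rather than as a real value.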

But it makes sense to try to add this option. I don't know whether you want to do it in this branch or another branch? Or I can work on this and play with these preprocessing steps.

gabrieltseng commented 5 years ago

@tommylees112, since the models only take in x and y data, they are agnostic to the time labels associated with that data.

The solution to this is probably to change the engineer so that it saves x and y data from the same time ranges - I don't think too much will need to be changed here.

gabrieltseng commented 5 years ago

Agreed, we should give some thought to more verbosity when fitting the models.

ECMWFCode4Earth / ml_drought

Linear regression baseline #32