claudiatebaldi / hectorcal


Different years / histyears for different variables in PCA #28

Closed: rplzzz closed this 4 years ago

rplzzz commented 5 years ago

In that case, I say we just drop [those years of] CO2 from the PCA and keep the additional model. The question is how much of a headache that is going to be. Do we have a separate list of years for each variable, or are we using one list for both? If the latter, I'll have to do a bit of reworking on the PCA code.

I am not sure what you are saying here. I think it would be as simple as removing the CO2 data from the cmip_individual data frame before running the PCA. But can we still use it in the multi-model mean and the consensus region for the MCMC?

I don't think it's necessary to remove the data for the years we're not using. The PCA calculation takes vectors of years to use for the historical and future periods: https://github.com/kdorheim/hectorcal/blob/f7363109e19256f994e29e1152ceb92092655eb9/R/pca.R#L149 So it will already use only the years listed in histyears or years. For computing projections, these vectors are stored in the PCA object, and again the projection ignores any years it isn't supposed to include. (Both of those functions will also throw an error if any of the listed years are missing from the data.)
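A minimal Python sketch of the year-selection behavior described above, assuming a single model run is stored as a hypothetical `{variable: {year: value}}` mapping. `select_years` is an illustrative name, not a hectorcal function (the real code is R, in `R/pca.R`). Years present in the data but not requested are skipped, and a requested year that is absent raises an error.

```python
from typing import Dict, List, Sequence

# Hypothetical layout for one model run: {variable: {year: value}}.
Run = Dict[str, Dict[int, float]]


def select_years(run: Run, variables: Sequence[str],
                 years: Sequence[int]) -> List[float]:
    """Flatten the requested variables and years into one vector.

    Years present in the data but absent from `years` are ignored;
    a requested year missing from the data raises ValueError.
    """
    out: List[float] = []
    for var in variables:
        by_year = run[var]
        missing = [y for y in years if y not in by_year]
        if missing:
            raise ValueError(f"{var}: requested years not in data: {missing}")
        # Values go in the order given by `years`, variable by variable.
        out.extend(by_year[y] for y in years)
    return out
```

Note that the same `years` list is applied to every variable, which is exactly the limitation discussed next.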

The problem is that there isn't an option to provide one histyears vector for temperature and another for CO2. So we would either have to add such an option, or drop the last 10 years of temperature (in addition to CO2) from the historical scenario.
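One way a per-variable option could look, sketched in Python under the same hypothetical `{variable: {year: value}}` layout: instead of a single `histyears` vector shared by every variable, pass a mapping from each variable name to its own year list. `select_years_by_var` and the year ranges below are illustrative assumptions, not the actual hectorcal interface or the actual historical period.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical layout for one model run: {variable: {year: value}}.
Run = Dict[str, Dict[int, float]]


def select_years_by_var(run: Run,
                        years_by_var: Mapping[str, Sequence[int]]) -> List[float]:
    """Flatten one run using a separate list of years for each variable."""
    out: List[float] = []
    for var, years in years_by_var.items():
        by_year = run[var]
        missing = [y for y in years if y not in by_year]
        if missing:
            raise ValueError(f"{var}: requested years not in data: {missing}")
        out.extend(by_year[y] for y in years)
    return out


# Illustrative year lists (not the real historical period): temperature
# keeps every year, CO2 stops ten years earlier.
years_by_var = {"tas": list(range(1850, 2006)),
                "co2": list(range(1850, 1996))}
```

Because the vector is built in the mapping's insertion order, every run would need to be flattened with the same mapping so that the columns line up across models before the PCA.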

As for using the data in the multi-model envelope, I'm not sure, but I'm leaning toward not including those years. The problem is that with so few models, dropping one of them for a 10-year period could result in a much narrower envelope for those 10 years. The Hector parameters would be constrained to fit into that envelope, and because of the serial correlation in CO2 outputs, we would effectively be constraining them to a reduced envelope for at least a few decades on either side. For that reason, I think we shouldn't include those 10 years, even in the calculations that don't use PCA.
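To make the envelope concern concrete, here is a small Python sketch that computes a per-year (min, max) envelope across models and, optionally, restricts it to the years every model reports. The `{model: {year: value}}` layout and the function names are hypothetical; this illustrates the reasoning only, not how hectorcal actually builds its consensus region.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical layout: {model: {year: value}} for one variable.
Ensemble = Dict[str, Dict[int, float]]


def common_years(ensemble: Ensemble) -> List[int]:
    """Years reported by every model in the ensemble."""
    return sorted(set.intersection(*(set(run) for run in ensemble.values())))


def envelope(ensemble: Ensemble,
             years: Optional[Iterable[int]] = None) -> Dict[int, Tuple[float, float]]:
    """Per-year (min, max) across the models that report each year.

    If `years` is None, every year reported by any model is used, so a
    year that some models lack is bounded only by the models that have it.
    """
    if years is None:
        years = sorted({y for run in ensemble.values() for y in run})
    env: Dict[int, Tuple[float, float]] = {}
    for y in years:
        vals = [run[y] for run in ensemble.values() if y in run]
        env[y] = (min(vals), max(vals))
    return env
```

With three models where the highest one stops early, the envelope for the missing years is bounded only by the remaining two and can be much narrower; passing `common_years(ensemble)` as `years` simply leaves those years out.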

Originally posted by @rplzzz in https://github.com/kdorheim/hectorcal/pull/26#issuecomment-530428468

rplzzz commented 5 years ago

I had forgotten about this, but I think we still have to deal with the problem of the last 10 years of historical CO2 data. The issue is twofold: we need to drop them from both the PCA and the raw-output envelope calculations. For the raw-output mean calculations, I guess we could keep them, though I'm thinking we should drop them there too, just to be consistent.
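A Python sketch of doing the drop once, upstream, so that the PCA and the raw-output envelope (and, if desired, mean) calculations all see the same data. The `Record` fields describe a hypothetical long-format layout, and `drop_trailing_years` is an illustrative helper, not part of hectorcal. The cutoff is computed per model from that model's last reported year.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """One row of a hypothetical long-format output table."""
    model: str
    experiment: str
    variable: str
    year: int
    value: float


def drop_trailing_years(records: List[Record], variable: str,
                        experiment: str, n_years: int = 10) -> List[Record]:
    """Drop the last `n_years` years of one variable in one experiment.

    The cutoff is computed separately for each model from the last year
    that model reports; all other rows pass through unchanged.
    """
    def targeted(r: Record) -> bool:
        return r.variable == variable and r.experiment == experiment

    last_year: Dict[str, int] = {}
    for r in records:
        if targeted(r):
            last_year[r.model] = max(last_year.get(r.model, r.year), r.year)

    # Keep a targeted row only if it falls before its model's trailing window.
    return [r for r in records
            if not targeted(r) or r.year <= last_year[r.model] - n_years]
```

Filtering before either calculation runs keeps the two consistent without adding per-variable year options to the PCA code.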

rplzzz commented 4 years ago

The latest version of the dataset doesn't seem to have this problem with the historical CO2 data, so this task has been overtaken by events (OBE).