JGCRI / gcamdata

The GCAM data system

https://jgcri.github.io/gcamdata/

Need to clearly distinguish GCAM settings versus data settings #480

Closed bpbond closed 2 years ago

bpbond commented 7 years ago

See e.g. #454

bpbond commented 7 years ago

In #454 @pkyle said:

Yeah I think there was a slight misunderstanding--in the old data system, I used the objects "model_base_years" and "model_future_years" to mean the years for GCAM, whereas the "historical_years" and "future_years" were just for processing data. Those were separated for a number of reasons (e.g., hindcasting runs, only running a small subset of the total available historical years, probably others too). The only reason we'd ever change the historical_years is if we got more data going further forward or back in history. I certainly never intended for people to come in and e.g. set historical_years to 1971:2006 and have that produce a similar or even functional dataset; I just never would have built it for the situation where we get less data over time, or where the max(historical_years) is anything less than 2010. Going forward, it's fine to maintain that capacity, but we will probably have to re-set a number of objects to a hard-wired 2010 year where the code currently tries to pull the latest historical year.

rplzzz commented 7 years ago

Page is exactly right. I think the sooner we fix this, the less painful it's going to be.

bpbond commented 7 years ago

Hmm. Confusing. Here is _common/assumptions/A_common_data.R:

#Historical years for data write-out
historical_years <- 1971:2010

Then aglu-data/assumptions/A_aglu_data.R:

AGLU_historical_years <- 1971:2010

And then in A_modeltime_data.R:

model_base_years <- c( 1975, 1990, 2005, 2010 )
model_future_years <- seq( 2015, 2100, 5 )

Do these things have any relation to each other?
If each module (aglu, energy, etc.) defines its own historical and future time periods, what's left for historical_years? I guess the "...for data write-out" comment made me assume that it was tied to model_base_years, but apparently it isn't tied to it at all?

rplzzz commented 7 years ago

I'm not sure why there is a separate AGLU_historical_years. Perhaps someone writing that section wanted to leave open the possibility that the AGLU raw data might have different historical years than the rest of the data? It deals with different data sets, so it's theoretically possible.

The "historical years for data write out" comment is clearly muddled, as we now know. ¯\_(ツ)_/¯

kvcalvin commented 7 years ago

I think I'm also confused as to why switching historical years to 1971:2006 shouldn't work if the data is supposed to exist for 1971:2010. Maybe it isn't important.

pkyle commented 7 years ago

The reason AGLU_historical_years is defined separately is that in an earlier version of the data system, a number of the AGLU databases only went up to 2009. The "commodity balances" databases (which used to be called Supply Utilization Accounts, which I've abbreviated SUA) tend to lag behind the PRODSTAT databases and others by 2-3 years. Right now, the AGLU historical years are the same as the general historical years, but in the future when we update our data, most datasets will lag about 2 years behind the present, whereas the SUA data lag by 4.
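
A minimal sketch of the lag described above, only to illustrate why the two ranges can diverge. The `data_update_year` value and the two `*_lag` names are hypothetical and do not come from the data system; the 2-year and 4-year lags are the approximate figures mentioned in this comment.

```r
# Hypothetical year in which the raw data are refreshed (an assumption).
data_update_year <- 2020L

# Approximate lags from the comment above: most datasets vs. SUA data.
general_lag <- 2L   # hypothetical name
sua_lag     <- 4L   # hypothetical name

# The general historical range ends at the latest year most datasets cover.
historical_years <- 1971:(data_update_year - general_lag)

# The AGLU range is limited by the slowest (SUA) dataset, so it ends earlier.
AGLU_historical_years <- 1971:(data_update_year - sua_lag)

# Years available in general but not yet covered by the SUA data.
setdiff(historical_years, AGLU_historical_years)
```

Under these assumptions the AGLU range is a strict subset of the general range, which is the situation a separate AGLU_historical_years object allows for.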

pkyle commented 7 years ago

model_base_years and model_future_years are just the years that will be run in GCAM. Totally different from the historical years for which data are processed, but the model_base_years should/must consist of historical years. The reasons why setting the historical years to 1971:2006 might trip things up are that (1) the model_base_years have a year, 2010, that is a calibration year but for which there will be no historical data, which will blow things up, and (2) a number of places in the code assume that the most recent historical year(s) can be used as a proxy for calculating something, and re-setting the historical years to terminate earlier might point to a year where the necessary data aren't available.
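
The two failure modes above can be sketched in a few lines of R. This is an illustration only: the year values are copied from the assumption files quoted earlier in the thread, while `missing_base_years()` and `check_years()` are hypothetical helpers, not functions that exist in gcamdata.

```r
# Values copied from the assumption files quoted earlier in this thread.
model_base_years  <- c(1975, 1990, 2005, 2010)
full_history      <- 1971:2010
truncated_history <- 1971:2006

# Hypothetical helper (not part of gcamdata): model base years that have
# no matching historical data year.
missing_base_years <- function(base_years, hist_years) {
  setdiff(base_years, hist_years)
}

# Problem (1): with the truncated range, calibration year 2010 has no data.
missing_base_years(model_base_years, full_history)       # empty vector
missing_base_years(model_base_years, truncated_history)  # 2010

# Problem (2): code that uses the latest historical year as a proxy
# silently shifts from 2010 to 2006 when the range is truncated.
max(full_history)       # 2010
max(truncated_history)  # 2006

# Hypothetical guard: fail fast instead of producing a broken dataset.
# stopifnot() raises an error when the condition is FALSE.
check_years <- function(base_years, hist_years) {
  stopifnot(all(base_years %in% hist_years))
  invisible(TRUE)
}
check_years(model_base_years, full_history)
```

Calling `check_years(model_base_years, truncated_history)` would stop with an error, which is arguably preferable to a dataset that builds but is missing its final calibration year.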