Open bpbond opened 5 years ago
A little bit of thought has been given to what the "families" might be. I've copied the list we came up with here as a starting point (we may well be missing some). Some of these are not currently in the data system at all, while others (Energy technology, Aggregate transportation) we have already decided to drop:
So the idea is that somewhere we'll maintain a matrix of outputs x families:
Output | Description | SSPs | NDCs | Land policy | GCAM-USA | etc. |
---|---|---|---|---|---|---|
a.xml | Snoodle market | X | X | X | X | |
b.xml | State gadgets | X | X | X | X | |
c.xml | Q parameters | X | X | X | | |
etc. | X | X | X | X |
...that is used by the driver to determine what chunks to run?
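To make the question concrete, here is a minimal sketch of how such a matrix could be queried. It uses the placeholder outputs and families from the table above; the `membership` matrix and the `outputs_for_families()` helper are hypothetical names, not anything that exists in the data system.

```r
# Hypothetical outputs-by-families membership matrix, filled in from the
# placeholder table above (TRUE where the table has an X).
families <- c("SSPs", "NDCs", "Land policy", "GCAM-USA")
membership <- matrix(
  c(TRUE, TRUE, TRUE, TRUE,    # a.xml
    TRUE, TRUE, TRUE, TRUE,    # b.xml
    TRUE, TRUE, TRUE, FALSE),  # c.xml
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a.xml", "b.xml", "c.xml"), families)
)

# Return the outputs that belong to at least one of the requested families.
outputs_for_families <- function(membership, wanted) {
  unknown <- setdiff(wanted, colnames(membership))
  if (length(unknown) > 0) {
    stop("Unknown families: ", paste(unknown, collapse = ", "))
  }
  keep <- rowSums(membership[, wanted, drop = FALSE]) > 0
  rownames(membership)[keep]
}

gcam_usa_outputs <- outputs_for_families(membership, "GCAM-USA")
print(gcam_usa_outputs)
```

The driver would then still have to work backwards from the selected outputs to the chunks that produce them; that step is not shown here.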
@bpbond sorry, I totally meant to respond to this and forgot.
Well, honestly I'm not sure exactly what the idea is. Maybe something like that could work, but:

> ...that is used by the driver to determine what chunks to run?

We would probably want something a little better than that. Mostly I'm thinking of something like socioeconomics_SSPN.xml: we would not really want a separate set of chunks to generate each of those if we could just swap in a couple of CSV files for the appropriate SSP up front and then have one generic set of chunks process socioeconomics.xml. On the other hand, I am sure there are times when the logic for a single SSP is completely different from that for another... Hmm.
What about adding a "configuration" object and a "preprocessor" phase? During the preprocessor phase we can do things like swap input FILEs or enable / disable chunks.
Maybe we leave the configuration object around for the MAKE phase, but I could see that getting ugly fast if we accumulate a bunch of logic along the lines of "if we are building this and that then do this, otherwise do that, and if this other thing then something else again". But we can try to keep that to a minimum.
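A rough sketch of what such a preprocessor phase might look like, assuming a plain list for the configuration object and a simple chunk registry. Every name here (`config`, `chunks`, `preprocess()`, the chunk names and the CSV paths) is made up for illustration and does not reflect the current driver.

```r
# Hypothetical configuration object: which SSP to build, which families are on.
config <- list(ssp = "SSP2", families = c("core", "GCAM-USA"))

# Hypothetical chunk registry: each chunk declares its family and input files.
# "SSPN" in a path is a placeholder to be swapped for the configured SSP.
chunks <- list(
  socioeconomics = list(family = "core",
                        inputs = c("socioeconomics/pop_SSPN.csv",
                                   "socioeconomics/gdp_SSPN.csv")),
  electricity_USA = list(family = "GCAM-USA",
                         inputs = "gcam-usa/elec_assumptions.csv"),
  liquids_limits = list(family = "liquids limits",
                        inputs = "energy/limits.csv")
)

# Preprocessor: drop chunks whose family is disabled, then swap input files.
preprocess <- function(chunks, config) {
  enabled <- Filter(function(ch) ch$family %in% config$families, chunks)
  lapply(enabled, function(ch) {
    ch$inputs <- sub("SSPN", config$ssp, ch$inputs, fixed = TRUE)
    ch
  })
}

plan <- preprocess(chunks, config)
print(names(plan))
print(plan$socioeconomics$inputs)
```

Keeping all the configuration logic in one function like this, ahead of the MAKE phase, is one way to avoid scattering "if this family then..." branches through the chunks themselves.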
Anyways, just some thoughts.
Also FYI: I pushed a quick test to use drake on a branch called drake-test
Meeting 2019-01-31. There are a number of different issues with the "Configurations" bullet above, involving running different chunks (e.g. GCAM-USA), changing datasets (SSPs), and general complexity (electricity structure). Pralit did a test of drake (above). Kate notes we need to tackle things incrementally. Yes!
To do:
Re SSPs: The challenge with the SSPs is that (1) they touch on most aspects of the model, (2) the processing differs depending on the input, and (3) there are five of them.
With respect to (1), there are 62 chunks (~20%) in the data system that have the string "ssp" in them.
With respect to (2), for some of those chunks (e.g., population), we process the data the same, just using a different set of inputs. For others (e.g., food demand, non-co2 emissions), the processing code is different. And in some cases (e.g., electricity tech assumptions), we take variants that might be useful in other contexts (e.g., adv tech assumptions) and split the file into different parts (e.g., use adv renewable assumptions only in ssp1, discard adv assumptions for other technologies).
With respect to (3), I would expect users to either want all five or to want none.
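For anyone wanting to redo the count in (1), something along these lines should work. This is only a sketch: `count_chunks_matching()` is a hypothetical helper, the match is assumed to be case-insensitive, and the demo runs on a throwaway directory rather than the real chunk folder.

```r
# Count how many .R files in a directory contain a pattern on any line.
count_chunks_matching <- function(chunk_dir, pattern = "ssp") {
  files <- list.files(chunk_dir, pattern = "\\.R$", full.names = TRUE)
  hits <- vapply(files, function(f) {
    any(grepl(pattern, readLines(f, warn = FALSE), ignore.case = TRUE))
  }, logical(1))
  c(matching = sum(hits), total = length(files))
}

# Demo on a temporary directory with two made-up chunk files.
demo_dir <- tempfile("chunks")
dir.create(demo_dir)
writeLines("x <- get_data('pop_SSP1')", file.path(demo_dir, "zchunk_demo_a.R"))
writeLines("y <- 1 + 1", file.path(demo_dir, "zchunk_demo_b.R"))

counts <- count_chunks_matching(demo_dir)
print(counts)
```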
Re electricity structure challenge: Right now we have two variations of the electricity sector, zchunk_L223.electricity.R and zchunk_L2233.electricity_water.R, which ultimately produce electricity.xml and electricity_water.xml (the latter being a reorganization of the former that splits each technology out by its associated cooling systems).
The electricity family, however, intersects with other families such as (among others) GCAM-USA and liquids limits (aka blend wall). The zchunk_L223.electricity_USA.R and zchunk_L270.limits.R chunks each need outputs from the electricity sector to generate outputs of their own, ultimately producing electricity_USA.xml and liquids_limits.xml. For example:
```r
# L270.CreditInput_elec: minicam-energy-input of oil credits for electricity techs
A23.globaltech_eff %>%
  fill_exp_decay_extrapolate(MODEL_YEARS) %>%
  mutate(value = round(value, energy.DIGITS_EFFICIENCY)) %>%
  filter(subsector == "refined liquids") %>%
  mutate(minicam.energy.input = "oil-credits",
         # note we are converting the efficiency to a coefficient here
         coefficient = energy.OILFRACT_ELEC / value) %>%
  select(-value) %>%
  rename(sector.name = supplysector,
         subsector.name = subsector) ->
  L270.CreditInput_elec
```
However, when the electricity water variant is used we also need to generate water_elec_liquids_limits.xml:
```r
L270.CreditInput_elec %>%
  left_join(L2233.TechMap, by = c("sector.name" = "from.supplysector",
                                  "subsector.name" = "from.subsector",
                                  "technology" = "from.technology")) %>%
  mutate(sector.name = to.supplysector,
         subsector.name = to.subsector,
         technology = to.technology) ->
  L2233.CreditInput_elec
```
But of course the liquids limits family and the GCAM-USA family also intersect, so we also have a liquids_limits_USA.xml. And in principle GCAM-USA should also have permutations for the electricity water variant, so we should have an electricity_water_USA.xml and then also a water_elec_liquids_limits_USA.xml.
And of course in this example I ignored other families such as SSPs or NonCO2s, which in principle should also have their own permutations.
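To put a number on how quickly this grows, here is a small sketch that treats each intersecting family as an independent on/off switch (that independence is an assumption made purely for illustration; `count_variants()` is a hypothetical helper).

```r
# Count the file variants implied by a set of optional families, assuming each
# family can be switched on or off independently of the others.
count_variants <- function(families) {
  switches <- rep(list(c(FALSE, TRUE)), length(families))
  names(switches) <- families
  nrow(expand.grid(switches))
}

# The three families from the electricity example above.
n_elec <- count_variants(c("water", "liquids limits", "GCAM-USA"))
print(n_elec)

# Adding SSPs and NonCO2s as two further switches.
n_more <- count_variants(c("water", "liquids limits", "GCAM-USA", "SSPs", "NonCO2s"))
print(n_more)
```

With three families this gives 8 variants, which lines up with the eight XML file names mentioned in this comment; every additional family doubles the count.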
I think in the most ideal scenario we should just generate one electricity.xml, where we somehow add or swap out the tibbles that go into that XML based on which families are configured.
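A minimal sketch of that add / swap idea, using plain data frames in place of tibbles. Apart from L270.CreditInput_elec, which appears in the code above, all names (`assemble_tables()`, `elec_efficiency`, the family keys) and all values are made up for illustration.

```r
# Start from the base tables, then let each configured family add new tables
# or replace existing ones by name. Later families win on conflicts.
assemble_tables <- function(base, overrides, families) {
  tables <- base
  for (fam in families) {
    fam_tables <- overrides[[fam]]
    if (is.null(fam_tables)) next  # family contributes nothing to this XML
    tables[names(fam_tables)] <- fam_tables
  }
  tables
}

# Illustrative data only; the numbers are not real assumptions.
base <- list(
  elec_efficiency = data.frame(technology = "gas (CC)", efficiency = 0.5)
)
overrides <- list(
  water = list(
    elec_efficiency = data.frame(technology = "gas (CC) (recirculating)",
                                 efficiency = 0.5)
  ),
  liquids_limits = list(
    L270.CreditInput_elec = data.frame(technology = "refined liquids (CC)",
                                       coefficient = 0.1)
  )
)

tables <- assemble_tables(base, overrides, c("water", "liquids_limits"))
print(names(tables))
```

The resulting list would then be handed to whatever writes the single electricity.xml. The hard part, which this sketch sidesteps, is that a swap in one family (here, water) may change the keys that another family's tables need to join on.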
Umbrella issue summarizing discussion between @bpbond @pralitp @rplzzz @kvcalvin .
- Configurations. Related: #696 #797. Each xml output should belong to one or more "families" and users can build just one set (default GCAM, no non-CO2s, GCAM-USA, etc). This is urgent because of the upcoming GCAM-USA PR.
- Shim into the driver #12. A mechanism to modify data (either a single number, or an entire object) in a transparent and reproducible way.
- Generating differential XMLs for reproducibility and transparency #1061 #756. In fact, generating GCAM's configuration.xml.
- Consistent, integrated time settings and series. Related: #313 #1047 #872 #862 #863 #787 #455. Should be able to change a (few) settings and generate a hindcast, or extend calibration period. Level 1 chunks should generate smooth time series and be ignorant of model periods.
- Driver rework/parallelization/workflow manager. Consider something like drake #976. Do simple parallelization test?
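As a starting point for the shim item above, here is one way the mechanism could look: a named list of functions, each applied to the data object of the same name, with a log entry per modification so the change is visible and repeatable. `apply_shims()` and the object names are hypothetical; nothing like this exists in the driver yet.

```r
# Apply user-supplied modifications ("shims") to named data objects and record
# each one, so that every change is explicit and reproducible.
apply_shims <- function(data, shims) {
  log <- character(0)
  for (name in names(shims)) {
    if (!name %in% names(data)) {
      stop("Shim targets unknown object: ", name)
    }
    data[[name]] <- shims[[name]](data[[name]])
    log <- c(log, paste("shim applied to", name))
  }
  list(data = data, log = log)
}

# Illustrative objects: a single number and an entire table.
data <- list(
  some_constant = 1,
  some_table = data.frame(year = c(2010, 2015), value = c(10, 20))
)
shims <- list(
  some_constant = function(x) x * 2,
  some_table = function(df) df[df$year >= 2015, , drop = FALSE]
)

shimmed <- apply_shims(data, shims)
print(shimmed$log)
```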