Discrepancies between input IO coefficient and the actual calculated ones

Youyi77 commented 3 months ago

Hi team,

I found some discrepancies between the input IO coefficient and the actual IO coefficients calculated by dividing input/output.

E.g. regional biomass in biomass to H2

For every scenario, the hydrogen technology IO coefficients query returns the same results. But the manually calculated ratio, regional biomass (in EJ) divided by hydrogen produced from biomass to H2 (in EJ), differs from those results.

The same happens in the electricity sector, which uses efficiency instead.

Does that mean that, although the IO coefficient (or efficiency) in the .xml files is fixed, the actual ratio in each scenario will vary somewhat while staying close to the original input?

Thanks so much!

patrickrorourke commented 3 months ago

Thanks for your question!

When you manually calculate the IO coefficients, are you doing this by vintage or across all vintages in a model period? The IO coefficients reported in the XML for hydrogen production are for a given vintage, so that might be where the difference is coming from.

And yes, it is correct that when calculating IO coefficients across all vintages in a period the values can vary across scenarios (as scenarios could deploy different amounts of a vintage for a technology).

Hope that helps some!

Youyi77 commented 3 months ago

Hi Patrick @patrickrorourke,

Thank you for your quick response!

For the XML ones, I actually modified the hydrogen technology IO coefficients query slightly. Here is what I used, which I think might be by vintage?

*[@type='sector' and contains(@name, 'H2')]/*[@type='subsector']/*[@type='technology']/*[@type='input']/IO-coefficient[@vintage=parent::*/parent::*/@year]/node()

For my own calculation, I just divided the results of the queries hydrogen inputs by tech and hydrogen production by tech, which I think might be across all vintages in a model period?

*[@type='sector' and (@name='H2 central production' or @name='H2 wholesale dispensing' or @name='H2 industrial')]/
               *[@type='subsector' and not (@name='H2 delivery')]/*[@type='technology']/*[@type='output' (:collapse:)]/
               physical-output/node()

*[@type='sector' and contains(@name,'H2')]/
               *[@type='subsector']/*[@type='technology']/*[@type='input']/
               demand-physical/node()

Does "by vintage" mean just for a single plant, and "across all vintages" mean the average over all plants (new + old) involved in the period?

(I am having some difficulty understanding "vintage" in the model. Even though new plants are commissioned and older vintaged plants are decommissioned, shouldn't the IO coefficient always stay the same for a specific input in a specific technology?)

Thank you so much!!

patrickrorourke commented 3 months ago

No problem!

Apologies, I should have clarified what I meant by vintage.

"By vintage" refers to the values for new installations within a given model period.
And yes, that is correct for "across all vintages". This means the average over all existing facilities (new installations for a model period + installations from previous model periods that have not yet retired or shut down).
Note that not all technologies in GCAM are vintaged (have lifetime and retirement assumptions), but the electricity generation and hydrogen production sectors are vintaged.
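
To make the distinction concrete, here is a minimal sketch with made-up numbers (the coefficients, outputs, and scenario names below are hypothetical and are not GCAM data). Each vintage keeps its own fixed IO coefficient, but the ratio computed across all vintages is an output-weighted average, so it shifts with how much of each vintage a scenario deploys.

```python
# Hypothetical IO coefficients (EJ biomass per EJ H2) by vintage; not GCAM data.
io_by_vintage = {2025: 1.6, 2030: 1.5}

# Hypothetical H2 output (EJ) in model year 2030, split by vintage, per scenario.
output_2030 = {
    "scenario_A": {2025: 1.0, 2030: 1.0},
    "scenario_B": {2025: 1.0, 2030: 3.0},
}


def aggregate_io(outputs, coefficients):
    """Return total input / total output across all operating vintages."""
    total_input = sum(coefficients[v] * out for v, out in outputs.items())
    total_output = sum(outputs.values())
    return total_input / total_output


# The per-vintage coefficients are identical in both scenarios,
# but the across-all-vintages ratio depends on the vintage mix.
aggregate = {name: aggregate_io(outs, io_by_vintage)
             for name, outs in output_2030.items()}

for name, value in aggregate.items():
    print(f"{name}: {value:.3f}")
```

In this sketch both scenarios land between the two vintage coefficients, but at different points, which is the kind of scenario-to-scenario variation described above.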

(1) I'll have to follow up on how to specify the query in order to retrieve IO coefficients by vintage. Note though that if the query reports values that match the hydrogen XML file's values then it is "by vintage" (if not then the results of the query are "across all vintages" within the model period).

(2) Yes, using the non-vintaged versions of those queries will report across all vintages in the model period and the values will not match the values in the hydrogen XML file.

pkyle commented 3 months ago

So it seems you've already figured this out, but when the model reports the IO-coefficient of any time period, it's only the IO-coefficient of the units installed in that time period. E.g., for the 2030 year, that's the installations between 2026 and 2030. The query is just returning the exogenous assumption that was read in to the model for the given technology and model time period.

When you query the inputs and outputs of a technology, the query is by default including all units operating in the given year, irrespective of when they were installed.

Note that there are some queries in the provided set that report the output by vintage. If you look at these in the model interface using the "Edit" feature, it'll look the same as the corresponding query that adds up the output of all vintages. However, if you open up your query file in a text editor, or if you copy/drag the query from the model interface out to a text file, then at the bottom of the query you'll see the string

<showAttribute attribute-name="year" level="technology"/>

So, you could make new queries of the hydrogen inputs and outputs that show the output by vintage, instead of adding it up, and then you'd be able to replicate exactly the reported IO-coefficients.
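
Once the input and output queries report values by vintage, the division has to be done per vintage rather than on the totals. The following is a rough sketch of that step, assuming the query results have been exported into rows of (technology, vintage, model year, value in EJ); the row layout, function names, and numbers are all hypothetical, not the model interface's actual export format.

```python
# Hypothetical by-vintage query results: (technology, vintage, model_year, value_EJ).
inputs = [
    ("biomass to H2", 2025, 2030, 1.6),
    ("biomass to H2", 2030, 2030, 4.5),
]
outputs = [
    ("biomass to H2", 2025, 2030, 1.0),
    ("biomass to H2", 2030, 2030, 3.0),
]


def io_per_vintage(input_rows, output_rows):
    """Divide input by output for each (technology, vintage, model_year) key."""
    out_lookup = {(t, v, y): val for t, v, y, val in output_rows}
    return {
        (t, v, y): val / out_lookup[(t, v, y)]
        for t, v, y, val in input_rows
        if out_lookup.get((t, v, y))  # skip missing or zero output
    }


def io_all_vintages(input_rows, output_rows):
    """Divide summed input by summed output for each (technology, model_year)."""
    totals_in, totals_out = {}, {}
    for t, _, y, val in input_rows:
        totals_in[(t, y)] = totals_in.get((t, y), 0.0) + val
    for t, _, y, val in output_rows:
        totals_out[(t, y)] = totals_out.get((t, y), 0.0) + val
    return {k: totals_in[k] / totals_out[k] for k in totals_in if totals_out.get(k)}


per_vintage = io_per_vintage(inputs, outputs)
across = io_all_vintages(inputs, outputs)
print(per_vintage)
print(across)
```

Only the entry whose vintage equals the model year (2030 here) would be expected to match the IO-coefficient reported for that period; the summed version mixes in older vintages.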

Youyi77 commented 3 months ago

Hi Patrick and Page, thank you both for the detailed clarifications! Really helpful for me to understand the vintage-related concepts.

I added <showAttribute attribute-name="year" level="technology"/> to the hydrogen input and production queries, then recalculated everything. I do indeed get the reported IO coefficients!

Thanks so much!

JGCRI / gcam-core

Discrepancies between input IO coefficient and the actual calculated ones #402