JGCRI / gcamdata

The GCAM data system
https://jgcri.github.io/gcamdata/

Missing zchunk_L2233.electricity_water.R outputs #887

Closed kdorheim closed 6 years ago

kdorheim commented 6 years ago

It looks like zchunk_L2233.electricity_water.R is missing some csv files. I want to check to see if these have been intentionally left out of the new data system or if we need to add them.

The missing CSVs fall into two categories: logit and non-logit files. I am pretty sure that the missing logit CSV files are not a cause for concern.

@pralitp I know we have already talked about this but could you confirm that the missing logit csv files are intentionally missing from the new data system?

missing logit csv files
L2233.Supplysector_absolute-cost-logit
L2233.Supplysector_relative-cost-logit
L2233.SubsectorLogit_absolute-cost-logit
L2233.SubsectorLogit_relative-cost-logit
L2233.SubsectorLogit_elec
L2233.Supplysector_relative-cost-logit_elec_cool
L2233.Supplysector_absolute-cost-logit_elec_cool
L2233.SubsectorLogit_relative-cost-logit_elec_cool
L2233.SubsectorLogit_absolute-cost-logit_elec_cool



@pralitp and/or @skim301 Could you indicate if the other missing csvs are intentional or unexpected?

other missing csv files
L2233.GlobalTechInterp_elec_cool
L2233.GlobalTechCapFac_elec_cool
L2233.ElecReserve
L2233.StubTechCapFactor_elec
L2233.SubsectorShrwtFllt_elec
L2233.SubsectorInterp_elec
L2233.SubsectorInterpTo_elec
L2233.SubsectorShrwt_nuc
L2233.SubsectorShrwt_renew
L2233.GlobalIntTechCapFac_elec_cool

pralitp commented 6 years ago

@kdorheim, in the first set the L2233.SubsectorLogit_elec should actually still be included. Just the ones that end in _absolute-cost-logit / _relative-cost-logit should not.

My guess is that the other ones got left out because the chunk_generator couldn't pick them up. In the old data system they were processed in bulk in a for loop:

for( i in 1:length( L2233.Elec_tables_copy ) ){
    objectname <- sub( "L223.", "L2233.", names( L2233.Elec_tables_copy[i] ) )
    object <- L2233.Elec_tables_copy[[i]]
    assign( objectname, object )
    IDstringendpoint <- if( grepl( "_", objectname ) & !grepl( 'EQUIV_TABLE', objectname ) & !grepl( '-logit$', objectname ) ) {
        regexpr( "_", objectname, fixed = T ) - 1
    } else {
        nchar( objectname )
    }
    IDstring <- substr( objectname, 7, IDstringendpoint )
    write_mi_data( object, IDstring, "WATER_LEVEL2_DATA", objectname, "WATER_XML_BATCH", "batch_electricity_water.xml")
}
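
To make the loop above easier to follow, here is a minimal sketch of its renaming and ID-string logic pulled out into a standalone function. The function name `derive_id_string` is hypothetical and not part of either data system; the sketch also passes `fixed = TRUE` to `sub()` so that the "." in "L223." is matched literally, which the original loop does not do.

```r
# Hypothetical helper mirroring the ID-string logic of the legacy loop.
derive_id_string <- function(objectname) {
  # Truncate at the first underscore unless the name is an EQUIV_TABLE
  # or ends in "-logit", in which case the whole name is kept.
  truncate <- grepl("_", objectname, fixed = TRUE) &&
    !grepl("EQUIV_TABLE", objectname, fixed = TRUE) &&
    !grepl("-logit$", objectname)
  end <- if (truncate) {
    as.integer(regexpr("_", objectname, fixed = TRUE)) - 1L
  } else {
    nchar(objectname)
  }
  # Characters 1-6 are the "L2233." prefix, so the ID starts at position 7.
  substr(objectname, 7, end)
}

# Rename an L223 table to its L2233 counterpart, as the loop does.
renamed <- sub("L223.", "L2233.", "L223.StubTechCapFactor_elec", fixed = TRUE)

id_plain  <- derive_id_string("L2233.ElecReserve")
id_suffix <- derive_id_string(renamed)
id_logit  <- derive_id_string("L2233.Supplysector_relative-cost-logit")
```

Because every table went through this one loop, none of the outputs was named individually in the legacy code.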

kdorheim commented 6 years ago

@pralitp thanks!

skim301 commented 6 years ago

It doesn't look like absolute cost logit is used for electricity water, but Page (@pkyle) would know better whether it is still used anywhere. Page, can you add any info on this?

kdorheim commented 6 years ago

@skim301 don't worry about the first table; it has already been sorted out. What about the second table?

skim301 commented 6 years ago

Yes, we need them, but maybe not all. As Pralit has mentioned, these didn't get written out.

kdorheim commented 6 years ago

@skim301 thanks! It would be helpful to know which specific ones we need to add, or if we need to add all of them.

pkyle commented 6 years ago

Pretty sure we need them all! If we don't write out the "absolute-cost-logit" CSV files, then we lose the capability to assume absolute cost logit exponents in this sector. The decision between absolute and relative cost logit exponents is user-driven; we currently assume only relative cost logit functions here, so the "absolute-cost-logit" files are essentially blank (i.e., tables with header columns but no data).

kdorheim commented 6 years ago

@pkyle, @skim301, and @pralitp thanks for helping this get sorted out!

kdorheim commented 6 years ago

It turns out that most of the missing L2233 csv files were L223 files that were passed along in this chunk. So L2233.SubsectorLogit_elec, L2233.ElecReserve, L2233.StubTechCapFactor_elec, L2233.SubsectorShrwtFllt_elec, L2233.SubsectorInterp_elec, L2233.SubsectorInterpTo_elec, L2233.SubsectorShrwt_nuc, and L2233.SubsectorShrwt_renew have all been removed from the new data system. The water electricity chunk will use the L223 outputs instead.

This leaves 3 outputs that are still problematic in the new data system.

pkyle commented 6 years ago

I looked into L2233.GlobalTechInterp_elec_cool.csv and found that it should not have been empty. It was an empty table because of an error in the R code (in the old data system): the code expected a column called "year", and because there was none, all rows got dropped.
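
The legacy code itself is not quoted in this thread, so the following is only a sketch of one way a missing "year" column can silently drop every row in base R. The table contents, technology names, and `model_years` are made up for illustration; the column names follow the interpolation table header quoted elsewhere in this issue.

```r
# Hypothetical interpolation table: it has from.year and to.year,
# but no column named "year".
interp <- data.frame(
  sector.name = c("electricity", "electricity"),
  technology = c("gas (CC)", "gas (steam/CT)"),
  from.year = c(2010, 2010),
  to.year = c(2100, 2100),
  stringsAsFactors = FALSE
)
model_years <- c(2010, 2015, 2020)  # assumed values

# interp[["year"]] is NULL, so the comparison yields logical(0),
# and indexing rows by logical(0) keeps no rows at all.
year_col <- interp[["year"]]
filtered <- interp[year_col %in% model_years, , drop = FALSE]

n_before <- nrow(interp)
n_after <- nrow(filtered)
```

No error or warning is raised, which is consistent with the problem going unnoticed: the result keeps its column names and simply has zero rows.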

The net effect of this error is that, starting in the first future period, combined-cycle power plants (gas and liquid fuels) compete 1:1 with single-cycle power plants. Regions with lots of single-cycle gas and oil power plants (e.g., the Middle East, former Soviet Union, Eastern Europe) may have large electricity price drops in 2015, as the lower-cost technology (combined cycle) will take most of the market share and the marginal cost of electricity generation will drop.

bpbond commented 6 years ago

So should we surround this with an OLD_DATA_SYSTEM_BEHAVIOR block, and then fix it going forward? We can work around the empty tibble issue.
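
A rough sketch of what such a block might look like, assuming `OLD_DATA_SYSTEM_BEHAVIOR` is a logical flag visible to the chunk. The function `build_interp_table` and the example rows are hypothetical, and the "corrected" branch here simply keeps the rules, which may not match the actual fix.

```r
# Assumed flag; set locally here so the sketch is self-contained.
OLD_DATA_SYSTEM_BEHAVIOR <- TRUE

# Hypothetical interpolation rules for illustration only.
interp <- data.frame(
  technology = c("gas (CC)", "gas (steam/CT)"),
  from.year = c(2010, 2010),
  to.year = c(2100, 2100),
  stringsAsFactors = FALSE
)

build_interp_table <- function(tbl, old_behavior = OLD_DATA_SYSTEM_BEHAVIOR) {
  if (old_behavior) {
    # Reproduce the legacy output: same columns, zero rows.
    tbl[0, , drop = FALSE]
  } else {
    # Corrected path (assumption): keep the interpolation rules.
    tbl
  }
}

legacy <- build_interp_table(interp, old_behavior = TRUE)
fixed <- build_interp_table(interp, old_behavior = FALSE)
```

Keeping both branches lets the old-versus-new comparison pass with the flag on while the corrected table is used once the flag is switched off.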

kdorheim commented 6 years ago

If we can work around the empty tibble issue then the OLD_DATA_SYSTEM_BEHAVIOR block could get us through the xml verification step.

pralitp commented 6 years ago

I'm still not clear about what the empty tibble issue is. Do you have # A tibble: 0 x 0, or a tibble with no rows but the right columns, in this case:

sector.name,subsector.name,technology,apply.to,from.year,to.year,interpolation.function
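
The two cases can be told apart by their dimensions. The sketch below uses base R data frames rather than tibbles to stay dependency-free; the variable names are illustrative, and only the column names come from the header above.

```r
cols <- c("sector.name", "subsector.name", "technology", "apply.to",
          "from.year", "to.year", "interpolation.function")

# Case 1: no rows and no columns (the "0 x 0" case).
no_columns <- data.frame()

# Case 2: the right columns but no rows.
header_only <- data.frame(
  matrix(character(0), nrow = 0, ncol = length(cols)),
  stringsAsFactors = FALSE
)
names(header_only) <- cols

dim_no_columns <- dim(no_columns)
dim_header_only <- dim(header_only)

# A zero-row table can still be written out: the file holds just the header.
tmp <- tempfile(fileext = ".csv")
write.csv(header_only, tmp, row.names = FALSE)
csv_lines <- readLines(tmp)
unlink(tmp)
```

In base R the second case writes out as a single header line, which matches the description of the "essentially blank" absolute-cost-logit files earlier in the thread.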

kdorheim commented 6 years ago

@pralitp correct, the empty tibble has column names but 0 rows, so it is not saved by the data system. Then, when the downstream chunk zchunk_batch_electricity_water.R is enabled, the driver breaks because the chunk tries to use a csv that is not an output.

[1] "module_water_batch_electricity_water.xml"
[1] "- make 0.23"
Error in check_chunk_outputs(chunk, chunk_data, input_names, promised_outputs = po,  : 
  Some precursors for 'electricity_water.xml' aren't inputs - chunk module_water_batch_electricity_water.xml
Called from: check_chunk_outputs(chunk, chunk_data, input_names, promised_outputs = po, 
    outputs_xml = subset(chunkoutputs, name == chunk)$to_xml)

Could we use an if statement as a workaround to avoid this problem, so that if the csv exists the file is used to make the xml, and otherwise the chunk ignores it? Or would this cause the xml file to fail the old-new test? Are conditional L2 outputs something that will occur often? If so, we may want a better way to handle this.

pkyle commented 6 years ago

I'd be in favor of putting an if(OLD_DATA_SYSTEM_BEHAVIOR) block around it--it is an unintentional error, and it's the sort of thing that (a) changes the results in undesirable ways, but (b) is far enough down in the weeds that nobody would notice it for a long time. In a model run I just did, I've got a 17% drop in electricity prices in Russia between 2010 and 2015, and this error is the reason why. I'd also be happy to help out here--don't mean to just volunteer other people's time!

kdorheim commented 6 years ago

@pkyle If you have time/are willing to add the if(OLD_DATA_SYSTEM_BEHAVIOR) block with the correction to dsr that would be great! The empty tibble problem is being handled by #905.

bpbond commented 6 years ago

Just merged #910 which fixes this.